staging-public-wf #2

Manually merged
enrico.ottonello merged 10 commits from staging-public-wf into master 3 years ago
Owner

See also #1.

I have updated the repo-hi workflow and related templates as agreed, in branch staging-public-wf. I have also reorganised workflows and templates into different directories. Sorry about that: it makes the diff much harder to read, and I should have done it on the master branch beforehand.

Specifically, the aggregation & publish workflow now includes:

  • collection
  • transformation
  • indexing on Solr for the Content Checker
  • feeding on staging GraphDB

The above steps run in cascade, one after the other. After feeding GraphDB the workflow stops, because we need to check the data and write the enrichment SPARQL queries. The following steps must therefore be run manually:

  • enrich staging GraphDB (+ manually check that the enrichment succeeded)
  • publish on staging Elasticsearch
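A minimal sketch of the run order just described, modelling each step as either automatic (reached by the cascade) or manual (launched by hand). The `StagingPipeline` class and its `Step` record are hypothetical; the labels are copied from the list above and are not node names from the repo-hi templates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the aggregation & publish workflow described above;
// labels come from the description, not from actual template node names.
public class StagingPipeline {

    // manual = the cascade stops before this step and an operator launches it.
    record Step(String label, boolean manual) {}

    static final List<Step> AGGREGATE_AND_PUBLISH = List.of(
            new Step("collection", false),
            new Step("transformation", false),
            new Step("indexing on Solr for the Content Checker", false),
            new Step("feeding on staging GraphDB", false),
            new Step("enrich staging GraphDB (+ manual check)", true),
            new Step("publish on staging Elasticsearch", true));

    // Steps executed in cascade: everything before the first manual step.
    static List<String> cascade(List<Step> steps) {
        List<String> run = new ArrayList<>();
        for (Step s : steps) {
            if (s.manual()) break;
            run.add(s.label());
        }
        return run;
    }

    public static void main(String[] args) {
        // Prints the four automatic steps, ending with the staging GraphDB feed.
        cascade(AGGREGATE_AND_PUBLISH).forEach(System.out::println);
    }
}
```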

The last workflow is for public publishing:

  • publish on public GraphDB
  • enrich public GraphDB
  • publish on public Elasticsearch

In the test folder wf_migration I have added instructions for manually migrating the existing aggregation workflows, so that we do not need to remove them and re-aggregate and publish from scratch.

I tried the new wf assignment on my local machine. Can you try it as well?
If there are no problems, please merge the pull request, and once it is running I can help you migrate the existing wfs.

enrico.ottonello was assigned by alessia.bardi 3 years ago
Collaborator

On my local machine I created a new api and assigned the new workflow "Aggregate, enrich and index AriadnePlus content" to it; then I launched the whole workflow and got this error:

2020-12-15 15:55:00,500 [http-bio-8280-exec-7] ERROR eu.dnetlib.msro.workflows.procs.WorkflowExecutor- Error parsing workflow: 1b7e9311-3408-4530-8aae-10fd8e3879d9_V29ya2Zsb3dEU1Jlc291cmNlcy9Xb3JrZmxvd0RTUmVzb3VyY2VUeXBl
eu.dnetlib.rmi.manager.MSROException: Missing or invalid nodes in arcs: [workflowDONE]
at eu.dnetlib.msro.workflows.graph.GraphLoader.checkValidity(GraphLoader.java:158)
at eu.dnetlib.msro.workflows.graph.GraphLoader.loadGraph(GraphLoader.java:69)
at eu.dnetlib.msro.workflows.procs.WorkflowExecutor.startWorkflow(WorkflowExecutor.java:129)

This error seems to be related to

https://code-repo.d4science.org/D-Net/AriadnePlus/src/commit/3e28f8fd57c2cf523a8544a2669c4f1d6e629778/dnet-ariadneplus/src/main/resources/eu/dnetlib/ariadneplus/workflows/repo-hi/full_aggregation_wf.xml.st#L120

where the workflow should be stopped after the "feeding on staging GraphDB" step.
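To illustrate what "Missing or invalid nodes in arcs" most likely means, here is a minimal sketch of an arc-validity check, assuming the loader rejects any arc whose endpoint is not a declared node. This is not the dnetlib `GraphLoader.checkValidity` source; the `ArcCheck` class and the node names other than `workflowDONE` are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch in the spirit of the GraphLoader.checkValidity error above;
// not the dnetlib implementation.
public class ArcCheck {

    record Arc(String from, String to) {}

    // Collects every arc endpoint that does not name a declared node.
    static Set<String> invalidNodes(Set<String> nodes, List<Arc> arcs) {
        Set<String> missing = new TreeSet<>();
        for (Arc a : arcs) {
            if (!nodes.contains(a.from())) missing.add(a.from());
            if (!nodes.contains(a.to())) missing.add(a.to());
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative node names; only workflowDONE comes from the log above.
        Set<String> nodes = Set.of("collection", "transformation", "feedStagingGraphDB");
        List<Arc> arcs = List.of(
                new Arc("collection", "transformation"),
                new Arc("transformation", "feedStagingGraphDB"),
                // Arc to a terminal node that was never declared: same shape as the error.
                new Arc("feedStagingGraphDB", "workflowDONE"));
        // Prints: Missing or invalid nodes in arcs: [workflowDONE]
        System.out.println("Missing or invalid nodes in arcs: " + invalidNodes(nodes, arcs));
    }
}
```

Under this reading, the fix is to declare the node that the final arc points to, which matches the reply below.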

Poster
Owner

I added the missing wf node and pushed. Try again please.

Collaborator

When executing the publishToStagingGraphDB node, the workflow fails:

2020-12-15 18:04:53,983 [pool-1-thread-4] ERROR eu.dnetlib.msro.workflows.procs.WorkflowExecutor- Error starting workflow template: 7426eaaf-93c9-4914-b69a-c9d5c478405a_V29ya2Zsb3dUZW1wbGF0ZURTUmVzb3VyY2VzL1dvcmtmbG93VGVtcGxhdGVEU1Jlc291cmNlVHlwZQ==
eu.dnetlib.rmi.manager.MSROException: A required parameter is missing in wf template:stagingPublisherEndpoint
at eu.dnetlib.msro.workflows.procs.WorkflowExecutor.startWorkflowTemplate(WorkflowExecutor.java:171)

The parameter name should be publisherEndpoint instead of stagingPublisherEndpoint here:

https://code-repo.d4science.org/D-Net/AriadnePlus/src/commit/8a9782cb1efd1b8e6c59c5ff86f9deeec83e1663/dnet-ariadneplus/src/main/resources/eu/dnetlib/bootstrap/profiles/workflows/templates/graphdb_template.xml#L16
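A minimal sketch of why a name mismatch surfaces as "A required parameter is missing", assuming the template declares required parameter names and the caller supplies values by name. The `TemplateParams` class and the endpoint URL are hypothetical placeholders, not the WorkflowExecutor logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a template declares required parameter names and the
// caller supplies values by name; any unmatched required name blocks the start.
public class TemplateParams {

    static List<String> missingRequired(List<String> required, Map<String, String> supplied) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!supplied.containsKey(name)) missing.add(name);
        }
        return missing;
    }

    public static void main(String[] args) {
        // The caller passes "publisherEndpoint" (placeholder URL, not a real endpoint).
        Map<String, String> supplied = Map.of("publisherEndpoint", "http://localhost:8080/publisher");

        // Before the fix: the template requires a name nobody supplies.
        System.out.println(missingRequired(List.of("stagingPublisherEndpoint"), supplied)); // [stagingPublisherEndpoint]
        // After the fix: declared and supplied names agree, so nothing is missing.
        System.out.println(missingRequired(List.of("publisherEndpoint"), supplied)); // []
    }
}
```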

Poster
Owner

Of course you are right. Fixed.

Collaborator

When executing the publishToPublic node, the workflow fails:

2020-12-17 12:17:04,356 [pool-80-thread-1] INFO eu.dnetlib.msro.workflows.procs.ProcessEngine- Starting workflow: [process id='wf_20201217_121700_125' name='publishToPublic']
2020-12-17 12:17:04,359 [pool-80-thread-1] ERROR eu.dnetlib.msro.workflows.procs.WorkflowExecutor- Error starting workflow template: 7426eaaf-93c9-4914-b69a-c9d5c478405a_V29ya2Zsb3dUZW1wbGF0ZURTUmVzb3VyY2VzL1dvcmtmbG93VGVtcGxhdGVEU1Jlc291cmNlVHlwZQ==
eu.dnetlib.rmi.manager.MSROException: A required parameter is missing in wf template:cleanMdstoreId

because of this line:

https://code-repo.d4science.org/D-Net/AriadnePlus/src/commit/e5eefef4113b31ebd4efa2a1113cb268983aba7f/dnet-ariadneplus/src/main/resources/eu/dnetlib/bootstrap/profiles/workflows/templates/public_publishing_template.xml#L16

You should add the following parameter, which is required by graphdb_template.xml:

<PARAM description="Store for transformed records" name="cleanMdstoreId" required="true" type="string"/>
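A minimal sketch of the nesting constraint behind this error, assuming that a template invoking a sub-template can only forward parameters it declares itself, so every name the sub-template requires must also be declared by the caller. The `NestedTemplateCheck` class is hypothetical, and the parameter sets are illustrative: only `cleanMdstoreId` comes from the error above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: the caller template must declare every parameter the
// sub-template requires; this is an assumption, not the D-Net engine code.
public class NestedTemplateCheck {

    // Names required by the sub-template that the caller does not declare.
    static Set<String> undeclaredInCaller(Set<String> callerDeclared, Set<String> calleeRequired) {
        Set<String> gap = new LinkedHashSet<>(calleeRequired);
        gap.removeAll(callerDeclared);
        return gap;
    }

    public static void main(String[] args) {
        // Illustrative parameter sets for public_publishing_template and graphdb_template.
        Set<String> publicPublishing = Set.of("dsId", "dsName", "interface");
        Set<String> graphdb = Set.of("dsId", "dsName", "interface", "cleanMdstoreId");
        System.out.println(undeclaredInCaller(publicPublishing, graphdb)); // [cleanMdstoreId]
    }
}
```

Adding the PARAM above to the caller closes the gap, after which `undeclaredInCaller` would return an empty set.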

Poster
Owner

We should have reached the end of the missing nodes and params now...
Sorry about that, I had no time to set up a working dev environment to check these details myself. Try again while I keep my fingers crossed!

Collaborator

Last issue about params, here:

https://code-repo.d4science.org/D-Net/AriadnePlus/src/commit/6c150be5b22e7314499b38d4d0a17bdb1b17c712/dnet-ariadneplus/src/main/resources/eu/dnetlib/bootstrap/profiles/workflows/templates/public_publishing_template.xml

in all three sections, the entries

<ENTRY key="dsId"               value="$dsId$" />
<ENTRY key="dsName"             value="$dsName$" />
<ENTRY key="interface"          value="$interface$" />

have to become:

<ENTRY key="dsId" ref="dsId"/>
<ENTRY key="dsName" ref="dsName"/>
<ENTRY key="interface" ref="interface"/>

With these updates the workflow runs successfully.
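One plausible reading of this fix, stated as an assumption rather than confirmed D-Net behaviour: `$...$` placeholders are only expanded in StringTemplate files (the `.xml.st` workflows), so in a plain `.xml` profile `value="$dsId$"` reaches the engine as a literal string, whereas `ref="dsId"` is resolved at runtime from the values bound by the calling workflow. The `EntryResolution` class and the sample values are hypothetical.

```java
import java.util.Map;

// Hypothetical model of the two ENTRY forms; the semantics are an assumption
// inferred from the fix above, not taken from the D-Net sources.
public class EntryResolution {

    // value="..." is used verbatim; ref="..." is looked up in the runtime environment.
    static String resolve(String value, String ref, Map<String, String> env) {
        if (ref != null) {
            String v = env.get(ref);
            if (v == null) throw new IllegalStateException("Unbound ref: " + ref);
            return v;
        }
        return value;
    }

    public static void main(String[] args) {
        // Illustrative values bound by the calling workflow.
        Map<String, String> env = Map.of("dsId", "ds-001", "dsName", "Example repository", "interface", "api-001");

        // Not rendered by StringTemplate, so the placeholder stays literal.
        System.out.println(resolve("$dsId$", null, env)); // $dsId$
        // ref picks up the value from the environment.
        System.out.println(resolve(null, "dsId", env)); // ds-001
    }
}
```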

Poster
Owner

There you go!

Collaborator

One last thing: the Last execution date value of the workflow is updated only when a node fails, not when a run started with the main Launch button completes successfully.

Collaborator
Three more rows need the same change: https://code-repo.d4science.org/D-Net/AriadnePlus/src/commit/0e30eccbaebc4b02addb5a25941403a5a5f829e3/dnet-ariadneplus/src/main/resources/eu/dnetlib/bootstrap/profiles/workflows/templates/public_publishing_template.xml#L62
Poster
Owner

> The last thing is that Last execution date value of the workflow is updated only if a node fails, not on success of the main Launch button.

I do not think this is expected, but I would like @miriam.baglioni to confirm that this does not happen in EFG.

I have updated the PR for the last three rows.

Collaborator

No, it does not happen in EFG.

enrico.ottonello changed title from WIP: staging-public-wf to staging-public-wf 3 years ago
enrico.ottonello closed this pull request 3 years ago
alessia.bardi deleted branch staging-public-wf 3 years ago
The pull request has been manually merged as 0d6e69e3eb.
You can also view command line instructions.

Step 1:

From your project repository, check out a new branch and test the changes.
git checkout -b staging-public-wf master
git pull origin staging-public-wf

Step 2:

Merge the changes and update on Gitea.
git checkout master
git merge --no-ff staging-public-wf
git push origin master
3 Participants
Reference: D-Net/AriadnePlus#2