Add Pubmed affiliations (inferred by BIP) as actionsets #353

Merged
claudio.atzori merged 5 commits from 9117_pubmed_affiliations into beta 2023-11-22 13:53:07 +01:00
Member

Related to task 9117 (https://support.openaire.eu/issues/9117)

Extends the WF that creates actionsets for Crossref by also adding affiliation relations from Pubmed.

New workflow param:

  • `pubmedInputPath` -> we can assume that this param will be set to `/data/bip-affiiations/pubmed-data.json`; with each update, we will overwrite the file at this path.
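The PR does not show the Java side of the new parameter. The following is a rough sketch only. It assumes the job receives its parameters as `--name value` pairs, as the workflow's `<arg>--inputPath</arg><arg>${inputPath}</arg>` entry suggests, and it uses the `crossrefInputPath` name agreed during review. It shows how both input paths could be read and validated. `AffiliationJobArgsSketch`, `parse` and `require` are hypothetical stand-ins, not the `parser` object used by the real job.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a tiny stand-in for the job's argument handling.
 * This is NOT the project's actual argument parser.
 */
class AffiliationJobArgsSketch {

    // Parameter names as discussed in this PR (after the agreed rename).
    static final String CROSSREF_INPUT_PATH = "crossrefInputPath";
    static final String PUBMED_INPUT_PATH = "pubmedInputPath";

    /** Parses "--name value" pairs into a map; rejects malformed or dangling flags. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--") || i + 1 >= args.length) {
                throw new IllegalArgumentException("Malformed argument: " + args[i]);
            }
            params.put(args[i].substring(2), args[i + 1]);
        }
        return params;
    }

    /** Returns the value of a required parameter, failing fast if it is absent. */
    static String require(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing required parameter: --" + name);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse(args);
        String crossrefInputPath = require(params, CROSSREF_INPUT_PATH);
        String pubmedInputPath = require(params, PUBMED_INPUT_PATH);
        // Same "name: value" layout as the corrected log line in the diff.
        System.out.println("crossrefInputPath: " + crossrefInputPath);
        System.out.println("pubmedInputPath: " + pubmedInputPath);
    }
}
```

For example, passing `--crossrefInputPath /data/bip-affiliations/data.json --pubmedInputPath <path>` would print both paths. Passing only the old `--inputPath` name would fail fast with a missing-parameter error. This is one way a rename like this one can surface config files that were not updated.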
claudio.atzori was assigned by schatz 2023-10-20 11:51:34 +02:00
schatz added 2 commits 2023-10-20 11:51:35 +02:00
claudio.atzori requested changes 2023-10-25 14:40:31 +02:00
claudio.atzori left a comment
Owner

Overall looks good, please address the minor changes I mentioned with the inline comments.

@@ -59,3 +60,3 @@
final String inputPath = parser.get("inputPath");
log.info("inputPath {}: ", inputPath);
log.info("inputPath: {}", inputPath);

As the workflow now takes two input paths, I suggest aligning the parameter naming convention by renaming this one to `crossrefInputPath`.
schatz marked this conversation as resolved
@@ -8,7 +8,13 @@
{
"paramName": "ip",
"paramLongName": "inputPath",

As the workflow now takes two input paths, I suggest aligning the parameter naming convention by renaming this one to `crossrefInputPath`.
schatz marked this conversation as resolved
@@ -32,4 +32,5 @@ spark2SqlQueryExecutionListeners=com.cloudera.spark.lineage.NavigatorQueryListen
oozie.wf.application.path=${oozieTopWfApplicationPath}
inputPath=/data/bip-affiliations/data.json

parameter renaming to be reflected here as well
schatz marked this conversation as resolved
@@ -3,7 +3,11 @@
<property>
<name>inputPath</name>

again, parameter renaming to be reflected here as well
schatz marked this conversation as resolved
@@ -97,6 +101,7 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--inputPath</arg><arg>${inputPath}</arg>

again, parameter renaming to be reflected here as well
schatz marked this conversation as resolved
schatz added 1 commit 2023-10-25 21:05:08 +02:00
Author
Member

@claudio.atzori Thanks for the input! My intention was to keep the naming of the crossref input path in order not to change the different config files.
But, you are right, it is better to have consistent naming across all input params.
Please check again, thanks!
claudio.atzori added this to the OpenAIRE project 2023-10-26 09:41:07 +02:00
claudio.atzori modified the project from OpenAIRE to OpenAIRE - DNet 2023-10-26 09:58:35 +02:00
schatz added 1 commit 2023-10-26 22:47:10 +02:00

> @claudio.atzori Thanks for the input! My intention was to keep the naming of the crossref input path in order not to change the different config files.
> But, you are right, it is better to have consistent naming across all input params.
> Please check again, thanks!

No problem! The rest looks OK. I'm going to keep this pending integration until I'm ready to run the pipeline again to build the beta graph.
Author
Member

@claudio.atzori can we merge this for the next prod round? It seems to be important to get as many affiliations as possible for the Irish tender.
claudio.atzori requested review from alessia.bardi 2023-10-30 15:50:46 +01:00
claudio.atzori approved these changes 2023-10-30 15:51:29 +01:00
claudio.atzori removed review request for alessia.bardi 2023-10-30 15:51:50 +01:00

> @claudio.atzori can we merge this for the next prod round? It seems to be important to get as many affiliations as possible for the Irish tender.

The changes are trivial, so I don't see any major obstacles to doing so. However, to keep beta and prod separate, could you open the same PR against the master branch?

The outcome of the ongoing beta graph update is still uncertain, so I am not sure I will be able to bring all the most recent changes in the beta branch to production in this round.
Author
Member

PR for the master branch is available here: #357 (https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/357)
Author
Member

@claudio.atzori I think we can also merge this, so that it is in place for the next beta round.
claudio.atzori added 1 commit 2023-11-22 13:52:54 +01:00
claudio.atzori merged commit 836d7ec724 into beta 2023-11-22 13:53:07 +01:00
claudio.atzori deleted branch 9117_pubmed_affiliations 2023-11-22 13:53:07 +01:00