Add Pubmed affiliations (inferred by BIP) as actionsets #353
Reference: D-Net/dnet-hadoop#353
Branch: 9117_pubmed_affiliations
Related to task 9117
Extends the WF that creates actionsets for Crossref by also adding affiliation relations from Pubmed.
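A minimal sketch of what combining the two sources could look like, assuming both inputs have already been parsed into simple (result, organization, confidence) relations. The AffiliationRelation shape, the deduplication on the (result, organization) pair and the keep-the-highest-confidence rule are illustrative assumptions. They do not reproduce the workflow's actual logic or the BIP data format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AffiliationMerge {

    // Hypothetical relation shape: result id (e.g. a DOI), organization id
    // (e.g. a ROR id), inference confidence and the source it came from.
    static final class AffiliationRelation {
        final String resultId;
        final String orgId;
        final double confidence;
        final String source;

        AffiliationRelation(String resultId, String orgId, double confidence, String source) {
            this.resultId = resultId;
            this.orgId = orgId;
            this.confidence = confidence;
            this.source = source;
        }

        @Override
        public String toString() {
            return resultId + " -> " + orgId + " (" + confidence + ", " + source + ")";
        }
    }

    // Unions the Crossref- and Pubmed-derived relations. If the same
    // (result, organization) pair occurs more than once, the relation with the
    // higher confidence is kept; on ties the first one seen (Crossref) wins.
    static List<AffiliationRelation> merge(List<AffiliationRelation> crossref,
                                           List<AffiliationRelation> pubmed) {
        Map<String, AffiliationRelation> byPair = new LinkedHashMap<>();
        List<AffiliationRelation> all = new ArrayList<>(crossref);
        all.addAll(pubmed);
        for (AffiliationRelation r : all) {
            String key = r.resultId + "::" + r.orgId;
            byPair.merge(key, r, (kept, candidate) ->
                candidate.confidence > kept.confidence ? candidate : kept);
        }
        return new ArrayList<>(byPair.values());
    }

    public static void main(String[] args) {
        List<AffiliationRelation> crossref = new ArrayList<>();
        crossref.add(new AffiliationRelation("10.1000/a", "ror-1", 0.8, "crossref"));
        List<AffiliationRelation> pubmed = new ArrayList<>();
        pubmed.add(new AffiliationRelation("10.1000/a", "ror-1", 0.9, "pubmed"));
        pubmed.add(new AffiliationRelation("10.1000/b", "ror-2", 0.7, "pubmed"));
        // Prints two relations: the Pubmed one for 10.1000/a and the one for 10.1000/b.
        for (AffiliationRelation r : merge(crossref, pubmed)) {
            System.out.println(r);
        }
    }
}
```

Keying on the pair keeps a single relation per result-organization link even when both Crossref and Pubmed report it. Whether the real actionset should deduplicate this way, or instead keep both relations with distinct provenance, is a design choice the sketch does not settle.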
New workflow param:
pubmedInputPath
-> we can assume that this param can be set to /data/bip-affiliations/pubmed-data.json, and with each update we will update the file at this path.
Overall looks good, please address the minor changes I mentioned in the inline comments.
@@ -59,3 +60,3 @@
 final String inputPath = parser.get("inputPath");
-log.info("inputPath {}: ", inputPath);
+log.info("inputPath: {}", inputPath);
As the workflow now assumes two input paths, I suggest aligning the parameter naming convention by renaming this one to crossrefInputPath.
@@ -8,7 +8,13 @@
{
"paramName": "ip",
"paramLongName": "inputPath",
As the workflow now assumes two input paths, I suggest aligning the parameter naming convention by renaming this one to crossrefInputPath.
@@ -32,4 +32,5 @@ spark2SqlQueryExecutionListeners=com.cloudera.spark.lineage.NavigatorQueryListen
oozie.wf.application.path=${oozieTopWfApplicationPath}
inputPath=/data/bip-affiliations/data.json
The parameter renaming should be reflected here as well.
@@ -3,7 +3,11 @@
<property>
<name>inputPath</name>
Again, the parameter renaming should be reflected here as well.
@@ -97,6 +101,7 @@
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
</spark-opts>
<arg>--inputPath</arg><arg>${inputPath}</arg>
Again, the parameter renaming should be reflected here as well.
@claudio.atzori Thanks for the input! My intention was to keep the name of the Crossref input path unchanged so as not to modify the various config files.
But, you are right, it is better to have consistent naming across all input params.
Please check again, thanks!
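With the renaming in place, the job entry point reads two consistently named parameters. A minimal sketch, assuming a hypothetical "--name value" parser (AffiliationJobArgs.parse) in place of the project's own argument parser, whose API it does not reproduce; only the parameter names crossrefInputPath and pubmedInputPath come from this discussion.

```java
import java.util.HashMap;
import java.util.Map;

class AffiliationJobArgs {

    // Hypothetical minimal parser for "--name value" pairs, standing in for the
    // project's own argument parser; it only illustrates the renamed parameters.
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("--")) {
                throw new IllegalArgumentException("Unexpected token: " + arg);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + arg);
            }
            params.put(arg.substring(2), args[++i]);
        }
        return params;
    }

    // Returns the named parameter, failing fast if it was not supplied.
    static String required(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing required parameter: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse(args);
        // Both inputs follow the same <source>InputPath naming convention.
        String crossrefInputPath = required(params, "crossrefInputPath");
        String pubmedInputPath = required(params, "pubmedInputPath");
        System.out.println("crossrefInputPath: " + crossrefInputPath);
        System.out.println("pubmedInputPath: " + pubmedInputPath);
    }
}
```

For example, running it with --crossrefInputPath and --pubmedInputPath, each followed by a path, prints both values. Passing the old --inputPath name instead of --crossrefInputPath fails fast with an IllegalArgumentException, because crossrefInputPath is then missing.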
No problem! The rest looks OK. I'm going to keep this pending integration until I'm ready to run the pipeline again to build the beta graph.
@claudio.atzori can we merge this for the next prod round? It seems to be important to get as many affiliations as possible for the Irish tender.
The changes are trivial, so I don't see major impediments to doing so. However, to keep beta and prod separate, can you propose the same PR against the master branch?
The outcome of the ongoing beta graph update is still uncertain, so I am not sure I will be able to bring all the most recent changes in the beta branch to production in this round.
PR for master branch is available here: #357
@claudio.atzori I think we can also merge this, so that it is in place for the next beta round.