[WebCrawl] adding affiliation relations from web information #428

Merged
claudio.atzori merged 3 commits from WebCrowlBeta into beta 2024-04-23 09:36:16 +02:00

This PR adds affiliation information from web resources. It takes everything for IE and only data up to 2021 for the other countries.
claudio.atzori was assigned by miriam.baglioni 2024-04-22 11:06:44 +02:00
miriam.baglioni added 1 commit 2024-04-22 11:06:45 +02:00
claudio.atzori added 1 commit 2024-04-22 11:40:25 +02:00
claudio.atzori requested changes 2024-04-22 12:22:16 +02:00
claudio.atzori left a comment
Owner

Please address the inline comments
@ -0,0 +78,4 @@
spark -> {
createActionSet(spark, inputPath, outputPath + "actionSet");
createPlainRelations(spark, inputPath, outputPath + "relations");

The output path is controlled by the actionset management framework and should not be altered by the workflow. Hence, consider placing the plain relations elsewhere, under a different root directory. Also, the output of the `createActionSet` function should be placed directly inside `outputPath`.
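A minimal sketch of the layout suggested here, not the actual workflow code. The `workingDir` parameter and both helper names are hypothetical; the only points taken from the comment are that the action set goes directly into `outputPath` and that the plain relations live under a different root.

```java
public class OutputLayoutSketch {

    /** The action set is written directly into the framework-managed path, unchanged. */
    public static String actionSetLocation(String outputPath) {
        return outputPath;
    }

    /**
     * The plain relations go under a separate root.
     * `workingDir` is a hypothetical workflow parameter, not part of this PR.
     */
    public static String relationsLocation(String workingDir) {
        String root = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        return root + "relations";
    }

    public static void main(String[] args) {
        // Example values only; real paths come from the workflow configuration.
        String outputPath = "/example/actionsets/webcrawl";
        String workingDir = "/example/working/webcrawl";

        System.out.println("action set -> " + actionSetLocation(outputPath));
        System.out.println("relations  -> " + relationsLocation(workingDir));
    }
}
```

With this split, the workflow never appends anything to `outputPath`, so the framework keeps full control over it.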
@ -0,0 +197,4 @@
return createAffiliatioRelationPair(
PMCID_PREFIX
+ IdentifierFactory
.md5(PidCleaner.normalizePidValue(PidType.pmc.toString(), "PMC" + pmcid.substring(43))),

It seems that the PIDs found in the input dataset include their resolver. As we need them without it, it would be better to make this manipulation explicit.
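One way to make the manipulation explicit, as a sketch only: strip a named resolver prefix instead of relying on a fixed `substring` offset. The `stripResolver` helper and the `EXAMPLE_RESOLVER_PREFIX` value are hypothetical; the actual resolver URL used in the input dataset is not shown in this PR and would need to be substituted.

```java
public class PidResolverSketch {

    // Hypothetical resolver prefix, for illustration only; replace with the real one.
    public static final String EXAMPLE_RESOLVER_PREFIX = "https://resolver.example.org/articles/";

    /**
     * Returns the PID without its resolver prefix.
     * Values that do not start with the prefix are returned unchanged.
     */
    public static String stripResolver(String pid, String resolverPrefix) {
        if (pid == null || resolverPrefix == null) {
            throw new IllegalArgumentException("pid and resolverPrefix must not be null");
        }
        return pid.startsWith(resolverPrefix)
            ? pid.substring(resolverPrefix.length())
            : pid;
    }

    public static void main(String[] args) {
        String raw = EXAMPLE_RESOLVER_PREFIX + "PMC1234567";
        System.out.println(stripResolver(raw, EXAMPLE_RESOLVER_PREFIX));
    }
}
```

Compared with a hard-coded offset, a named prefix documents what is being removed and leaves values without the resolver untouched rather than truncating them.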
miriam.baglioni added 1 commit 2024-04-22 13:52:55 +02:00
claudio.atzori merged commit c57cff2d6d into beta 2024-04-23 09:36:16 +02:00