Commit Graph

2952 Commits

Author SHA1 Message Date
miconis eaaefb8b4c implementation of the procedure to reuse content of different dbs when creating the raw graph 2021-04-06 14:35:51 +02:00
miconis c39c82dfe9 modification of the jobs for the integration of openorgs in the provision, dedup records are no more created by merging but simply taking results of openorgs portal 2021-04-06 14:31:00 +02:00
Claudio Atzori 37b65cc3ad Merge pull request 'updates on stats-update workflow' (#100) from antonis.lempesis/dnet-hadoop:master into master
The workflow integrated in the _stable_ids_ branch has been run correctly on the BETA content, thus IMO this PR can be integrated in the master branch.

Reviewed-on: D-Net/dnet-hadoop#100
2021-04-02 16:13:35 +02:00
Claudio Atzori 1e7e5180fa [Graph model] updated definition of ExternalReference: added alternateLabel, removed description (#6503) 2021-04-02 12:32:12 +02:00
Claudio Atzori e686b8de8d [ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98 2021-04-01 17:11:03 +02:00
Claudio Atzori ee34cc51c3 [ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98 2021-04-01 17:07:49 +02:00
Claudio Atzori 70e49ed53c [OpenOrgsWf] trivial refactoring 2021-04-01 10:30:51 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Claudio Atzori 879e8cc7ef WIP: using common definitions from ModelConstants 2021-03-31 17:12:01 +02:00
Claudio Atzori 72ce741ea6 WIP: using common definitions from ModelConstants 2021-03-31 17:07:13 +02:00
Enrico Ottonello 59ec5137e1 improvement related to https://issue.openaire.research-infrastructures.eu/issues/6501 2021-03-31 16:25:41 +02:00
Sandro La Bruzzo 616d2ecce2 splitted workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Miriam Baglioni 4b6e514f02 merge upstream 2021-03-30 10:27:12 +02:00
Claudio Atzori 27681b876c code formatting 2021-03-29 17:47:11 +02:00
Claudio Atzori 9237d55d7f [OpenOrgsWf] cleanup 2021-03-29 17:40:34 +02:00
Claudio Atzori 7f4e9479ec [OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false) 2021-03-29 16:59:16 +02:00
Claudio Atzori 940556f6d3 Merge pull request 'OpenOrgs dedup and Integration with OpenAIRE Provision' (#102) from openorgswf into stable_ids
Reviewed-on: D-Net/dnet-hadoop#102
2021-03-29 16:41:09 +02:00
miconis 2709d08fc2 Merge branch 'stable_ids' into openorgswf 2021-03-29 16:39:07 +02:00
miconis f446580e9f code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup 2021-03-29 16:10:46 +02:00
Claudio Atzori 3becaa5539 [Cleaning] drop alternate identifiers with empty values 2021-03-29 16:01:35 +02:00
Claudio Atzori a0837ac357 [Stats update] integrating PR#100 for testing D-Net/dnet-hadoop#100 2021-03-29 15:59:58 +02:00
Claudio Atzori 48f2b6127e [Cleaning] drop alternate identifiers with empty values 2021-03-29 14:23:18 +02:00
miconis 2355cc4e9b minor changes and bug fix 2021-03-29 10:07:12 +02:00
Sandro La Bruzzo 1dfda3624e improved workflow importing datacite 2021-03-26 13:56:29 +01:00
Claudio Atzori b5b7dc2104 [Cleaning] drop alternate identifiers with empty values 2021-03-26 12:30:00 +01:00
Enrico Ottonello 91d8660982 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-03-25 11:21:20 +01:00
Enrico Ottonello ebd67b8c8f removed duplicates orcid data on authors set 2021-03-25 11:20:52 +01:00
Claudio Atzori 827e7e37db [Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid 2021-03-25 11:07:59 +01:00
miconis 28c1cdd132 merged stable_ids into openorgswf 2021-03-25 10:44:49 +01:00
miconis 5dfb66b0fa minor changes 2021-03-25 10:29:34 +01:00
miconis 348b0ef921 bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db 2021-03-24 15:51:27 +01:00
Claudio Atzori 751125fdf9 [Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target 2021-03-23 17:34:32 +01:00
Claudio Atzori 1e423fdc07 [Actionmanager] remove invalid records from the input graph before groupGraphTableByIdAndMerge 2021-03-23 13:39:24 +01:00
Claudio Atzori e5ebb500cf fixed pom versions; included missing workflow modules in dhp-workflows/pom.xml 2021-03-23 12:13:53 +01:00
Claudio Atzori b75ad76f79 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-03-23 09:59:12 +01:00
Claudio Atzori 8db248aa13 avoiding error on jenkins compilations: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! 2021-03-23 09:56:34 +01:00
Sandro La Bruzzo 625e4c29c4 added model constants 2021-03-23 09:39:56 +01:00
Claudio Atzori b4febed138 updated mapping tests as consequence of the special treatment reserved to Handle PIDs 2021-03-23 09:37:48 +01:00
Claudio Atzori 431cbe9955 handle missing instance.pid during bulk cleaning 2021-03-23 09:28:58 +01:00
Sandro La Bruzzo c392936b97 fixed error on best access right 2021-03-23 09:23:22 +01:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Sandro La Bruzzo 098914dcff fix wrong relation with source null 2021-03-22 11:35:02 +01:00
miconis 0fe40b08e4 addition of deduplication profiles for the results, double check on pids and the title with a lower threshold 2021-03-19 17:12:05 +01:00
miconis 98854b0124 minor changes 2021-03-19 16:57:40 +01:00
Claudio Atzori 5a043e95ea code formatting 2021-03-19 11:37:27 +01:00
Claudio Atzori a4e82a65aa integrated filter applied when merging BETA & PROD graphs to rule our records from Datacite 2021-03-19 11:34:44 +01:00
Claudio Atzori 3256b9c836 code formatting 2021-03-19 09:36:12 +01:00
Claudio Atzori 75144dacb3 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-03-19 09:07:40 +01:00
Claudio Atzori 9588bfba81 [cleaning] entries avaialbe as PIDs must not appear as alternateIdentifier 2021-03-19 09:07:30 +01:00
Claudio Atzori 972d5a3d98 [dedup] Datacite should be authoritative for datasets 2021-03-19 09:04:20 +01:00