Commit Graph

306 Commits

Author SHA1 Message Date
Miriam Baglioni 441701c85c DoiBoost AccessRight #4362 - If multiple licenses are available, take the one applied to 'vor' 2021-07-14 14:14:50 +02:00
Sandro La Bruzzo c35c117601 fixed process doiboost workflow:
- split OrcidToOAF into two phases: preprocess and process
- updated workflow used in production
2021-07-14 12:48:01 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Miriam Baglioni 7177c25261 added check for null value during doi normalization 2021-07-05 16:22:38 +02:00
Miriam Baglioni 0892cad4e8 the normalization of the content of value was not visible outside the block. Moved the doi normalization operation to where the value is returned 2021-07-05 16:21:42 +02:00
Sandro La Bruzzo c6fa8598e1 massive code refactor:
removed modules dhp-*-scholexplorer
2021-07-01 22:13:45 +02:00
Miriam Baglioni bc34347643 added assertions to verify doi normalization 2021-06-30 14:37:08 +02:00
Miriam Baglioni 86f47afcc7 slight modification of the resource to accommodate also doi normalization tests 2021-06-30 14:36:49 +02:00
Miriam Baglioni 03767ea8e6 slight modification of the resource to accommodate also doi normalization tests 2021-06-30 13:21:24 +02:00
Miriam Baglioni f8eec0ca9a added resource to test the normalization of doi during the import of MAG 2021-06-30 13:19:54 +02:00
Miriam Baglioni 149f85ddf5 added tests for the normalization of the dois 2021-06-30 13:00:52 +02:00
Miriam Baglioni e487b5544c added tests for the normalization of the dois 2021-06-30 12:57:11 +02:00
Miriam Baglioni 1503ccbbb5 added tests for the normalization of the dois 2021-06-30 12:55:37 +02:00
Miriam Baglioni 1299bfb357 Added class to test the normalization of doi 2021-06-30 12:53:27 +02:00
Miriam Baglioni cf758f4f91 added normalization step for the doi 2021-06-30 10:03:15 +02:00
Miriam Baglioni 801763a0fa there is no longer any need to lowercase the doi, since it is done in the first step. Also changed the creation of the id to use the factory 2021-06-29 19:07:23 +02:00
Miriam Baglioni a74de1cda2 added normalization step to the doi 2021-06-29 18:51:11 +02:00
Miriam Baglioni 06074ea7d3 added normalization step to the doi 2021-06-29 18:46:08 +02:00
Miriam Baglioni 8b8ffe82dc added step of normalization for the doi 2021-06-29 18:41:39 +02:00
Miriam Baglioni 50cc21d92e Added method to normalize doi values (lowercase, remove everything preceding 10., filter out dois not starting with 10.) 2021-06-29 18:35:28 +02:00
Sandro La Bruzzo 80e15cc455 implemented mapping from uniprot, pdb and ebi links 2021-06-24 17:20:00 +02:00
Sandro La Bruzzo a167543637 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer 2021-06-21 09:14:11 +02:00
Miriam Baglioni 13c96622c9 - 2021-06-18 09:45:16 +02:00
Miriam Baglioni b486ae498f added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump 2021-06-18 09:43:32 +02:00
Miriam Baglioni 464c2ddde3 changed to split in two steps the generation of the crossref dataset 2021-06-18 09:42:31 +02:00
Miriam Baglioni 6aca0d8ebb added kryo encoding for input files 2021-06-18 09:42:07 +02:00
Miriam Baglioni 3585e53da3 changed to split in two steps the generation of the crossref dataset 2021-06-18 09:41:23 +02:00
Sandro La Bruzzo 3100166d29 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-16 16:22:16 +02:00
Miriam Baglioni 95885bcf12 forces executor memory and driver memory to be 7G (trying to avoid OOM) 2021-06-16 10:17:52 +02:00
Miriam Baglioni 2550a73981 - 2021-06-16 10:04:41 +02:00
Miriam Baglioni 1c47c0d786 modified the number of executors trying to avoid OOM exception 2021-06-15 21:05:39 +02:00
Miriam Baglioni 7deac55138 added an option for 'resume from' in the wf 2021-06-15 18:38:20 +02:00
Miriam Baglioni 66e7ef892f changed the parameter name 2021-06-15 11:08:54 +02:00
Miriam Baglioni 4f47ad0891 no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder 2021-06-15 09:28:31 +02:00
Miriam Baglioni 9f9dd00b94 refactoring 2021-06-15 09:24:46 +02:00
Miriam Baglioni 63d74ee379 refactoring 2021-06-15 09:24:11 +02:00
Miriam Baglioni 6ebc236657 added needed property: outputPath 2021-06-15 09:23:24 +02:00
Miriam Baglioni f7379255b6 changed the workflow to extract info from the dump 2021-06-15 09:22:54 +02:00
Miriam Baglioni d6e21bb6ea creates the crossref dataset used for doiboost together with unpacking part from tar 2021-06-14 17:27:19 +02:00
Miriam Baglioni ce0cfd79e0 creates the crossref dataset used for doiboost 2021-06-14 13:40:19 +02:00
Miriam Baglioni 93efe4de82 split the construction of crossref dataset in two parts. This one just unpacks the tar entries 2021-06-14 13:39:40 +02:00
Miriam Baglioni 8873e6b6d1 workflow and parameter 2021-06-14 10:15:57 +02:00
Miriam Baglioni 0f1acdf6b6 workflow and parameter 2021-06-14 10:08:55 +02:00
Sandro La Bruzzo efbea1e01a minor fix 2021-06-14 09:45:14 +02:00
Miriam Baglioni 75780fc636 extraction of the tar for the dump of crossref, and creation of the dataset 2021-06-14 09:45:07 +02:00
Miriam Baglioni 8d2e086e48 changes to avoid reassignment to val 2021-06-07 17:50:37 +02:00
Miriam Baglioni f33521d338 Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
to be able to replace the object assigned to author, val has been replaced by var
2021-06-07 17:27:07 +02:00
Miriam Baglioni bc12e9819e Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
The change fixes the issue that arises when the same work appears more than once in the same ORCID profile. It avoids replicating the association doi -> author when the orcid id is already associated with the doi.
2021-06-07 16:37:01 +02:00
Claudio Atzori 5e4b91d9ef more pervasive use of constants from ModelConstants, especially for ORCID 2021-05-26 18:20:23 +02:00
Claudio Atzori c4a23c2f4d fix: preserving the old identifier among the originalIds in the doiboost construction process, trying to avoid UnsupportedOperationException while adding elements to the originalIds 2021-05-19 16:01:52 +02:00
Claudio Atzori ba03f549d7 fix: preserving the old identifier among the originalIds in the doiboost construction process 2021-05-19 15:43:26 +02:00
Claudio Atzori 2cbf15f4fb using ModelConstants 2021-05-17 09:54:45 +02:00
Claudio Atzori f19feceaf0 set the old identifier before switching to the new one 2021-05-14 12:53:40 +02:00
Claudio Atzori 1bd70fa2c6 preserving the old identifier among the originalIds in the doiboost construction process 2021-05-14 11:30:41 +02:00
Claudio Atzori ca3f3a7687 using ModelConstants 2021-05-14 11:29:49 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Enrico Ottonello c537986b7c deleted folders with merged data immediately before merge phases 2021-04-28 11:25:25 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Claudio Atzori 27ab8a704d adjusted poms to align with the external dhp-schema module 2021-04-27 10:12:27 +02:00
Claudio Atzori c2bb03c8b5 depending on external dhp-schemas module 2021-04-23 17:57:35 +02:00
Claudio Atzori e5abbec2ba [orcid] download of the lambda file defined in a script 2021-04-22 11:22:10 +02:00
Claudio Atzori 55964cbd81 [orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation 2021-04-22 10:18:09 +02:00
Claudio Atzori 52244f813a merging from enrico.ottonello/dnet-hadoop:orcid-no-doi 2021-04-21 12:24:09 +02:00
Sandro La Bruzzo a16e5299f9 applied unique function on the final dataset 2021-04-16 17:36:48 +02:00
Enrico Ottonello 27068aacd1 wf to move the orcid-no-doi dataset to the folder ready for the import 2021-04-16 17:17:47 +02:00
Sandro La Bruzzo 67085da305 fixed NPE 2021-04-16 11:05:58 +02:00
Sandro La Bruzzo 7d6a80e2f2 added new type on MAG mapping 2021-04-16 09:14:15 +02:00
Sandro La Bruzzo 3f77bfceb0 fixed test failure on jenkins 2021-04-14 10:03:01 +02:00
Sandro La Bruzzo 479abd10cb Add to the ORCID workflow a method that extracts orcid directly from the dump generated by Enrico 2021-04-13 17:47:43 +02:00
Claudio Atzori e686b8de8d [ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98 2021-04-01 17:11:03 +02:00
Claudio Atzori ee34cc51c3 [ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98 2021-04-01 17:07:49 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Enrico Ottonello 59ec5137e1 improvement related to https://issue.openaire.research-infrastructures.eu/issues/6501 2021-03-31 16:25:41 +02:00
Sandro La Bruzzo 616d2ecce2 split the workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Sandro La Bruzzo 1dfda3624e improved workflow importing datacite 2021-03-26 13:56:29 +01:00
Enrico Ottonello ebd67b8c8f removed duplicate orcid data from the authors set 2021-03-25 11:20:52 +01:00
Sandro La Bruzzo 625e4c29c4 added model constants 2021-03-23 09:39:56 +01:00
Sandro La Bruzzo c392936b97 fixed error on best access right 2021-03-23 09:23:22 +01:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Sandro La Bruzzo 098914dcff fix wrong relation with source null 2021-03-22 11:35:02 +01:00
Sandro La Bruzzo 25d5663d97 added filter 2021-03-18 10:24:42 +01:00
Sandro La Bruzzo 5f98ea74a9 Added fix for pid generation in stableIds 2021-03-17 15:53:24 +01:00
Sandro La Bruzzo cc5bbafa5d some fixes to make workflows run 2021-03-17 12:12:56 +01:00
Sandro La Bruzzo 4bb3bcafa5 add author sequence number 2021-03-11 11:32:32 +01:00
Sandro La Bruzzo a8e5d0ea0d updated test and fixed assignment of access right 2021-03-11 10:41:24 +01:00
Sandro La Bruzzo f5e7c57654 Fixed ticket 6282 2021-03-11 10:32:45 +01:00
Claudio Atzori d525785497 [#6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color. 2021-03-09 11:12:55 +01:00
Sandro La Bruzzo a2169ccf07 // implemented Ticket #6281 added pid to Instance in doiBoost 2021-03-09 10:46:36 +01:00
Claudio Atzori 8d2bb24512 merged from master 2021-03-08 15:44:34 +01:00
Enrico Ottonello 70cb100647 added updating last orcid dataset folders after completion 2021-03-01 10:17:04 +01:00
Enrico Ottonello bd3b16402b added result typologies 2021-03-01 10:16:02 +01:00
Enrico Ottonello 53d7023460 dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters 2021-02-25 18:43:29 +01:00
Enrico Ottonello d43ea88caf aligned orcid result typologies with openaire vocabulary 2021-02-25 15:02:10 +01:00
Enrico Ottonello 975823b968 data from last updated orcid 2021-02-23 15:35:04 +01:00
Enrico Ottonello ee4ba7298b fix last update read/write from file on hdfs 2021-02-09 23:24:57 +01:00
Claudio Atzori 72c57b28fa switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT 2021-02-04 14:08:18 +01:00
Enrico Ottonello c238561001 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-02-04 10:44:21 +01:00
Enrico Ottonello 465ce39f75 job execution now based on file last_update.txt on hdfs 2021-02-04 10:44:04 +01:00
Sandro La Bruzzo 99cf3a8ea4 Merged Datacite transform into this branch 2021-01-28 16:34:46 +01:00
Claudio Atzori ab2fe9266a [DOIBoost] minor fixes in workflow definition 2021-01-05 10:26:39 +01:00
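
The DOI normalization added in commit 50cc21d92e (lowercase, remove everything preceding 10., filter out values not starting with 10.), together with the null check added in 7177c25261, can be sketched as below. This is a minimal illustration under assumptions, not the code in the repository: the class name DoiNormalizer, the method normalizeDoi, the whitespace trimming, and the use of null to signal a filtered-out value are all hypothetical choices.

```java
import java.util.Locale;

// Hypothetical sketch of the doi normalization described in commit 50cc21d92e.
final class DoiNormalizer {

    private DoiNormalizer() {}

    /**
     * Returns the normalized doi, or null when the input is null
     * or contains no "10." prefix (i.e. it is filtered out).
     */
    static String normalizeDoi(String input) {
        if (input == null) {
            return null; // null check, cf. commit 7177c25261
        }
        // Lowercase with a fixed locale so the result does not depend on the JVM default.
        String lower = input.trim().toLowerCase(Locale.ROOT);
        // Assumption: the first occurrence of "10." marks the start of the doi.
        int start = lower.indexOf("10.");
        if (start < 0) {
            return null; // does not look like a doi: filter it out
        }
        // Drop everything preceding "10." (e.g. a resolver prefix such as https://doi.org/).
        return lower.substring(start);
    }
}
```

For example, normalizeDoi("https://doi.org/10.1000/ABC") yields 10.1000/abc, while normalizeDoi("not-a-doi") yields null. Cutting at the first "10." is the simplest reading of the commit message; an input whose prefix itself contains "10." would be cut in the wrong place, so a stricter variant might match known resolver prefixes instead.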
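
Commit 441701c85c states the rule DoiBoost follows for access rights when a record carries several licenses: take the one applied to 'vor' (version of record). A minimal sketch of that selection, assuming Crossref-style license entries that carry a content-version value such as vor or am. The License type, the selectLicenseUrl method, and the fallback to the first license when no vor entry exists are hypothetical and not confirmed by the commit message.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of the "take the license applied to 'vor'" rule (commit 441701c85c).
final class LicenseSelector {

    // Hypothetical minimal license entry: a URL plus a Crossref-style content version.
    static final class License {
        final String url;
        final String contentVersion; // e.g. "vor", "am", "tdm", "unspecified"

        License(String url, String contentVersion) {
            this.url = url;
            this.contentVersion = contentVersion;
        }
    }

    private LicenseSelector() {}

    /** Picks the license URL to use; empty when no license URL is available. */
    static Optional<String> selectLicenseUrl(List<License> licenses) {
        if (licenses == null || licenses.isEmpty()) {
            return Optional.empty();
        }
        // Prefer the license attached to the version of record ("vor").
        Optional<String> vor = licenses.stream()
                .filter(l -> "vor".equalsIgnoreCase(l.contentVersion))
                .map(l -> l.url)
                .filter(Objects::nonNull)
                .findFirst();
        if (vor.isPresent()) {
            return vor;
        }
        // Assumption: with no "vor" entry, fall back to the first license that has a URL.
        return licenses.stream()
                .map(l -> l.url)
                .filter(Objects::nonNull)
                .findFirst();
    }
}
```

With one license the result is simply that license; the content version only matters when several are present, which matches the case the commit message describes.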