Commit Graph

165 Commits

Author SHA1 Message Date
Miriam Baglioni 8abdd9bad2 added step of normalization for the doi 2021-06-29 17:59:37 +02:00
Miriam Baglioni e1cd2e406e added class to test the normalization of dois 2021-06-29 17:55:03 +02:00
Miriam Baglioni 29828bc273 added resource to test the normalization of doi during the import of MAG 2021-06-29 17:54:14 +02:00
Miriam Baglioni 0f402a44fb slight modification of the resource to accommodate also doi normalization tests 2021-06-29 17:53:49 +02:00
Miriam Baglioni 2aa565ee6c added close command for the SparkContext 2021-06-29 17:53:10 +02:00
Miriam Baglioni d5d21254a2 added tests for the normalization of the dois 2021-06-29 17:50:07 +02:00
Miriam Baglioni 5e2f330239 added tests for the normalization of the dois 2021-06-29 17:49:39 +02:00
Miriam Baglioni 8320ad2248 added tests for the normalization of the dois 2021-06-29 17:49:11 +02:00
Miriam Baglioni 011e629df5 there is no more the need to lowerCase the doi since the normalized doi is saved in the first phase 2021-06-29 17:48:42 +02:00
Miriam Baglioni 22ae8a81c2 added the normalization step to the doi 2021-06-29 17:45:24 +02:00
Miriam Baglioni 779015e4a9 added the normalization step to the doi from crossref 2021-06-29 14:56:58 +02:00
Miriam Baglioni 3dd5701948 added the normalization step to the doi from crossref 2021-06-29 12:10:27 +02:00
Miriam Baglioni a5c1c0e90a added the normalization step to the doi from orcid before returning it 2021-06-29 12:03:54 +02:00
Miriam Baglioni dc5ed6f563 Added method to normalize doi values (lower case, remove all preceding 10., filtering out doi not starting with 10.) 2021-06-29 12:03:13 +02:00
Claudio Atzori c0d2b62e46 [doiboost] added missing implicit Encoder 2021-06-18 15:57:41 +02:00
Claudio Atzori a3948c1f6e cleanup old doiboost workflows 2021-06-18 15:14:08 +02:00
Claudio Atzori 6cbda49112 more pervasive use of constants from ModelConstants, especially for ORCID 2021-05-26 18:13:04 +02:00
Sandro La Bruzzo d9a0bbda7b implemented new phase in doiboost to make the dataset Distinct by ID 2021-05-13 12:25:14 +02:00
Sandro La Bruzzo 54217d73ff removed old parameters from oozie workflow 2021-05-11 09:59:02 +02:00
Sandro La Bruzzo 7dc824fc23 imported changes in stable_id into master 2021-05-07 12:53:50 +02:00
Sandro La Bruzzo 1adfc41d23 merged manually changes on stable_id for doiboost into master 2021-05-05 10:23:32 +02:00
Claudio Atzori 7ed107be53 depending on external dhp-schemas module 2021-04-23 17:52:36 +02:00
Claudio Atzori ab2fe9266a [DOIBoost] minor fixes in workflow definition 2021-01-05 10:26:39 +01:00
Claudio Atzori 7c722f3fdc [DOIBoost] fixed typo 2021-01-05 10:25:54 +01:00
Claudio Atzori 8879704ba0 [DOIBoost] configurable ES server url and index name in crossref importer 2021-01-05 10:00:13 +01:00
Sandro La Bruzzo 7834a35768 avoid to save intermediate dataset before generation of Sequence file 2021-01-04 17:54:57 +01:00
Sandro La Bruzzo e79445a8b4 minor fix for claudio polemica 2021-01-04 17:39:25 +01:00
Sandro La Bruzzo 8765020b85 minor fix 2021-01-04 17:37:08 +01:00
Sandro La Bruzzo b0dc92786f defined a single oozie workflow for the generation of doiboost 2021-01-04 17:01:35 +01:00
Claudio Atzori 28460c2cd1 using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper 2020-12-23 16:59:52 +01:00
Sandro La Bruzzo 1f6c8a9e83 added orcid_pending type to records coming from Crossref 2020-12-15 11:47:15 +01:00
Sandro La Bruzzo 302baab67b fixed doiboost mapping and workflows 2020-12-07 19:59:33 +01:00
Sandro La Bruzzo b31dd126fb fixed crossref workflow added common ORCID Class 2020-12-07 10:42:38 +01:00
Sandro La Bruzzo 7da679542f fixed wrong projectId 2020-12-02 14:28:09 +01:00
Sandro La Bruzzo 6ba8037cc7 fixed failure to test due to changing of input 2020-12-02 11:34:46 +01:00
Claudio Atzori cfb55effd9 code formatting 2020-12-02 11:23:49 +01:00
Enrico Ottonello f2df3ead74 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-30 14:22:46 +01:00
Enrico Ottonello 40c4559e92 added datainfo on authors pid with "sysimport:crosswalk:entityregistry", 2020-11-30 14:19:22 +01:00
Claudio Atzori a104d2b6ad cleanup 2020-11-26 11:12:00 +01:00
Claudio Atzori db0181b8af Merge pull request 'added bidirectionality to relations from project and result coming from crossref' (#60) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master 2020-11-25 17:17:40 +01:00
Sandro La Bruzzo ec3e238de6 Fixed problem on duplicated identifier 2020-11-25 17:15:54 +01:00
Sandro La Bruzzo 264723ffd8 updated stuff for zenodo upload 2020-11-25 11:56:07 +01:00
Enrico Ottonello 99a086f0c6 max concurrent executors set to 10, according to ORCID Director of Technology mail request 2020-11-24 17:49:32 +01:00
Miriam Baglioni 00874a8ce6 added bidirectionality to relations from project and result 2020-11-24 15:17:23 +01:00
Enrico Ottonello 5c17e768b2 set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions 2020-11-23 16:01:23 +01:00
Enrico Ottonello 97c8111847 action to convert lambda file in seq file; spark action to download updated authors 2020-11-23 09:49:22 +01:00
Enrico Ottonello c0c2e05eae added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs 2020-11-17 18:23:12 +01:00
Enrico Ottonello 005f849674 added compression to output dataset 2020-11-13 12:45:31 +01:00
Enrico Ottonello 9a2fa9dc2f added test for other names parsing from summaries dump 2020-11-13 10:25:34 +01:00
Enrico Ottonello 13f28fa225 moved AuthorData to dhp-schemas; added other names to author data 2020-11-12 17:43:32 +01:00