Commit Graph

148 Commits

Author SHA1 Message Date
Sandro La Bruzzo d9a0bbda7b implemented new phase in doiboost to make the dataset Distinct by ID 2021-05-13 12:25:14 +02:00
Sandro La Bruzzo 54217d73ff removed old parameters from oozie workflow 2021-05-11 09:59:02 +02:00
Sandro La Bruzzo 7dc824fc23 imported changes in stable_id into master 2021-05-07 12:53:50 +02:00
Sandro La Bruzzo 1adfc41d23 merged manually changes on stable_id for doiboost into master 2021-05-05 10:23:32 +02:00
Claudio Atzori 7ed107be53 depending on external dhp-schemas module 2021-04-23 17:52:36 +02:00
Claudio Atzori ab2fe9266a [DOIBoost] minor fixes in workflow definition 2021-01-05 10:26:39 +01:00
Claudio Atzori 7c722f3fdc [DOIBoost] fixed typo 2021-01-05 10:25:54 +01:00
Claudio Atzori 8879704ba0 [DOIBoost] configurable ES server url and index name in crossref importer 2021-01-05 10:00:13 +01:00
Sandro La Bruzzo 7834a35768 avoid to save intermediate dataset before generation of Sequence file 2021-01-04 17:54:57 +01:00
Sandro La Bruzzo e79445a8b4 minor fix for claudio polemica 2021-01-04 17:39:25 +01:00
Sandro La Bruzzo 8765020b85 minor fix 2021-01-04 17:37:08 +01:00
Sandro La Bruzzo b0dc92786f defined a single oozie workflow for the generation of doiboost 2021-01-04 17:01:35 +01:00
Claudio Atzori 28460c2cd1 using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper 2020-12-23 16:59:52 +01:00
Sandro La Bruzzo 1f6c8a9e83 added orcid_pending type to records coming from Crossref 2020-12-15 11:47:15 +01:00
Sandro La Bruzzo 302baab67b fixed doiboost mapping and workflows 2020-12-07 19:59:33 +01:00
Sandro La Bruzzo b31dd126fb fixed crossref workflow added common ORCID Class 2020-12-07 10:42:38 +01:00
Sandro La Bruzzo 7da679542f fixed wrong projectId 2020-12-02 14:28:09 +01:00
Sandro La Bruzzo 6ba8037cc7 fixed failure to test due to changing of input 2020-12-02 11:34:46 +01:00
Claudio Atzori cfb55effd9 code formatting 2020-12-02 11:23:49 +01:00
Enrico Ottonello f2df3ead74 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-30 14:22:46 +01:00
Enrico Ottonello 40c4559e92 added datainfo on authors pid with "sysimport:crosswalk:entityregistry", 2020-11-30 14:19:22 +01:00
Claudio Atzori a104d2b6ad cleanup 2020-11-26 11:12:00 +01:00
Claudio Atzori db0181b8af Merge pull request 'added bidirectionality to relations from project and result coming from crossref' (#60) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master 2020-11-25 17:17:40 +01:00
Sandro La Bruzzo ec3e238de6 Fixed problem on duplicated identifier 2020-11-25 17:15:54 +01:00
Sandro La Bruzzo 264723ffd8 updated stuff for zenodo upload 2020-11-25 11:56:07 +01:00
Enrico Ottonello 99a086f0c6 max concurrent executors set to 10, according to ORCID Director of Technology mail request 2020-11-24 17:49:32 +01:00
Miriam Baglioni 00874a8ce6 added bidirectionality to relations from project and result 2020-11-24 15:17:23 +01:00
Enrico Ottonello 5c17e768b2 set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions 2020-11-23 16:01:23 +01:00
Enrico Ottonello 97c8111847 action to convert lambda file in seq file; spark action to download updated authors 2020-11-23 09:49:22 +01:00
Enrico Ottonello c0c2e05eae added wf to extracting authors and works xml data from orcid dump to hdfs; added wf to download the lamda file (containing last orcid update informations) from orcid to hdfs 2020-11-17 18:23:12 +01:00
Enrico Ottonello 005f849674 added compression to output dataset 2020-11-13 12:45:31 +01:00
Enrico Ottonello 9a2fa9dc2f added test for other names parsing from summaries dump 2020-11-13 10:25:34 +01:00
Enrico Ottonello 13f28fa225 moved AuthorData to dhp-schemas; added other names to author data 2020-11-12 17:43:32 +01:00
Enrico Ottonello 1f861f2b0d now wf output is a sequence file with the format seq("eu.dnetlib.dhp.schema.oaf.Publication",eu.dnetlib.dhp.schema.action.AtomicActions) 2020-11-11 17:38:50 +01:00
Enrico Ottonello fea2451658 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-10 11:49:43 +01:00
Enrico Ottonello 1513174d7e added further test case 2020-11-10 11:44:55 +01:00
Sandro La Bruzzo cd27df91a1 fixed bug on missing relation in ANDS 2020-11-06 17:12:31 +01:00
Enrico Ottonello 6bc7dbeca7 first version of dataset successful generated from orcid dump 2020 2020-11-06 13:47:50 +01:00
Sandro La Bruzzo 39337d8a8a fixed test 2020-11-02 09:26:25 +01:00
Enrico Ottonello 9818e74a70 added dependency version in main pom.xml for orcid no doi 2020-10-22 16:38:00 +02:00
Enrico Ottonello 210a50e4f4 replaced null value 2020-10-22 16:24:42 +02:00
Enrico Ottonello b0290dbcb7 moved all dependencies version to main pom.xml 2020-10-22 16:20:46 +02:00
Enrico Ottonello a38ab57062 let run test methods 2020-10-22 15:43:50 +02:00
Enrico Ottonello 1139d6568d replaced null value with a more safe empty string as return value 2020-10-22 15:32:26 +02:00
Enrico Ottonello c58db1c8ea added filter on null value after map function 2020-10-22 15:11:02 +02:00
Enrico Ottonello 846ba30873 if typologies mapping fails, an exception will be propagated 2020-10-22 14:36:18 +02:00
Enrico Ottonello c3114ba0ae replaced null as return value with a more safe empty string 2020-10-22 14:21:31 +02:00
Enrico Ottonello c295c71ca0 added comment 2020-10-22 14:07:26 +02:00
Enrico Ottonello ab083f9946 propagate exception on parsing work (PR request) 2020-10-22 14:02:32 +02:00
sandro 3a81a940b7 solved bug on merge publication 2020-10-21 22:41:55 +02:00