Commit Graph

68 Commits

Author SHA1 Message Date
Enrico Ottonello efe4c2a9c5 authors and works are now updated in two separate spark actions of the wf 2020-12-12 02:06:21 +01:00
Enrico Ottonello 858efbfad1 fix dataset creation for downloaded works 2020-12-11 16:49:54 +01:00
Enrico Ottonello 2233750a37 original orcid xml data are stored in a field of the class that models orcid data 2020-12-09 09:45:19 +01:00
Enrico Ottonello 5c65e602d3 wf doi_authors generates one json data foreach row 2020-12-07 15:28:10 +01:00
Enrico Ottonello fa1855a4b8 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-12-07 11:02:59 +01:00
Enrico Ottonello b1b589ada1 wf to generate orcid dataset 2020-12-07 11:02:32 +01:00
Sandro La Bruzzo b31dd126fb fixed crossref workflow added common ORCID Class 2020-12-07 10:42:38 +01:00
Enrico Ottonello 8812ab65e1 completed download function to wf; added accumulators 2020-12-04 21:13:49 +01:00
Enrico Ottonello 1b1e9ea67c wf to generate doi_author_list for doiboost; wf to download updated works 2020-12-02 23:20:16 +01:00
Enrico Ottonello f2df3ead74 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-30 14:22:46 +01:00
Sandro La Bruzzo ec3e238de6 Fixed problem on duplicated identifier 2020-11-25 17:15:54 +01:00
Enrico Ottonello 99a086f0c6 max concurrent executors set to 10, according to ORCID Director of Technology mail request 2020-11-24 17:49:32 +01:00
Enrico Ottonello 5c17e768b2 set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions 2020-11-23 16:01:23 +01:00
Enrico Ottonello 97c8111847 action to convert lambda file in seq file; spark action to download updated authors 2020-11-23 09:49:22 +01:00
Enrico Ottonello c0c2e05eae added wf to extracting authors and works xml data from orcid dump to hdfs; added wf to download the lamda file (containing last orcid update informations) from orcid to hdfs 2020-11-17 18:23:12 +01:00
Enrico Ottonello 13f28fa225 moved AuthorData to dhp-schemas; added other names to author data 2020-11-12 17:43:32 +01:00
Enrico Ottonello 6bc7dbeca7 first version of dataset successful generated from orcid dump 2020 2020-11-06 13:47:50 +01:00
Enrico Ottonello 9818e74a70 added dependency version in main pom.xml for orcid no doi 2020-10-22 16:38:00 +02:00
Sandro La Bruzzo cd9c377d18 adpted scholexplorer Dump generation to the new Dataset definition 2020-10-08 10:10:13 +02:00
Enrico Ottonello c82b15b5f4 migrate configuration to ocean, fix publication dataset creation 2020-07-28 15:23:52 +02:00
Enrico Ottonello ca37d3427b separate workflow to parse orcid summaries, activities and generate dataset with no doi publications; test 2020-07-03 23:30:31 +02:00
Enrico Ottonello 5525f57ec8 converter from orcid work json to oaf 2020-07-01 18:36:14 +02:00
Enrico Ottonello b7b6be12a5 fixed enriched works generation 2020-06-29 18:03:16 +02:00
Enrico Ottonello b2213b6435 merged with dnet version 2020-06-26 17:27:34 +02:00
Enrico Ottonello c5e149c46e Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-06-26 16:15:38 +02:00
Enrico Ottonello d6498278ed added workflow to generate seq(orcidId,work) and seq(orcidId,enrichedWork) 2020-06-25 18:43:29 +02:00
Sandro La Bruzzo a6c0faac70 added test to verify secondary sorting 2020-06-25 10:48:15 +02:00
Sandro La Bruzzo 1d4275acc4 implemented first version of exportation of Scholexplorer into ActionSet 2020-06-17 09:10:38 +02:00
Alessia Bardi 4551c1082f mapping csv for orcid 2020-06-09 18:08:47 +02:00
Alessia Bardi a3a6755d58 mapping csv for Unpaywall 2020-06-09 17:45:44 +02:00
Alessia Bardi f3b033cf09 added csv line for funders from Crossref 2020-06-09 17:08:26 +02:00
Alessia Bardi 33b130ec43 Mapping instructions for MAG 2020-06-09 15:57:15 +02:00
Alessia Bardi 181f52b9bc Added mapping table for Crossref 2020-06-08 19:33:47 +02:00
Sandro La Bruzzo 7ac1ba2e35 improvement DOIBoost 2020-06-04 14:39:20 +02:00
Sandro La Bruzzo 13815d5d13 improvement DOIBoost 2020-06-01 17:52:12 +02:00
Sandro La Bruzzo b87b3ddb6b changed mapping ORCIDToOAF 2020-05-29 09:32:04 +02:00
Sandro La Bruzzo 7d29b61c62 code refactor 2020-05-28 09:57:46 +02:00
Sandro La Bruzzo 25f52e19a4 implemented generation of ActionSet 2020-05-26 09:15:33 +02:00
Sandro La Bruzzo 2408083566 implemented filtering step 2020-05-23 08:46:49 +02:00
Sandro La Bruzzo 147dd389bf minor fix 2020-05-22 20:51:42 +02:00
Sandro La Bruzzo 22936d0877 Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost 2020-05-22 15:15:17 +02:00
Sandro La Bruzzo 9fbb221457 completed mapping of UnpayWall and ORCID 2020-05-22 15:15:09 +02:00
Enrico Ottonello 1109d3b3fc Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost 2020-05-21 00:41:27 +02:00
Enrico Ottonello 869a53040e save to text file format 2020-05-21 00:41:21 +02:00
Sandro La Bruzzo b771d67e9d next step of MAG conversion implemented 2020-05-20 08:14:03 +02:00
Enrico Ottonello 934ad570e0 joined summaries and activities dataset 2020-05-19 12:57:21 +02:00
Enrico Ottonello ca722d4d18 merged 2020-05-19 09:43:12 +02:00
Enrico Ottonello 7362bc3e9d workflow to generate seq(doi,AuthorList) 2020-05-19 09:34:44 +02:00
Sandro La Bruzzo 486e850bcc next step of MAG conversion implemented 2020-05-19 09:24:45 +02:00
Enrico Ottonello d4e9075f22 Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost 2020-05-18 19:51:36 +02:00