dnet-hadoop

Commit Graph

Author	SHA1	Message	Date
Enrico Ottonello	92a63f78fe	multiple download attempts handling if a connection to orcid server fails	2021-09-20 18:25:00 +02:00
Enrico Ottonello	8b804e7fe1	removed unused imports	2021-09-14 17:30:52 +02:00
Enrico Ottonello	aefa36c54b	other task executions go ahead if UnknownHostException happens on a single task	2021-09-14 17:26:15 +02:00
Claudio Atzori	2ee21da43b	suggestions from SonarLint	2021-08-11 12:13:22 +02:00
Sandro La Bruzzo	3d8e2aa146	Code refactor: - removed old workflows in doiboost - split workflow of doiboost into preprocess and process	2021-07-14 14:37:06 +02:00
Sandro La Bruzzo	c35c117601	fixed process doiboost workflow: - split OrcidToOAF into two phases, preprocess and process - updated workflow used in production	2021-07-14 12:48:01 +02:00
Miriam Baglioni	0892cad4e8	the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value	2021-07-05 16:21:42 +02:00
Miriam Baglioni	06074ea7d3	added normalization step to the doi	2021-06-29 18:46:08 +02:00
Miriam Baglioni	8d2e086e48	changes to avoid reassignment to val	2021-06-07 17:50:37 +02:00
Miriam Baglioni	f33521d338	Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala': val has been replaced by var so that the object assigned to author can be replaced	2021-06-07 17:27:07 +02:00
Miriam Baglioni	bc12e9819e	Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'. The change fixes the issue that arises when the same work appears more than once on the same ORCID profile: it avoids replicating the association doi -> author when the orcid id is already associated with the doi.	2021-06-07 16:37:01 +02:00
Claudio Atzori	23b8883ab1	applied intellij code cleanup	2021-05-14 10:58:12 +02:00
Sandro La Bruzzo	67085da305	fixed NPE	2021-04-16 11:05:58 +02:00
Sandro La Bruzzo	479abd10cb	Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico	2021-04-13 17:47:43 +02:00
Claudio Atzori	e686b8de8d	[ORCID-no-doi] integrating PR#98	2021-04-01 17:11:03 +02:00
Claudio Atzori	ee34cc51c3	[ORCID-no-doi] integrating PR#98	2021-04-01 17:07:49 +02:00
Claudio Atzori	7941d7be29	WIP: using common definitions from ModelConstants	2021-03-31 18:33:57 +02:00
Sandro La Bruzzo	5f98ea74a9	Added fix for pid generation in stableIds	2021-03-17 15:53:24 +01:00
Claudio Atzori	8d2bb24512	merged from master	2021-03-08 15:44:34 +01:00
Claudio Atzori	28460c2cd1	using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper	2020-12-23 16:59:52 +01:00
Claudio Atzori	d9532446eb	imported more diffs from master branch; code formatting	2020-12-10 16:14:16 +01:00
Claudio Atzori	12e2f930c8	resolved conflicts	2020-12-10 10:57:39 +01:00
Sandro La Bruzzo	302baab67b	fixed doiboost mapping and workflows	2020-12-07 19:59:33 +01:00
Enrico Ottonello	99a086f0c6	max concurrent executors set to 10, according to ORCID Director of Technology mail request	2020-11-24 17:49:32 +01:00
Enrico Ottonello	5c17e768b2	set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions	2020-11-23 16:01:23 +01:00
Enrico Ottonello	97c8111847	action to convert lambda file in seq file; spark action to download updated authors	2020-11-23 09:49:22 +01:00
Enrico Ottonello	c0c2e05eae	added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs	2020-11-17 18:23:12 +01:00
Enrico Ottonello	13f28fa225	moved AuthorData to dhp-schemas; added other names to author data	2020-11-12 17:43:32 +01:00
Sandro La Bruzzo	8e1d43aab2	Implemented ID generation using IdentifierRecordFactory on DOIBoost	2020-11-09 11:53:55 +01:00
Enrico Ottonello	6bc7dbeca7	first version of dataset successfully generated from orcid dump 2020	2020-11-06 13:47:50 +01:00
Enrico Ottonello	c295c71ca0	added comment	2020-10-22 14:07:26 +02:00
Enrico Ottonello	a97ad20c7b	exception is now propagated (PR review)	2020-09-22 10:46:34 +02:00
Enrico Ottonello	9e8e7fe6ef	add comments	2020-09-15 11:32:49 +02:00
Enrico Ottonello	ca37d3427b	separate workflow to parse orcid summaries, activities and generate dataset with no doi publications; test	2020-07-03 23:30:31 +02:00
Enrico Ottonello	b7b6be12a5	fixed enriched works generation	2020-06-29 18:03:16 +02:00
Enrico Ottonello	b2213b6435	merged with dnet version	2020-06-26 17:27:34 +02:00
Enrico Ottonello	d6498278ed	added workflow to generate seq(orcidId,work) and seq(orcidId,enrichedWork)	2020-06-25 18:43:29 +02:00
Enrico Ottonello	fcbb4c1489	parser of orcid publication data from xml original dump	2020-06-24 16:29:32 +02:00
Alessia Bardi	2d3f7d1eb4	fixed log classes to make the ORCID test run	2020-06-09 18:07:14 +02:00
Sandro La Bruzzo	b87b3ddb6b	changed mapping ORCIDToOAF	2020-05-29 09:32:04 +02:00
Sandro La Bruzzo	22936d0877	Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost	2020-05-22 15:15:17 +02:00
Sandro La Bruzzo	9fbb221457	completed mapping of UnpayWall and ORCID	2020-05-22 15:15:09 +02:00
Enrico Ottonello	869a53040e	save to text file format	2020-05-21 00:41:21 +02:00
Enrico Ottonello	934ad570e0	joined summaries and activities dataset	2020-05-19 12:57:21 +02:00
Enrico Ottonello	7362bc3e9d	workflow to generate seq(doi,AuthorList)	2020-05-19 09:34:44 +02:00
Enrico Ottonello	fc80e8c7de	added accumulator; last modified date of the record is added to saved data; lambda file is partitioned into 20 parts before starting downloading	2020-05-18 19:51:29 +02:00
Enrico Ottonello	0b29bb7e3b	spark job to download orcid record modified after a fixed date	2020-05-15 19:49:26 +02:00
Enrico Ottonello	08040cef80	spark action to analyze orcid lambda file	2020-05-12 16:57:43 +02:00
Enrico Ottonello	f53e42bda7	merged	2020-05-11 14:49:28 +02:00
Enrico Ottonello	7990894454	different date format in lambda file parsing	2020-05-11 14:41:11 +02:00

65 Commits