dnet-hadoop

Author	SHA1	Message	Date
Claudio Atzori	ee34cc51c3	[ORCID-no-doi] integrating PR#98 #98	2021-04-01 17:07:49 +02:00
Sandro La Bruzzo	cc5bbafa5d	some fix to make workflows runs	2021-03-17 12:12:56 +01:00
Claudio Atzori	ab2fe9266a	[DOIBoost] minor fixes in workflow definition	2021-01-05 10:26:39 +01:00
Claudio Atzori	7c722f3fdc	[DOIBoost] fixed typo	2021-01-05 10:25:54 +01:00
Claudio Atzori	8879704ba0	[DOIBoost] configurable ES server url and index name in crossref importer	2021-01-05 10:00:13 +01:00
Sandro La Bruzzo	e79445a8b4	minor fix for claudio polemica	2021-01-04 17:39:25 +01:00
Sandro La Bruzzo	8765020b85	minor fix	2021-01-04 17:37:08 +01:00
Sandro La Bruzzo	b0dc92786f	defined a single oozie workflow for the generation of doiboost	2021-01-04 17:01:35 +01:00
Sandro La Bruzzo	302baab67b	fixed doiboost mapping and workflows	2020-12-07 19:59:33 +01:00
Sandro La Bruzzo	b31dd126fb	fixed crossref workflow added common ORCID Class	2020-12-07 10:42:38 +01:00
Enrico Ottonello	f2df3ead74	Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi	2020-11-30 14:22:46 +01:00
Sandro La Bruzzo	ec3e238de6	Fixed problem on duplicated identifier	2020-11-25 17:15:54 +01:00
Enrico Ottonello	99a086f0c6	max concurrent executors set to 10, according to ORCID Director of Technology mail request	2020-11-24 17:49:32 +01:00
Enrico Ottonello	5c17e768b2	set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions	2020-11-23 16:01:23 +01:00
Enrico Ottonello	97c8111847	action to convert lambda file in seq file; spark action to download updated authors	2020-11-23 09:49:22 +01:00
Enrico Ottonello	c0c2e05eae	added wf to extracting authors and works xml data from orcid dump to hdfs; added wf to download the lamda file (containing last orcid update informations) from orcid to hdfs	2020-11-17 18:23:12 +01:00
Enrico Ottonello	13f28fa225	moved AuthorData to dhp-schemas; added other names to author data	2020-11-12 17:43:32 +01:00
Enrico Ottonello	6bc7dbeca7	first version of dataset successful generated from orcid dump 2020	2020-11-06 13:47:50 +01:00
Enrico Ottonello	9818e74a70	added dependency version in main pom.xml for orcid no doi	2020-10-22 16:38:00 +02:00
Sandro La Bruzzo	cd9c377d18	adpted scholexplorer Dump generation to the new Dataset definition	2020-10-08 10:10:13 +02:00
Enrico Ottonello	c82b15b5f4	migrate configuration to ocean, fix publication dataset creation	2020-07-28 15:23:52 +02:00
Enrico Ottonello	ca37d3427b	separate workflow to parse orcid summaries, activities and generate dataset with no doi publications; test	2020-07-03 23:30:31 +02:00
Enrico Ottonello	5525f57ec8	converter from orcid work json to oaf	2020-07-01 18:36:14 +02:00
Enrico Ottonello	b7b6be12a5	fixed enriched works generation	2020-06-29 18:03:16 +02:00
Enrico Ottonello	b2213b6435	merged with dnet version	2020-06-26 17:27:34 +02:00
Enrico Ottonello	c5e149c46e	Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi	2020-06-26 16:15:38 +02:00
Enrico Ottonello	d6498278ed	added workflow to generate seq(orcidId,work) and seq(orcidId,enrichedWork)	2020-06-25 18:43:29 +02:00
Sandro La Bruzzo	a6c0faac70	added test to verify secondary sorting	2020-06-25 10:48:15 +02:00
Sandro La Bruzzo	1d4275acc4	implemented first version of exportation of Scholexplorer into ActionSet	2020-06-17 09:10:38 +02:00
Sandro La Bruzzo	7ac1ba2e35	improvement DOIBoost	2020-06-04 14:39:20 +02:00
Sandro La Bruzzo	13815d5d13	improvement DOIBoost	2020-06-01 17:52:12 +02:00
Sandro La Bruzzo	b87b3ddb6b	changed mapping ORCIDToOAF	2020-05-29 09:32:04 +02:00
Sandro La Bruzzo	7d29b61c62	code refactor	2020-05-28 09:57:46 +02:00
Sandro La Bruzzo	25f52e19a4	implemented generation of ActionSet	2020-05-26 09:15:33 +02:00
Sandro La Bruzzo	2408083566	implemented filtering step	2020-05-23 08:46:49 +02:00
Sandro La Bruzzo	147dd389bf	minor fix	2020-05-22 20:51:42 +02:00
Sandro La Bruzzo	22936d0877	Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost	2020-05-22 15:15:17 +02:00
Sandro La Bruzzo	9fbb221457	completed mapping of UnpayWall and ORCID	2020-05-22 15:15:09 +02:00
Enrico Ottonello	1109d3b3fc	Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost	2020-05-21 00:41:27 +02:00
Enrico Ottonello	869a53040e	save to text file format	2020-05-21 00:41:21 +02:00
Sandro La Bruzzo	b771d67e9d	next step of MAG conversion implemented	2020-05-20 08:14:03 +02:00
Enrico Ottonello	934ad570e0	joined summaries and activities dataset	2020-05-19 12:57:21 +02:00
Enrico Ottonello	ca722d4d18	merged	2020-05-19 09:43:12 +02:00
Enrico Ottonello	7362bc3e9d	workflow to generate seq(doi,AuthorList)	2020-05-19 09:34:44 +02:00
Sandro La Bruzzo	486e850bcc	next step of MAG conversion implemented	2020-05-19 09:24:45 +02:00
Enrico Ottonello	d4e9075f22	Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost	2020-05-18 19:51:36 +02:00
Enrico Ottonello	fc80e8c7de	added accumulator; last modified date of the record is added to saved data; lambda file is partitioned into 20 parts before starting downloading	2020-05-18 19:51:29 +02:00
Enrico Ottonello	0b29bb7e3b	spark job to download orcid record modified after a fixed date	2020-05-15 19:49:26 +02:00
Sandro La Bruzzo	d876f47d06	next step of MAG conversion implemented	2020-05-13 10:38:04 +02:00
Enrico Ottonello	08040cef80	spark action to analyze orcid lambda file	2020-05-12 16:57:43 +02:00

64 Commits