dnet-hadoop

Author	SHA1	Message	Date
Miriam Baglioni	8abdd9bad2	added step of normalization for the doi	2021-06-29 17:59:37 +02:00
Miriam Baglioni	e1cd2e406e	added class to test the normalization of dois	2021-06-29 17:55:03 +02:00
Miriam Baglioni	29828bc273	added resource to test the normalization of doi during the import of MAG	2021-06-29 17:54:14 +02:00
Miriam Baglioni	0f402a44fb	slight modification of the resource to accomodate also doi normalization tests	2021-06-29 17:53:49 +02:00
Miriam Baglioni	2aa565ee6c	added close command for the SparkContext	2021-06-29 17:53:10 +02:00
Miriam Baglioni	d5d21254a2	added tests for the normalization of the dois	2021-06-29 17:50:07 +02:00
Miriam Baglioni	5e2f330239	added tests for the normalization of the dois	2021-06-29 17:49:39 +02:00
Miriam Baglioni	8320ad2248	added tests for the normalization of the dois	2021-06-29 17:49:11 +02:00
Miriam Baglioni	011e629df5	there is no more the need to lowerCase the doi since the normalized doi is saved in the first phase	2021-06-29 17:48:42 +02:00
Miriam Baglioni	22ae8a81c2	added the normalization step to the doi	2021-06-29 17:45:24 +02:00
Miriam Baglioni	779015e4a9	added the normalization step to the doi from crossref	2021-06-29 14:56:58 +02:00
Miriam Baglioni	3dd5701948	added the normalization step to the doi from crossref	2021-06-29 12:10:27 +02:00
Miriam Baglioni	a5c1c0e90a	added the normalization step to the doi from orcid before returning it	2021-06-29 12:03:54 +02:00
Miriam Baglioni	dc5ed6f563	Added method to normalize doi values (lower case, remove all preceeding 10., filtering out doi not starting with 10.)	2021-06-29 12:03:13 +02:00
Claudio Atzori	c0d2b62e46	[doiboost] added missing implicit Encoder	2021-06-18 15:57:41 +02:00
Claudio Atzori	a3948c1f6e	cleanup old doiboost workflows	2021-06-18 15:14:08 +02:00
Claudio Atzori	6cbda49112	more pervasive use of constants from ModelConstants, especially for ORCID	2021-05-26 18:13:04 +02:00
Sandro La Bruzzo	d9a0bbda7b	implemented new phase in doiboost to make the dataset Distinct by ID	2021-05-13 12:25:14 +02:00
Sandro La Bruzzo	54217d73ff	removed old parameters from oozie workflow	2021-05-11 09:59:02 +02:00
Sandro La Bruzzo	7dc824fc23	imported changes in stable_id into master	2021-05-07 12:53:50 +02:00
Sandro La Bruzzo	1adfc41d23	merged manually changes on stable_id for doiboost into master	2021-05-05 10:23:32 +02:00
Claudio Atzori	7ed107be53	depending on external dhp-schemas module	2021-04-23 17:52:36 +02:00
Claudio Atzori	ab2fe9266a	[DOIBoost] minor fixes in workflow definition	2021-01-05 10:26:39 +01:00
Claudio Atzori	7c722f3fdc	[DOIBoost] fixed typo	2021-01-05 10:25:54 +01:00
Claudio Atzori	8879704ba0	[DOIBoost] configurable ES server url and index name in crossref importer	2021-01-05 10:00:13 +01:00
Sandro La Bruzzo	7834a35768	avoid to save intermediate dataset before generation of Sequence file	2021-01-04 17:54:57 +01:00
Sandro La Bruzzo	e79445a8b4	minor fix for claudio polemica	2021-01-04 17:39:25 +01:00
Sandro La Bruzzo	8765020b85	minor fix	2021-01-04 17:37:08 +01:00
Sandro La Bruzzo	b0dc92786f	defined a single oozie workflow for the generation of doiboost	2021-01-04 17:01:35 +01:00
Claudio Atzori	28460c2cd1	using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper	2020-12-23 16:59:52 +01:00
Sandro La Bruzzo	1f6c8a9e83	added orcid_pending type to records coming from Crossref	2020-12-15 11:47:15 +01:00
Sandro La Bruzzo	302baab67b	fixed doiboost mapping and workflows	2020-12-07 19:59:33 +01:00
Sandro La Bruzzo	b31dd126fb	fixed crossref workflow added common ORCID Class	2020-12-07 10:42:38 +01:00
Sandro La Bruzzo	7da679542f	fixed wrong projectId	2020-12-02 14:28:09 +01:00
Sandro La Bruzzo	6ba8037cc7	fixed failure to test due to changing of input	2020-12-02 11:34:46 +01:00
Claudio Atzori	cfb55effd9	code formatting	2020-12-02 11:23:49 +01:00
Enrico Ottonello	f2df3ead74	Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi	2020-11-30 14:22:46 +01:00
Enrico Ottonello	40c4559e92	added datainfo on authors pid with "sysimport:crosswalk:entityregistry",	2020-11-30 14:19:22 +01:00
Claudio Atzori	a104d2b6ad	cleanup	2020-11-26 11:12:00 +01:00
Claudio Atzori	db0181b8af	Merge pull request 'added bidirectionality to relations from project and result coming from crossref' (#60 ) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master	2020-11-25 17:17:40 +01:00
Sandro La Bruzzo	ec3e238de6	Fixed problem on duplicated identifier	2020-11-25 17:15:54 +01:00
Sandro La Bruzzo	264723ffd8	updated stuff for zenodo upload	2020-11-25 11:56:07 +01:00
Enrico Ottonello	99a086f0c6	max concurrent executors set to 10, according to ORCID Director of Technology mail request	2020-11-24 17:49:32 +01:00
Miriam Baglioni	00874a8ce6	added bidirectionality to relations from project and result	2020-11-24 15:17:23 +01:00
Enrico Ottonello	5c17e768b2	set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions	2020-11-23 16:01:23 +01:00
Enrico Ottonello	97c8111847	action to convert lambda file in seq file; spark action to download updated authors	2020-11-23 09:49:22 +01:00
Enrico Ottonello	c0c2e05eae	added wf to extracting authors and works xml data from orcid dump to hdfs; added wf to download the lamda file (containing last orcid update informations) from orcid to hdfs	2020-11-17 18:23:12 +01:00
Enrico Ottonello	005f849674	added compression to output dataset	2020-11-13 12:45:31 +01:00
Enrico Ottonello	9a2fa9dc2f	added test for other names parsing from summaries dump	2020-11-13 10:25:34 +01:00
Enrico Ottonello	13f28fa225	moved AuthorData to dhp-schemas; added other names to author data	2020-11-12 17:43:32 +01:00

165 Commits
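
Commit dc5ed6f563 summarizes the DOI normalization rule in one line: lower-case the value, remove everything preceding the "10." prefix, and filter out values that do not start with "10.". The sketch below is a minimal Python reading of that rule, based on the commit message alone. The function name `normalize_doi` and the whitespace stripping are assumptions made for illustration, and the repository's own implementation may differ.

```python
from typing import Optional


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Hypothetical DOI normalizer inferred from the message of commit dc5ed6f563.

    Lower-cases the value, drops everything before the first "10." and
    rejects values that contain no "10." at all.
    """
    if raw is None:
        return None
    # Assumption: whitespace anywhere in the value is noise and is removed.
    # Lower-casing is safe because DOIs are case-insensitive.
    value = "".join(raw.split()).lower()
    # Everything before the first "10." (e.g. "https://doi.org/" or "doi:")
    # is treated as a prefix to strip.
    start = value.find("10.")
    if start == -1:
        # No "10." anywhere: the value cannot start with "10.", so filter it out.
        return None
    return value[start:]


if __name__ == "__main__":
    for sample in ["https://doi.org/10.1000/ABC", "doi:10.1234/Foo", "not-a-doi"]:
        print(repr(sample), "->", repr(normalize_doi(sample)))
```

Normalizing once, at import time, is also what commit 011e629df5 relies on: once the stored DOI is already normalized, later phases no longer need to lower-case it.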