Miriam Baglioni
|
8abdd9bad2
|
added step of normalization for the doi
|
2021-06-29 17:59:37 +02:00 |
Miriam Baglioni
|
e1cd2e406e
|
added class to test the normalization of dois
|
2021-06-29 17:55:03 +02:00 |
Miriam Baglioni
|
29828bc273
|
added resource to test the normalization of doi during the import of MAG
|
2021-06-29 17:54:14 +02:00 |
Miriam Baglioni
|
0f402a44fb
|
slight modification of the resource to also accommodate doi normalization tests
|
2021-06-29 17:53:49 +02:00 |
Miriam Baglioni
|
2aa565ee6c
|
added close command for the SparkContext
|
2021-06-29 17:53:10 +02:00 |
Miriam Baglioni
|
d5d21254a2
|
added tests for the normalization of the dois
|
2021-06-29 17:50:07 +02:00 |
Miriam Baglioni
|
5e2f330239
|
added tests for the normalization of the dois
|
2021-06-29 17:49:39 +02:00 |
Miriam Baglioni
|
8320ad2248
|
added tests for the normalization of the dois
|
2021-06-29 17:49:11 +02:00 |
Miriam Baglioni
|
011e629df5
|
there is no longer any need to lowerCase the doi since the normalized doi is saved in the first phase
|
2021-06-29 17:48:42 +02:00 |
Miriam Baglioni
|
22ae8a81c2
|
added the normalization step to the doi
|
2021-06-29 17:45:24 +02:00 |
Miriam Baglioni
|
779015e4a9
|
added the normalization step to the doi from crossref
|
2021-06-29 14:56:58 +02:00 |
Miriam Baglioni
|
3dd5701948
|
added the normalization step to the doi from crossref
|
2021-06-29 12:10:27 +02:00 |
Miriam Baglioni
|
a5c1c0e90a
|
added the normalization step to the doi from orcid before returning it
|
2021-06-29 12:03:54 +02:00 |
Miriam Baglioni
|
dc5ed6f563
|
Added method to normalize doi values (lower case, remove everything preceding 10., filter out dois not starting with 10.)
|
2021-06-29 12:03:13 +02:00 |
Claudio Atzori
|
c0d2b62e46
|
[doiboost] added missing implicit Encoder
|
2021-06-18 15:57:41 +02:00 |
Claudio Atzori
|
a3948c1f6e
|
cleanup old doiboost workflows
|
2021-06-18 15:14:08 +02:00 |
Claudio Atzori
|
6cbda49112
|
more pervasive use of constants from ModelConstants, especially for ORCID
|
2021-05-26 18:13:04 +02:00 |
Sandro La Bruzzo
|
d9a0bbda7b
|
implemented new phase in doiboost to make the dataset Distinct by ID
|
2021-05-13 12:25:14 +02:00 |
Sandro La Bruzzo
|
54217d73ff
|
removed old parameters from oozie workflow
|
2021-05-11 09:59:02 +02:00 |
Sandro La Bruzzo
|
7dc824fc23
|
imported changes in stable_id into master
|
2021-05-07 12:53:50 +02:00 |
Sandro La Bruzzo
|
1adfc41d23
|
manually merged changes on stable_id for doiboost into master
|
2021-05-05 10:23:32 +02:00 |
Claudio Atzori
|
7ed107be53
|
depending on external dhp-schemas module
|
2021-04-23 17:52:36 +02:00 |
Claudio Atzori
|
ab2fe9266a
|
[DOIBoost] minor fixes in workflow definition
|
2021-01-05 10:26:39 +01:00 |
Claudio Atzori
|
7c722f3fdc
|
[DOIBoost] fixed typo
|
2021-01-05 10:25:54 +01:00 |
Claudio Atzori
|
8879704ba0
|
[DOIBoost] configurable ES server url and index name in crossref importer
|
2021-01-05 10:00:13 +01:00 |
Sandro La Bruzzo
|
7834a35768
|
avoid saving intermediate dataset before generation of Sequence file
|
2021-01-04 17:54:57 +01:00 |
Sandro La Bruzzo
|
e79445a8b4
|
minor fix to address Claudio's objection
|
2021-01-04 17:39:25 +01:00 |
Sandro La Bruzzo
|
8765020b85
|
minor fix
|
2021-01-04 17:37:08 +01:00 |
Sandro La Bruzzo
|
b0dc92786f
|
defined a single oozie workflow for the generation of doiboost
|
2021-01-04 17:01:35 +01:00 |
Claudio Atzori
|
28460c2cd1
|
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
|
2020-12-23 16:59:52 +01:00 |
Sandro La Bruzzo
|
1f6c8a9e83
|
added orcid_pending type to records coming from Crossref
|
2020-12-15 11:47:15 +01:00 |
Sandro La Bruzzo
|
302baab67b
|
fixed doiboost mapping and workflows
|
2020-12-07 19:59:33 +01:00 |
Sandro La Bruzzo
|
b31dd126fb
|
fixed crossref workflow; added common ORCID class
|
2020-12-07 10:42:38 +01:00 |
Sandro La Bruzzo
|
7da679542f
|
fixed wrong projectId
|
2020-12-02 14:28:09 +01:00 |
Sandro La Bruzzo
|
6ba8037cc7
|
fixed test failure due to changed input
|
2020-12-02 11:34:46 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Enrico Ottonello
|
f2df3ead74
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-30 14:22:46 +01:00 |
Enrico Ottonello
|
40c4559e92
|
added datainfo on authors pid with "sysimport:crosswalk:entityregistry",
|
2020-11-30 14:19:22 +01:00 |
Claudio Atzori
|
a104d2b6ad
|
cleanup
|
2020-11-26 11:12:00 +01:00 |
Claudio Atzori
|
db0181b8af
|
Merge pull request 'added bidirectionality to relations from project and result coming from crossref' (#60) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master
|
2020-11-25 17:17:40 +01:00 |
Sandro La Bruzzo
|
ec3e238de6
|
Fixed problem on duplicated identifier
|
2020-11-25 17:15:54 +01:00 |
Sandro La Bruzzo
|
264723ffd8
|
updated stuff for zenodo upload
|
2020-11-25 11:56:07 +01:00 |
Enrico Ottonello
|
99a086f0c6
|
max concurrent executors set to 10, according to ORCID Director of Technology mail request
|
2020-11-24 17:49:32 +01:00 |
Miriam Baglioni
|
00874a8ce6
|
added bidirectionality to relations from project and result
|
2020-11-24 15:17:23 +01:00 |
Enrico Ottonello
|
5c17e768b2
|
set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions
|
2020-11-23 16:01:23 +01:00 |
Enrico Ottonello
|
97c8111847
|
action to convert lambda file into seq file; spark action to download updated authors
|
2020-11-23 09:49:22 +01:00 |
Enrico Ottonello
|
c0c2e05eae
|
added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs
|
2020-11-17 18:23:12 +01:00 |
Enrico Ottonello
|
005f849674
|
added compression to output dataset
|
2020-11-13 12:45:31 +01:00 |
Enrico Ottonello
|
9a2fa9dc2f
|
added test for other names parsing from summaries dump
|
2020-11-13 10:25:34 +01:00 |
Enrico Ottonello
|
13f28fa225
|
moved AuthorData to dhp-schemas; added other names to author data
|
2020-11-12 17:43:32 +01:00 |