Enrico Ottonello
92a63f78fe
handle multiple download attempts if a connection to the ORCID server fails
2021-09-20 18:25:00 +02:00
Enrico Ottonello
8b804e7fe1
removed unused imports
2021-09-14 17:30:52 +02:00
Enrico Ottonello
aefa36c54b
other task executions proceed if an UnknownHostException occurs on a single task
2021-09-14 17:26:15 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Sandro La Bruzzo
3d8e2aa146
Code refactor:
- removed old workflows in doiboost
- split the doiboost workflow into preprocess and process
2021-07-14 14:37:06 +02:00
Sandro La Bruzzo
c35c117601
fixed process doiboost workflow:
- split OrcidToOAF into two phases: preprocess and process
- updated workflow used in production
2021-07-14 12:48:01 +02:00
Miriam Baglioni
0892cad4e8
the normalization of the content of value was not visible outside the block; moved the DOI normalization to where value is returned
2021-07-05 16:21:42 +02:00
Miriam Baglioni
06074ea7d3
added a normalization step for the DOI
2021-06-29 18:46:08 +02:00
Miriam Baglioni
8d2e086e48
changes to avoid reassigning a val
2021-06-07 17:50:37 +02:00
Miriam Baglioni
f33521d338
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
To be able to replace the object assigned to author, val has been replaced by var.
2021-06-07 17:27:07 +02:00
Miriam Baglioni
bc12e9819e
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
This change fixes the issue that arises when the same work appears more than once on the same ORCID profile: it avoids replicating the doi -> author association when the ORCID iD is already associated with the DOI.
2021-06-07 16:37:01 +02:00
Claudio Atzori
23b8883ab1
applied IntelliJ code cleanup
2021-05-14 10:58:12 +02:00
Sandro La Bruzzo
67085da305
fixed NPE
2021-04-16 11:05:58 +02:00
Sandro La Bruzzo
479abd10cb
Added to the ORCID workflow a method that extracts ORCID directly from the dump generated by Enrico
2021-04-13 17:47:43 +02:00
Claudio Atzori
e686b8de8d
[ORCID-no-doi] integrating PR #98
2021-04-01 17:11:03 +02:00
Claudio Atzori
ee34cc51c3
[ORCID-no-doi] integrating PR #98
2021-04-01 17:07:49 +02:00
Claudio Atzori
7941d7be29
WIP: using common definitions from ModelConstants
2021-03-31 18:33:57 +02:00
Sandro La Bruzzo
5f98ea74a9
Added fix for pid generation in stableIds
2021-03-17 15:53:24 +01:00
Claudio Atzori
8d2bb24512
merged from master
2021-03-08 15:44:34 +01:00
Claudio Atzori
28460c2cd1
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
2020-12-23 16:59:52 +01:00
Claudio Atzori
d9532446eb
imported more diffs from master branch; code formatting
2020-12-10 16:14:16 +01:00
Claudio Atzori
12e2f930c8
resolved conflicts
2020-12-10 10:57:39 +01:00
Sandro La Bruzzo
302baab67b
fixed doiboost mapping and workflows
2020-12-07 19:59:33 +01:00
Enrico Ottonello
99a086f0c6
max concurrent executors set to 10, as requested by email from the ORCID Director of Technology
2020-11-24 17:49:32 +01:00
Enrico Ottonello
5c17e768b2
set workflow configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions
2020-11-23 16:01:23 +01:00
Enrico Ottonello
97c8111847
action to convert the lambda file into a seq file; Spark action to download updated authors
2020-11-23 09:49:22 +01:00
Enrico Ottonello
c0c2e05eae
added workflow to extract authors and works XML data from the ORCID dump to HDFS; added workflow to download the lambda file (containing the latest ORCID update information) from ORCID to HDFS
2020-11-17 18:23:12 +01:00
Enrico Ottonello
13f28fa225
moved AuthorData to dhp-schemas; added other names to author data
2020-11-12 17:43:32 +01:00
Sandro La Bruzzo
8e1d43aab2
Implemented ID generation using IdentifierRecordFactory on DOIBoost
2020-11-09 11:53:55 +01:00
Enrico Ottonello
6bc7dbeca7
first version of the dataset successfully generated from the ORCID 2020 dump
2020-11-06 13:47:50 +01:00
Enrico Ottonello
c295c71ca0
added comment
2020-10-22 14:07:26 +02:00
Enrico Ottonello
a97ad20c7b
exception is now propagated (PR review)
2020-09-22 10:46:34 +02:00
Enrico Ottonello
9e8e7fe6ef
add comments
2020-09-15 11:32:49 +02:00
Enrico Ottonello
ca37d3427b
separate workflow to parse ORCID summaries and activities and generate a dataset of publications with no DOI; test
2020-07-03 23:30:31 +02:00
Enrico Ottonello
b7b6be12a5
fixed enriched works generation
2020-06-29 18:03:16 +02:00
Enrico Ottonello
b2213b6435
merged with dnet version
2020-06-26 17:27:34 +02:00
Enrico Ottonello
d6498278ed
added workflow to generate seq(orcidId,work) and seq(orcidId,enrichedWork)
2020-06-25 18:43:29 +02:00
Enrico Ottonello
fcbb4c1489
parser of ORCID publication data from the original XML dump
2020-06-24 16:29:32 +02:00
Alessia Bardi
2d3f7d1eb4
fixed log classes to make the ORCID test run
2020-06-09 18:07:14 +02:00
Sandro La Bruzzo
b87b3ddb6b
changed mapping ORCIDToOAF
2020-05-29 09:32:04 +02:00
Sandro La Bruzzo
22936d0877
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
2020-05-22 15:15:17 +02:00
Sandro La Bruzzo
9fbb221457
completed mapping of UnpayWall and ORCID
2020-05-22 15:15:09 +02:00
Enrico Ottonello
869a53040e
save to text file format
2020-05-21 00:41:21 +02:00
Enrico Ottonello
934ad570e0
joined summaries and activities datasets
2020-05-19 12:57:21 +02:00
Enrico Ottonello
7362bc3e9d
workflow to generate seq(doi,AuthorList)
2020-05-19 09:34:44 +02:00
Enrico Ottonello
fc80e8c7de
added accumulator; the last modified date of the record is added to the saved data; the lambda file is partitioned into 20 parts before downloading starts
2020-05-18 19:51:29 +02:00
Enrico Ottonello
0b29bb7e3b
Spark job to download ORCID records modified after a fixed date
2020-05-15 19:49:26 +02:00
Enrico Ottonello
08040cef80
spark action to analyze orcid lambda file
2020-05-12 16:57:43 +02:00
Enrico Ottonello
f53e42bda7
merged
2020-05-11 14:49:28 +02:00
Enrico Ottonello
7990894454
different date format in lambda file parsing
2020-05-11 14:41:11 +02:00