Enrico Ottonello
|
ca37d3427b
|
separate workflow to parse orcid summaries, activities and generate dataset with no doi publications; test
|
2020-07-03 23:30:31 +02:00 |
Enrico Ottonello
|
b7b6be12a5
|
fixed enriched works generation
|
2020-06-29 18:03:16 +02:00 |
Enrico Ottonello
|
b2213b6435
|
merged with dnet version
|
2020-06-26 17:27:34 +02:00 |
Enrico Ottonello
|
d6498278ed
|
added workflow to generate seq(orcidId,work) and seq(orcidId,enrichedWork)
|
2020-06-25 18:43:29 +02:00 |
Enrico Ottonello
|
fcbb4c1489
|
parser of orcid publication data from xml original dump
|
2020-06-24 16:29:32 +02:00 |
Alessia Bardi
|
2d3f7d1eb4
|
fixed log classes to make the ORCID test run
|
2020-06-09 18:07:14 +02:00 |
Sandro La Bruzzo
|
b87b3ddb6b
|
changed mapping ORCIDToOAF
|
2020-05-29 09:32:04 +02:00 |
Sandro La Bruzzo
|
22936d0877
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-05-22 15:15:17 +02:00 |
Sandro La Bruzzo
|
9fbb221457
|
completed mapping of UnpayWall and ORCID
|
2020-05-22 15:15:09 +02:00 |
Enrico Ottonello
|
869a53040e
|
save to text file format
|
2020-05-21 00:41:21 +02:00 |
Enrico Ottonello
|
934ad570e0
|
joined summaries and activities dataset
|
2020-05-19 12:57:21 +02:00 |
Enrico Ottonello
|
7362bc3e9d
|
workflow to generate seq(doi,AuthorList)
|
2020-05-19 09:34:44 +02:00 |
Enrico Ottonello
|
fc80e8c7de
|
added accumulator; last modified date of the record is added to saved data; lambda file is partitioned into 20 parts before starting downloading
|
2020-05-18 19:51:29 +02:00 |
Enrico Ottonello
|
0b29bb7e3b
|
spark job to download orcid record modified after a fixed date
|
2020-05-15 19:49:26 +02:00 |
Enrico Ottonello
|
08040cef80
|
spark action to analyze orcid lambda file
|
2020-05-12 16:57:43 +02:00 |
Enrico Ottonello
|
f53e42bda7
|
merged
|
2020-05-11 14:49:28 +02:00 |
Enrico Ottonello
|
7990894454
|
different date format in lambda file parsing
|
2020-05-11 14:41:11 +02:00 |
Sandro La Bruzzo
|
0c6774e4da
|
updated pom version
|
2020-05-11 14:35:14 +02:00 |
Enrico Ottonello
|
b9d126dd1f
|
formatting modified after commit
|
2020-05-08 14:54:37 +02:00 |
Enrico Ottonello
|
7e1c987370
|
Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost
|
2020-05-08 14:49:50 +02:00 |
Enrico Ottonello
|
9d812788e4
|
added job to download from orcid the records modified after a fixed date; the info is taken from last_modified.csv on hdfs
|
2020-05-08 14:49:39 +02:00 |
Sandro La Bruzzo
|
4a89465740
|
reformatted code
|
2020-04-29 13:24:29 +02:00 |
Sandro La Bruzzo
|
a6b1a59d0a
|
merged with master
|
2020-04-29 13:20:57 +02:00 |
Enrico Ottonello
|
1edcd53581
|
added shell actions to download all 11 activities files from ORCID
|
2020-04-28 20:25:09 +02:00 |
Enrico Ottonello
|
a1861b9eaa
|
workflow works in parallel on 2 activity files
|
2020-04-24 18:33:37 +02:00 |
Enrico Ottonello
|
941e94af06
|
added workflow for generating authors with dois data sequence file
|
2020-04-24 15:50:40 +02:00 |
Enrico Ottonello
|
4a6aea1a37
|
fix format problem
|
2020-04-23 15:25:39 +02:00 |
Enrico Ottonello
|
7d759947ae
|
used vtd for parsing orcid xml record, set 4g heapspace
|
2020-04-22 14:41:19 +02:00 |
Enrico Ottonello
|
a466648b4b
|
renamed output file
|
2020-04-20 12:32:03 +02:00 |
Enrico Ottonello
|
4ae55e3891
|
added workflow parameters
|
2020-04-20 12:00:04 +02:00 |
Sandro La Bruzzo
|
eef60bb9f4
|
created structure of oozie wf for ORCID
|
2020-04-20 10:24:57 +02:00 |
Sandro La Bruzzo
|
4d0d9de07e
|
reorganized package and fixed test
|
2020-04-20 10:02:42 +02:00 |