Sandro La Bruzzo | 7d29b61c62 | code refactor | 2020-05-28 09:57:46 +02:00
Sandro La Bruzzo | 25f52e19a4 | implemented generation of ActionSet | 2020-05-26 09:15:33 +02:00
Sandro La Bruzzo | 2408083566 | implemented filtering step | 2020-05-23 08:46:49 +02:00
Sandro La Bruzzo | 147dd389bf | minor fix | 2020-05-22 20:51:42 +02:00
Sandro La Bruzzo | 22936d0877 | Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost | 2020-05-22 15:15:17 +02:00
Sandro La Bruzzo | 9fbb221457 | completed mapping of UnpayWall and ORCID | 2020-05-22 15:15:09 +02:00
Enrico Ottonello | 1109d3b3fc | Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost | 2020-05-21 00:41:27 +02:00
Enrico Ottonello | 869a53040e | save to text file format | 2020-05-21 00:41:21 +02:00
Sandro La Bruzzo | 5818abaab4 | fixed Crossref Mapping | 2020-05-20 17:05:46 +02:00
Sandro La Bruzzo | b771d67e9d | next step of MAG conversion implemented | 2020-05-20 08:14:03 +02:00
Enrico Ottonello | 934ad570e0 | joined summaries and activities dataset | 2020-05-19 12:57:21 +02:00
Enrico Ottonello | ca722d4d18 | merged | 2020-05-19 09:43:12 +02:00
Enrico Ottonello | 7362bc3e9d | workflow to generate seq(doi,AuthorList) | 2020-05-19 09:34:44 +02:00
Sandro La Bruzzo | 486e850bcc | next step of MAG conversion implemented | 2020-05-19 09:24:45 +02:00
Enrico Ottonello | d4e9075f22 | Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost | 2020-05-18 19:51:36 +02:00
Enrico Ottonello | fc80e8c7de | added accumulator; last modified date of the record is added to saved data; lambda file is partitioned into 20 parts before starting downloading | 2020-05-18 19:51:29 +02:00
Enrico Ottonello | 0b29bb7e3b | spark job to download orcid record modified after a fixed date | 2020-05-15 19:49:26 +02:00
Enrico Ottonello | 12756f9d41 | multithread (4 threads) test to feed elastic search | 2020-05-13 16:11:40 +02:00
Sandro La Bruzzo | d876f47d06 | next step of MAG conversion implemented | 2020-05-13 10:38:04 +02:00
Enrico Ottonello | 08040cef80 | spark action to analyze orcid lambda file | 2020-05-12 16:57:43 +02:00
Enrico Ottonello | 3b1a68cbf5 | elastic search feed test | 2020-05-11 14:53:52 +02:00
Enrico Ottonello | f53e42bda7 | merged | 2020-05-11 14:49:28 +02:00
Enrico Ottonello | 7990894454 | different date format in lambda file parsing | 2020-05-11 14:41:11 +02:00
Sandro La Bruzzo | 0c6774e4da | updated pom version | 2020-05-11 14:35:14 +02:00
Sandro La Bruzzo | 4062eafbdb | merged from branch | 2020-05-11 14:08:16 +02:00
Sandro La Bruzzo | 1662f221f5 | added test class | 2020-05-11 09:39:11 +02:00
Sandro La Bruzzo | 2b48a2c32c | Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost | 2020-05-11 09:38:36 +02:00
Sandro La Bruzzo | 4cebca09d2 | start implementing MAG mapping | 2020-05-11 09:38:27 +02:00
Enrico Ottonello | b9d126dd1f | formatting modified after commit | 2020-05-08 14:54:37 +02:00
Enrico Ottonello | 7e1c987370 | Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost | 2020-05-08 14:49:50 +02:00
Enrico Ottonello | 9d812788e4 | added job to download from orcid the records modified after a fixed date, the info are taken from last_modified.csv on hdfs | 2020-05-08 14:49:39 +02:00
Sandro La Bruzzo | 1e06bbaee8 | fixed test | 2020-04-30 11:38:58 +02:00
Sandro La Bruzzo | 4a89465740 | reformatted code | 2020-04-29 13:24:29 +02:00
Sandro La Bruzzo | a6b1a59d0a | merged with maaster | 2020-04-29 13:20:57 +02:00
Sandro La Bruzzo | 920c0f19c3 | Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost | 2020-04-29 13:13:16 +02:00
Sandro La Bruzzo | 09f161f1f4 | implemented unit test | 2020-04-29 13:13:02 +02:00
Enrico Ottonello | 1edcd53581 | added shell actions to download all 11 activities files from ORCID | 2020-04-28 20:25:09 +02:00
Enrico Ottonello | a1861b9eaa | workflow works in parallel on 2 activity files | 2020-04-24 18:33:37 +02:00
Enrico Ottonello | 941e94af06 | added workflow for generating authors with dois data sequence file | 2020-04-24 15:50:40 +02:00
Sandro La Bruzzo | 4ba386d996 | improved crossref mapping | 2020-04-23 09:33:48 +02:00
Sandro La Bruzzo | 157915988c | improved crossref mapping | 2020-04-22 15:00:44 +02:00
Enrico Ottonello | 5977f08e92 | merged | 2020-04-22 14:50:50 +02:00
Enrico Ottonello | 7d759947ae | used vtd for parsing orcid xml record, set 4g heapspace | 2020-04-22 14:41:19 +02:00
Sandro La Bruzzo | e4b105cece | improved crossref mapping | 2020-04-20 18:10:07 +02:00
Sandro La Bruzzo | 5d46ec7d5f | fixed name of wrong package | 2020-04-20 14:49:32 +02:00
Sandro La Bruzzo | 82cc3b707d | fixed name of wrong package | 2020-04-20 14:47:06 +02:00
Sandro La Bruzzo | 7029942e06 | Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost | 2020-04-20 13:26:41 +02:00
Sandro La Bruzzo | 0e45f4d450 | continue mapping from crossref to OAF | 2020-04-20 13:26:29 +02:00
Enrico Ottonello | a466648b4b | renamed output file | 2020-04-20 12:32:03 +02:00
Enrico Ottonello | 4ae55e3891 | added workflow parameters | 2020-04-20 12:00:04 +02:00
Sandro La Bruzzo | eef60bb9f4 | created structure of oozie wf for ORCID | 2020-04-20 10:24:57 +02:00
Sandro La Bruzzo | 4d0d9de07e | reorganized package and fixed test | 2020-04-20 10:02:42 +02:00
Sandro La Bruzzo | 618bc1fc72 | first implementation of crossrefMapping | 2020-04-20 09:53:34 +02:00
Enrico Ottonello | 1d44a359ea | renamed package folder | 2020-04-20 09:25:40 +02:00
Enrico Ottonello | 7011d4203e | parser of orcid summaries from tar gz file on hdfs, that creates a sequence file with authors informations (oid, name, surname, credit name) | 2020-04-17 18:52:39 +02:00
Sandro La Bruzzo | a329ea5575 | merged with master branch | 2020-04-17 12:23:54 +02:00
Sandro La Bruzzo | 205e9521c6 | implemented import crossref job | 2020-04-01 14:12:33 +02:00