Claudio Atzori
|
c4a23c2f4d
|
fix: preserving the old identifier among the originalIds in the doiboost construction process, trying to avoid UnsupportedOperationException while adding elements to the originalIds
|
2021-05-19 16:01:52 +02:00 |
Claudio Atzori
|
ba03f549d7
|
fix: preserving the old identifier among the originalIds in the doiboost construction process
|
2021-05-19 15:43:26 +02:00 |
Claudio Atzori
|
2cbf15f4fb
|
using ModelConstants
|
2021-05-17 09:54:45 +02:00 |
Claudio Atzori
|
f19feceaf0
|
set the old identifier before switching to the new one
|
2021-05-14 12:53:40 +02:00 |
Claudio Atzori
|
1bd70fa2c6
|
preserving the old identifier among the originalIds in the doiboost construction process
|
2021-05-14 11:30:41 +02:00 |
Claudio Atzori
|
ca3f3a7687
|
using ModelConstants
|
2021-05-14 11:29:49 +02:00 |
Claudio Atzori
|
23b8883ab1
|
applied intellij code cleanup
|
2021-05-14 10:58:12 +02:00 |
Enrico Ottonello
|
c537986b7c
|
deleted folders with merged data immediately before merge phases
|
2021-04-28 11:25:25 +02:00 |
Claudio Atzori
|
5afa7d3e0c
|
core utilities in dhp-common moved in external module dhp-schemas
|
2021-04-27 15:44:01 +02:00 |
Claudio Atzori
|
27ab8a704d
|
adjusted poms to align with the external dhp-schema module
|
2021-04-27 10:12:27 +02:00 |
Claudio Atzori
|
c2bb03c8b5
|
depending on external dhp-schemas module
|
2021-04-23 17:57:35 +02:00 |
Claudio Atzori
|
e5abbec2ba
|
[orcid] download of the lambda file defined in a script
|
2021-04-22 11:22:10 +02:00 |
Claudio Atzori
|
55964cbd81
|
[orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation
|
2021-04-22 10:18:09 +02:00 |
Claudio Atzori
|
52244f813a
|
merging from enrico.ottonello/dnet-hadoop:orcid-no-doi
|
2021-04-21 12:24:09 +02:00 |
Sandro La Bruzzo
|
a16e5299f9
|
applied unique function on the final dataset
|
2021-04-16 17:36:48 +02:00 |
Enrico Ottonello
|
27068aacd1
|
wf to move orcid-no-doi dataset to the folder ready for the import
|
2021-04-16 17:17:47 +02:00 |
Sandro La Bruzzo
|
67085da305
|
fixed NPE
|
2021-04-16 11:05:58 +02:00 |
Sandro La Bruzzo
|
7d6a80e2f2
|
added new type on MAG mapping
|
2021-04-16 09:14:15 +02:00 |
Sandro La Bruzzo
|
3f77bfceb0
|
fixed test failure on jenkins
|
2021-04-14 10:03:01 +02:00 |
Sandro La Bruzzo
|
479abd10cb
|
Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico
|
2021-04-13 17:47:43 +02:00 |
Claudio Atzori
|
e686b8de8d
|
[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98
|
2021-04-01 17:11:03 +02:00 |
Claudio Atzori
|
ee34cc51c3
|
[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98
|
2021-04-01 17:07:49 +02:00 |
Claudio Atzori
|
7941d7be29
|
WIP: using common definitions from ModelConstants
|
2021-03-31 18:33:57 +02:00 |
Enrico Ottonello
|
59ec5137e1
|
improvement related to https://issue.openaire.research-infrastructures.eu/issues/6501
|
2021-03-31 16:25:41 +02:00 |
Sandro La Bruzzo
|
616d2ecce2
|
split workflow collecting datacite into two workflows.
Released on beta
|
2021-03-31 15:45:58 +02:00 |
Sandro La Bruzzo
|
1dfda3624e
|
improved workflow importing datacite
|
2021-03-26 13:56:29 +01:00 |
Enrico Ottonello
|
ebd67b8c8f
|
removed duplicate orcid data from authors set
|
2021-03-25 11:20:52 +01:00 |
Sandro La Bruzzo
|
625e4c29c4
|
added model constants
|
2021-03-23 09:39:56 +01:00 |
Sandro La Bruzzo
|
c392936b97
|
fixed error on best access right
|
2021-03-23 09:23:22 +01:00 |
Sandro La Bruzzo
|
c73072079d
|
fix conflicts
|
2021-03-22 16:36:31 +01:00 |
Sandro La Bruzzo
|
098914dcff
|
fix wrong relation with source null
|
2021-03-22 11:35:02 +01:00 |
Sandro La Bruzzo
|
25d5663d97
|
added filter
|
2021-03-18 10:24:42 +01:00 |
Sandro La Bruzzo
|
5f98ea74a9
|
Added fix for pid generation in stableIds
|
2021-03-17 15:53:24 +01:00 |
Sandro La Bruzzo
|
cc5bbafa5d
|
some fixes to make workflows run
|
2021-03-17 12:12:56 +01:00 |
Sandro La Bruzzo
|
4bb3bcafa5
|
add author sequence number
|
2021-03-11 11:32:32 +01:00 |
Sandro La Bruzzo
|
a8e5d0ea0d
|
updated test and fixed assign of access right
|
2021-03-11 10:41:24 +01:00 |
Sandro La Bruzzo
|
f5e7c57654
|
Fixed ticket 6282
|
2021-03-11 10:32:45 +01:00 |
Claudio Atzori
|
d525785497
|
[#6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
|
2021-03-09 11:12:55 +01:00 |
Sandro La Bruzzo
|
a2169ccf07
|
// implemented Ticket #6281 added pid to Instance in doiBoost
|
2021-03-09 10:46:36 +01:00 |
Claudio Atzori
|
8d2bb24512
|
merged from master
|
2021-03-08 15:44:34 +01:00 |
Enrico Ottonello
|
70cb100647
|
added updating last orcid dataset folders after completion
|
2021-03-01 10:17:04 +01:00 |
Enrico Ottonello
|
bd3b16402b
|
added result typologies
|
2021-03-01 10:16:02 +01:00 |
Enrico Ottonello
|
53d7023460
|
dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters
|
2021-02-25 18:43:29 +01:00 |
Enrico Ottonello
|
d43ea88caf
|
aligned orcid result typologies with openaire vocabulary
|
2021-02-25 15:02:10 +01:00 |
Enrico Ottonello
|
975823b968
|
data from last updated orcid
|
2021-02-23 15:35:04 +01:00 |
Enrico Ottonello
|
ee4ba7298b
|
fix last update read/write from file on hdfs
|
2021-02-09 23:24:57 +01:00 |
Claudio Atzori
|
72c57b28fa
|
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
|
2021-02-04 14:08:18 +01:00 |
Enrico Ottonello
|
c238561001
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2021-02-04 10:44:21 +01:00 |
Enrico Ottonello
|
465ce39f75
|
job execution now based on file last_update.txt on hdfs
|
2021-02-04 10:44:04 +01:00 |
Sandro La Bruzzo
|
99cf3a8ea4
|
Merged Datacite transform into this branch
|
2021-01-28 16:34:46 +01:00 |
Claudio Atzori
|
ab2fe9266a
|
[DOIBoost] minor fixes in workflow definition
|
2021-01-05 10:26:39 +01:00 |
Claudio Atzori
|
7c722f3fdc
|
[DOIBoost] fixed typo
|
2021-01-05 10:25:54 +01:00 |
Claudio Atzori
|
8879704ba0
|
[DOIBoost] configurable ES server url and index name in crossref importer
|
2021-01-05 10:00:13 +01:00 |
Sandro La Bruzzo
|
7834a35768
|
avoid saving intermediate dataset before generation of Sequence file
|
2021-01-04 17:54:57 +01:00 |
Sandro La Bruzzo
|
e79445a8b4
|
minor fix for claudio polemica
|
2021-01-04 17:39:25 +01:00 |
Sandro La Bruzzo
|
8765020b85
|
minor fix
|
2021-01-04 17:37:08 +01:00 |
Sandro La Bruzzo
|
b0dc92786f
|
defined a single oozie workflow for the generation of doiboost
|
2021-01-04 17:01:35 +01:00 |
Claudio Atzori
|
28460c2cd1
|
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
|
2020-12-23 16:59:52 +01:00 |
Sandro La Bruzzo
|
1f6c8a9e83
|
added orcid_pending type to records coming from Crossref
|
2020-12-15 11:47:15 +01:00 |
Enrico Ottonello
|
b2de598c1a
|
all actions from download lambda file to merge updated data into one wf
|
2020-12-15 10:42:55 +01:00 |
Enrico Ottonello
|
efe4c2a9c5
|
authors and works are now updated in two separate spark actions of the wf
|
2020-12-12 02:06:21 +01:00 |
Enrico Ottonello
|
858efbfad1
|
fix dataset creation for downloaded works
|
2020-12-11 16:49:54 +01:00 |
Claudio Atzori
|
d9532446eb
|
imported more diffs from master branch; code formatting
|
2020-12-10 16:14:16 +01:00 |
Claudio Atzori
|
12e2f930c8
|
resolved conflicts
|
2020-12-10 10:57:39 +01:00 |
Enrico Ottonello
|
2233750a37
|
original orcid xml data are stored in a field of the class that models orcid data
|
2020-12-09 09:45:19 +01:00 |
Sandro La Bruzzo
|
302baab67b
|
fixed doiboost mapping and workflows
|
2020-12-07 19:59:33 +01:00 |
Enrico Ottonello
|
5c65e602d3
|
wf doi_authors generates one json data for each row
|
2020-12-07 15:28:10 +01:00 |
Enrico Ottonello
|
fa1855a4b8
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-12-07 11:02:59 +01:00 |
Enrico Ottonello
|
b1b589ada1
|
wf to generate orcid dataset
|
2020-12-07 11:02:32 +01:00 |
Sandro La Bruzzo
|
b31dd126fb
|
fixed crossref workflow added common ORCID Class
|
2020-12-07 10:42:38 +01:00 |
Enrico Ottonello
|
8812ab65e1
|
completed download function to wf; added accumulators
|
2020-12-04 21:13:49 +01:00 |
Enrico Ottonello
|
53b22c1937
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-12-02 23:21:27 +01:00 |
Enrico Ottonello
|
1b1e9ea67c
|
wf to generate doi_author_list for doiboost; wf to download updated works
|
2020-12-02 23:20:16 +01:00 |
Sandro La Bruzzo
|
7da679542f
|
fixed wrong projectId
|
2020-12-02 14:28:09 +01:00 |
Sandro La Bruzzo
|
6ba8037cc7
|
fixed test failure due to changed input
|
2020-12-02 11:34:46 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Enrico Ottonello
|
f2df3ead74
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-30 14:22:46 +01:00 |
Enrico Ottonello
|
40c4559e92
|
added datainfo on authors pid with "sysimport:crosswalk:entityregistry",
|
2020-11-30 14:19:22 +01:00 |
Claudio Atzori
|
a104d2b6ad
|
cleanup
|
2020-11-26 11:12:00 +01:00 |
Claudio Atzori
|
db0181b8af
|
Merge pull request 'added bidirectionality to relations from project and result coming from crossref' (#60) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master
|
2020-11-25 17:17:40 +01:00 |
Sandro La Bruzzo
|
ec3e238de6
|
Fixed problem on duplicated identifier
|
2020-11-25 17:15:54 +01:00 |
Sandro La Bruzzo
|
264723ffd8
|
updated stuff for zenodo upload
|
2020-11-25 11:56:07 +01:00 |
Enrico Ottonello
|
99a086f0c6
|
max concurrent executors set to 10, according to ORCID Director of Technology mail request
|
2020-11-24 17:49:32 +01:00 |
Miriam Baglioni
|
00874a8ce6
|
added bidirectionality to relations from project and result
|
2020-11-24 15:17:23 +01:00 |
Enrico Ottonello
|
5c17e768b2
|
set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions
|
2020-11-23 16:01:23 +01:00 |
Enrico Ottonello
|
97c8111847
|
action to convert lambda file into seq file; spark action to download updated authors
|
2020-11-23 09:49:22 +01:00 |
Enrico Ottonello
|
c0c2e05eae
|
added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs
|
2020-11-17 18:23:12 +01:00 |
Enrico Ottonello
|
005f849674
|
added compression to output dataset
|
2020-11-13 12:45:31 +01:00 |
Enrico Ottonello
|
9a2fa9dc2f
|
added test for other names parsing from summaries dump
|
2020-11-13 10:25:34 +01:00 |
Enrico Ottonello
|
13f28fa225
|
moved AuthorData to dhp-schemas; added other names to author data
|
2020-11-12 17:43:32 +01:00 |
Claudio Atzori
|
9b0fb9e958
|
merged from master
|
2020-11-12 09:27:12 +01:00 |
Enrico Ottonello
|
1f861f2b0d
|
now wf output is a sequence file with the format seq("eu.dnetlib.dhp.schema.oaf.Publication",eu.dnetlib.dhp.schema.action.AtomicActions)
|
2020-11-11 17:38:50 +01:00 |
Enrico Ottonello
|
fea2451658
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-10 11:49:43 +01:00 |
Enrico Ottonello
|
1513174d7e
|
added further test case
|
2020-11-10 11:44:55 +01:00 |
Sandro La Bruzzo
|
8e1d43aab2
|
Implemented ID generation using IdentifierRecordFactory on DOIBoost
|
2020-11-09 11:53:55 +01:00 |
Sandro La Bruzzo
|
cd27df91a1
|
fixed bug on missing relation in ANDS
|
2020-11-06 17:12:31 +01:00 |
Enrico Ottonello
|
6bc7dbeca7
|
first version of dataset successfully generated from orcid dump 2020
|
2020-11-06 13:47:50 +01:00 |
Sandro La Bruzzo
|
39337d8a8a
|
fixed test
|
2020-11-02 09:26:25 +01:00 |
Enrico Ottonello
|
9818e74a70
|
added dependency version in main pom.xml for orcid no doi
|
2020-10-22 16:38:00 +02:00 |
Enrico Ottonello
|
210a50e4f4
|
replaced null value
|
2020-10-22 16:24:42 +02:00 |