Sandro La Bruzzo
e57294ac99
implemented changes to the PubMed dataflow
2021-06-03 10:52:09 +02:00
Michele Artini
ede2749822
orcid pid type
2021-06-01 12:42:43 +02:00
Michele Artini
f0fbfdcfae
Merge branch 'stable_ids' into import_new_mdstores
2021-06-01 12:03:00 +02:00
Michele Artini
e950750262
add nodes to import hdfs mdstores
2021-06-01 10:48:50 +02:00
Michele Artini
03a510859a
removed coalesce(1)
2021-05-31 14:10:51 +02:00
Michele Artini
e9f2b6037c
patch of mdstore records
2021-05-31 11:36:26 +02:00
Sandro La Bruzzo
02ef46535f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-05-31 09:50:15 +02:00
Sandro La Bruzzo
aeadc5a366
updated wf Datacite Import to retrieve the block size as parameter
2021-05-31 09:49:53 +02:00
Claudio Atzori
96238152cb
added serialization for alternateIdentifiers and pids within each record instance
2021-05-28 16:57:30 +02:00
Michele Artini
ad56a44fda
save as gzipped sequence file
2021-05-28 14:45:39 +02:00
Claudio Atzori
83722ebc47
pull #111 replied on stable_ids
2021-05-28 14:11:46 +02:00
Claudio Atzori
eb6acfbabc
[cleaning] removing non parsable relation.validationDate(s)
2021-05-28 10:50:44 +02:00
Claudio Atzori
6e3a4e9237
updated test expectations
2021-05-28 09:37:50 +02:00
Claudio Atzori
ac3d090e9e
bumped dhp-schemas dependency version
2021-05-27 17:31:12 +02:00
Michele Artini
4fa5671d16
first implementation of Hdfs Mdstores Importer
2021-05-27 16:22:07 +02:00
Claudio Atzori
c3d92247d3
bumped dhp-schemas dependency version
2021-05-27 15:10:51 +02:00
Claudio Atzori
d512062b58
integrating pull #109, H2020Classification
2021-05-27 12:22:47 +02:00
Claudio Atzori
5e4b91d9ef
more pervasive use of constants from ModelConstants, especially for ORCID
2021-05-26 18:20:23 +02:00
Sandro La Bruzzo
bced804151
updated wf Datacite Import to retrieve the block size as parameter
2021-05-26 17:06:50 +02:00
Claudio Atzori
4f58418184
depending on dhp-schemas:2.4.7 (release)
2021-05-24 10:32:48 +02:00
Miriam Baglioni
abd88f663d
changed test resource to mirror change in the input file
2021-05-21 15:20:47 +02:00
Miriam Baglioni
c844877de2
changed the workflow so that the programme and project preparation steps can also run in parallel
2021-05-21 14:41:57 +02:00
Miriam Baglioni
073d76864d
refactoring
2021-05-21 14:41:03 +02:00
Miriam Baglioni
4c8b4a774c
removed unneeded code
2021-05-21 14:40:07 +02:00
Enrico Ottonello
abdd0ade1f
added temporary output folder as workflow parameter
2021-05-21 12:08:16 +02:00
Miriam Baglioni
53b9d87fec
new prepareProgramme according to the new file
2021-05-21 11:49:31 +02:00
Miriam Baglioni
1ee8f13580
refactoring and added "left" as join type to be 100% sure to get the whole set of projects
2021-05-21 11:49:05 +02:00
Miriam Baglioni
e07c3ba089
due to a change in the input file, the filtering step is no longer needed
2021-05-21 11:47:43 +02:00
Miriam Baglioni
54f6e2f693
changed to get the needed information to build the action set as parallel jobs
2021-05-21 11:47:00 +02:00
Miriam Baglioni
7180505519
removed unneeded variable
2021-05-21 11:46:13 +02:00
Miriam Baglioni
2eb1a8b344
changed because the input file changed
2021-05-21 11:40:20 +02:00
Enrico Ottonello
d0945c3c78
added temporary output folder, because folder access rights differ between beta and prod
2021-05-20 19:14:31 +02:00
Enrico Ottonello
1265dadc90
workflow aligned with stable_ids
2021-05-20 19:01:28 +02:00
Enrico Ottonello
0821d8e97d
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-05-20 18:33:18 +02:00
Enrico Ottonello
ae7bd24d79
removed old workflows
2021-05-20 18:32:22 +02:00
Enrico Ottonello
4d6c473bf1
removed redundant classes now contained in dhp-schema
2021-05-20 18:26:42 +02:00
Claudio Atzori
9d725efdc1
reverted implementation of the mdstore client
2021-05-20 18:26:09 +02:00
Miriam Baglioni
9610224671
added param to workflow property
2021-05-20 18:21:12 +02:00
Claudio Atzori
863b56b6ce
using constants from ModelConstants
2021-05-20 16:23:58 +02:00
Claudio Atzori
ae5c28e54f
code formatting
2021-05-20 16:13:06 +02:00
Miriam Baglioni
aa45b4df9b
-
2021-05-20 15:57:40 +02:00
Miriam Baglioni
052c837843
-
2021-05-20 15:54:44 +02:00
Claudio Atzori
b695932ae4
integrated pull#108
2021-05-20 15:34:04 +02:00
Claudio Atzori
ea9b00ce56
adjusted test
2021-05-20 15:31:42 +02:00
Claudio Atzori
2e70aa43f0
Merge pull request 'H2020Classification fix and possibility to add datasources in blacklist for propagation of result to organization' (#108) from miriam.baglioni/dnet-hadoop:master into master
Reviewed-on: D-Net/dnet-hadoop#108
The changes look ok, but please drop a comment to describe how the parameters should be changed from the workflow caller for both workflows
* H2020Classification
* propagation of result to organization
2021-05-20 15:25:05 +02:00
Claudio Atzori
b572f56763
Merge branch 'master' into master
2021-05-20 15:22:35 +02:00
Claudio Atzori
2578b7fbb3
code formatting
2021-05-20 14:59:02 +02:00
Miriam Baglioni
dc0ad8d2e0
fixed issue related to the change in the downloaded file name. Added sheet name as parameter and also a check for whether the name should change
2021-05-20 14:53:53 +02:00
Claudio Atzori
232dce83db
fixes #6701: xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:41:15 +02:00
Claudio Atzori
aef2977ad0
fixes #6701: xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:40:22 +02:00