Enrico Ottonello
|
2dc50c0999
|
added default value to process path
|
2021-07-14 17:02:22 +02:00 |
Enrico Ottonello
|
66604bb2b4
|
added absolute path to process folder
|
2021-07-14 16:44:51 +02:00 |
Enrico Ottonello
|
7840cc6526
|
merged with master
|
2021-07-14 15:33:59 +02:00 |
Sandro La Bruzzo
|
10068c00ea
|
Code refactor:
- removed old workflows in doiboost
- split workflow of doiboost into preprocess and process
|
2021-07-14 14:45:50 +02:00 |
Sandro La Bruzzo
|
4cb65bc64a
|
fixed process doiboost workflow:
- split OrcidToOAF into two phases, preprocess and process
- updated workflow used in production
|
2021-07-14 09:44:32 +02:00 |
Claudio Atzori
|
734de62474
|
[doiboost] added workflow for the ActionSet update dedicated to production
|
2021-07-13 17:26:04 +02:00 |
Claudio Atzori
|
fa720c1da4
|
[doiboost] added workflow for the ActionSet update dedicated to production
|
2021-07-13 16:59:30 +02:00 |
Miriam Baglioni
|
13c96622c9
|
-
|
2021-06-18 09:45:16 +02:00 |
Miriam Baglioni
|
3585e53da3
|
changed the generation of the crossref dataset to split it into two steps
|
2021-06-18 09:41:23 +02:00 |
Miriam Baglioni
|
95885bcf12
|
forces executor memory and driver memory to be 7G (trying to avoid OOM)
|
2021-06-16 10:17:52 +02:00 |
Miriam Baglioni
|
2550a73981
|
-
|
2021-06-16 10:04:41 +02:00 |
Miriam Baglioni
|
1c47c0d786
|
modified the number of executors to try to avoid OOM exceptions
|
2021-06-15 21:05:39 +02:00 |
Miriam Baglioni
|
7deac55138
|
added one option for resume from in the wf
|
2021-06-15 18:38:20 +02:00 |
Miriam Baglioni
|
66e7ef892f
|
changed the parameter name
|
2021-06-15 11:08:54 +02:00 |
Miriam Baglioni
|
4f47ad0891
|
no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder
|
2021-06-15 09:28:31 +02:00 |
Miriam Baglioni
|
6ebc236657
|
added needed property: outputPath
|
2021-06-15 09:23:24 +02:00 |
Miriam Baglioni
|
f7379255b6
|
changed the workflow to extract info from the dump
|
2021-06-15 09:22:54 +02:00 |
Miriam Baglioni
|
8873e6b6d1
|
workflow and parameter
|
2021-06-14 10:15:57 +02:00 |
Miriam Baglioni
|
0f1acdf6b6
|
workflow and parameter
|
2021-06-14 10:08:55 +02:00 |
Enrico Ottonello
|
abdd0ade1f
|
added temporary output folder as workflow parameter
|
2021-05-21 12:08:16 +02:00 |
Enrico Ottonello
|
d0945c3c78
|
added temporary output folder, because folder access rights are different on beta and prod
|
2021-05-20 19:14:31 +02:00 |
Enrico Ottonello
|
1265dadc90
|
workflow aligned with stable_ids
|
2021-05-20 19:01:28 +02:00 |
Enrico Ottonello
|
ae7bd24d79
|
removed old workflows
|
2021-05-20 18:32:22 +02:00 |
Enrico Ottonello
|
e13926cdd0
|
merged with master
|
2021-05-14 18:10:31 +02:00 |
Sandro La Bruzzo
|
d9a0bbda7b
|
implemented new phase in doiboost to make the dataset Distinct by ID
|
2021-05-13 12:25:14 +02:00 |
Sandro La Bruzzo
|
54217d73ff
|
removed old parameters from oozie workflow
|
2021-05-11 09:59:02 +02:00 |
Sandro La Bruzzo
|
7dc824fc23
|
imported changes in stable_id into master
|
2021-05-07 12:53:50 +02:00 |
Sandro La Bruzzo
|
1adfc41d23
|
manually merged changes on stable_id for doiboost into master
|
2021-05-05 10:23:32 +02:00 |
Enrico Ottonello
|
c537986b7c
|
deleted folders with merged data immediately before merge phases
|
2021-04-28 11:25:25 +02:00 |
Claudio Atzori
|
e5abbec2ba
|
[orcid] download of the lambda file defined in a script
|
2021-04-22 11:22:10 +02:00 |
Claudio Atzori
|
55964cbd81
|
[orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation
|
2021-04-22 10:18:09 +02:00 |
Claudio Atzori
|
52244f813a
|
merging from enrico.ottonello/dnet-hadoop:orcid-no-doi
|
2021-04-21 12:24:09 +02:00 |
Enrico Ottonello
|
27068aacd1
|
wf to move orcid-no-doi dataset to the folder ready for the import
|
2021-04-16 17:17:47 +02:00 |
Sandro La Bruzzo
|
67085da305
|
fixed NPE
|
2021-04-16 11:05:58 +02:00 |
Sandro La Bruzzo
|
479abd10cb
|
Added to the ORCID workflow a method that extracts orcid directly from the dump generated by Enrico
|
2021-04-13 17:47:43 +02:00 |
Claudio Atzori
|
ee34cc51c3
|
[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98
|
2021-04-01 17:07:49 +02:00 |
Enrico Ottonello
|
ebd67b8c8f
|
removed duplicate orcid data from authors set
|
2021-03-25 11:20:52 +01:00 |
Sandro La Bruzzo
|
cc5bbafa5d
|
some fixes to make workflows run
|
2021-03-17 12:12:56 +01:00 |
Enrico Ottonello
|
70cb100647
|
added update of last orcid dataset folders after completion
|
2021-03-01 10:17:04 +01:00 |
Enrico Ottonello
|
bd3b16402b
|
added result typologies
|
2021-03-01 10:16:02 +01:00 |
Enrico Ottonello
|
53d7023460
|
dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters
|
2021-02-25 18:43:29 +01:00 |
Enrico Ottonello
|
d43ea88caf
|
aligned orcid result typologies with openaire vocabulary
|
2021-02-25 15:02:10 +01:00 |
Enrico Ottonello
|
975823b968
|
data from last updated orcid
|
2021-02-23 15:35:04 +01:00 |
Enrico Ottonello
|
c238561001
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2021-02-04 10:44:21 +01:00 |
Enrico Ottonello
|
465ce39f75
|
job execution now based on file last_update.txt on hdfs
|
2021-02-04 10:44:04 +01:00 |
Claudio Atzori
|
ab2fe9266a
|
[DOIBoost] minor fixes in workflow definition
|
2021-01-05 10:26:39 +01:00 |
Claudio Atzori
|
7c722f3fdc
|
[DOIBoost] fixed typo
|
2021-01-05 10:25:54 +01:00 |
Claudio Atzori
|
8879704ba0
|
[DOIBoost] configurable ES server url and index name in crossref importer
|
2021-01-05 10:00:13 +01:00 |
Sandro La Bruzzo
|
e79445a8b4
|
minor fix for Claudio's objection
|
2021-01-04 17:39:25 +01:00 |
Sandro La Bruzzo
|
8765020b85
|
minor fix
|
2021-01-04 17:37:08 +01:00 |