BrBETA_dnet-hadoop

Commit Graph

Author	SHA1	Message	Date
Miriam Baglioni	6fec71e8d2	removed the specific of the infra we are running the wf from the wf name	2021-08-13 16:39:02 +02:00
Miriam Baglioni	da20fceaf7	removed all the part related to the crossref dump download since it is done in a separate workflow	2021-08-09 11:53:45 +02:00
Miriam Baglioni	1965e4eece	new workflow for downloading the dump of crossref and unpack it	2021-08-04 18:29:03 +02:00
Miriam Baglioni	b4eb026c8b	mergin with branch beta	2021-08-04 10:21:37 +02:00
Miriam Baglioni	b226ba4439	mergin with branch beta	2021-07-21 09:46:40 +02:00
Miriam Baglioni	83fe31c92e	changed the name of the workflows	2021-07-19 18:19:14 +02:00
Miriam Baglioni	54acc5373b	changed the name of the workflows	2021-07-19 18:18:09 +02:00
Miriam Baglioni	b420b11ed3	duplicate the number of partitions in ProcessMag	2021-07-19 18:16:23 +02:00
Miriam Baglioni	662c396354	duplicate the number of partitions in ConvertCrossrefToOaf	2021-07-19 12:41:14 +02:00
Miriam Baglioni	c4b18e6ccb	changed the download.sh, added skip step to allow to not execute one phase and changed the workflow sequence of steps	2021-07-16 15:01:25 +02:00
Miriam Baglioni	acd6056330	added shell action to automatically download the new dump and put it in a specified hdfs location	2021-07-16 12:47:10 +02:00
Claudio Atzori	bf9e0d2d4f	Merge pull request 'orcid-no-doi' (#123) from enrico.ottonello/dnet-hadoop:orcid-no-doi into beta. Reviewed-on: D-Net/dnet-hadoop#123	2021-07-15 17:59:41 +02:00
Sandro La Bruzzo	3d8e2aa146	Code refactor: - removed old workflows in doiboost - splitted workflow of doiboost in preprocess and process	2021-07-14 14:37:06 +02:00
Sandro La Bruzzo	c35c117601	fixed process doiboost workflow: - splitted OrcidToOAF into two phase preprocess and process - updated workflow used in production	2021-07-14 12:48:01 +02:00
Miriam Baglioni	13c96622c9	-	2021-06-18 09:45:16 +02:00
Miriam Baglioni	3585e53da3	changed to split in two steps the generation of the crossref dataset	2021-06-18 09:41:23 +02:00
Miriam Baglioni	95885bcf12	forces executor Executor memory and driver executor memory to be 7G (trying to avoid OOM)	2021-06-16 10:17:52 +02:00
Miriam Baglioni	2550a73981	-	2021-06-16 10:04:41 +02:00
Miriam Baglioni	1c47c0d786	modified the number of executors trying to avoid OOM exception	2021-06-15 21:05:39 +02:00
Miriam Baglioni	7deac55138	added one option for resume from in the wf	2021-06-15 18:38:20 +02:00
Miriam Baglioni	66e7ef892f	changed the parameter name	2021-06-15 11:08:54 +02:00
Miriam Baglioni	4f47ad0891	no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder	2021-06-15 09:28:31 +02:00
Miriam Baglioni	6ebc236657	added needed property: outputPath	2021-06-15 09:23:24 +02:00
Miriam Baglioni	f7379255b6	changed the workflow to extract info from the dump	2021-06-15 09:22:54 +02:00
Miriam Baglioni	8873e6b6d1	workflow and parameter	2021-06-14 10:15:57 +02:00
Miriam Baglioni	0f1acdf6b6	workflow and parameter	2021-06-14 10:08:55 +02:00
Enrico Ottonello	c537986b7c	deleted folders with merged data immediately before merge phases	2021-04-28 11:25:25 +02:00
Claudio Atzori	e5abbec2ba	[orcid] download of the lambda file defined in a script	2021-04-22 11:22:10 +02:00
Claudio Atzori	55964cbd81	[orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation	2021-04-22 10:18:09 +02:00
Claudio Atzori	52244f813a	merging from enrico.ottonello/dnet-hadoop:orcid-no-doi	2021-04-21 12:24:09 +02:00
Enrico Ottonello	27068aacd1	wf to move orcid-no-doi dataset on the folder ready the import	2021-04-16 17:17:47 +02:00
Sandro La Bruzzo	67085da305	fixed NPE	2021-04-16 11:05:58 +02:00
Sandro La Bruzzo	479abd10cb	Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico	2021-04-13 17:47:43 +02:00
Claudio Atzori	ee34cc51c3	[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98	2021-04-01 17:07:49 +02:00
Enrico Ottonello	ebd67b8c8f	removed duplicates orcid data on authors set	2021-03-25 11:20:52 +01:00
Sandro La Bruzzo	cc5bbafa5d	some fix to make workflows runs	2021-03-17 12:12:56 +01:00
Enrico Ottonello	70cb100647	added updating last orcid dataset folders after completion	2021-03-01 10:17:04 +01:00
Enrico Ottonello	bd3b16402b	added result typologies	2021-03-01 10:16:02 +01:00
Enrico Ottonello	53d7023460	dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters	2021-02-25 18:43:29 +01:00
Enrico Ottonello	d43ea88caf	aligned orcid result typologies with openaire vocabulary	2021-02-25 15:02:10 +01:00
Enrico Ottonello	975823b968	data from last updated orcid	2021-02-23 15:35:04 +01:00
Enrico Ottonello	c238561001	Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi	2021-02-04 10:44:21 +01:00
Enrico Ottonello	465ce39f75	job execution now based on file last_update.txt on hdfs	2021-02-04 10:44:04 +01:00
Claudio Atzori	ab2fe9266a	[DOIBoost] minor fixes in workflow definition	2021-01-05 10:26:39 +01:00
Claudio Atzori	7c722f3fdc	[DOIBoost] fixed typo	2021-01-05 10:25:54 +01:00
Claudio Atzori	8879704ba0	[DOIBoost] configurable ES server url and index name in crossref importer	2021-01-05 10:00:13 +01:00
Sandro La Bruzzo	e79445a8b4	minor fix for claudio polemica	2021-01-04 17:39:25 +01:00
Sandro La Bruzzo	8765020b85	minor fix	2021-01-04 17:37:08 +01:00
Sandro La Bruzzo	b0dc92786f	defined a single oozie workflow for the generation of doiboost	2021-01-04 17:01:35 +01:00
Enrico Ottonello	b2de598c1a	all actions from download lambda file to merge updated data into one wf	2020-12-15 10:42:55 +01:00

114 Commits