BrBETA_dnet-hadoop

Commit Graph

Author	SHA1	Message	Date
Miriam Baglioni	86f47afcc7	slight modification of the resource to accommodate also doi normalization tests	2021-06-30 14:36:49 +02:00
Miriam Baglioni	03767ea8e6	slight modification of the resource to accommodate also doi normalization tests	2021-06-30 13:21:24 +02:00
Miriam Baglioni	f8eec0ca9a	added resource to test the normalization of doi during the import of MAG	2021-06-30 13:19:54 +02:00
Miriam Baglioni	149f85ddf5	added tests for the normalization of the dois	2021-06-30 13:00:52 +02:00
Miriam Baglioni	e487b5544c	added tests for the normalization of the dois	2021-06-30 12:57:11 +02:00
Miriam Baglioni	1503ccbbb5	added tests for the normalization of the dois	2021-06-30 12:55:37 +02:00
Miriam Baglioni	1299bfb357	Added class to test the normalization of doi	2021-06-30 12:53:27 +02:00
Miriam Baglioni	cf758f4f91	added normalization step for the doi	2021-06-30 10:03:15 +02:00
Miriam Baglioni	801763a0fa	there is no more the need to lower case the doi since it is done in the first step. Also changed the creation of the id by using the factory	2021-06-29 19:07:23 +02:00
Miriam Baglioni	a74de1cda2	added normalization step to the doi	2021-06-29 18:51:11 +02:00
Miriam Baglioni	06074ea7d3	added normalization step to the doi	2021-06-29 18:46:08 +02:00
Miriam Baglioni	8b8ffe82dc	added step of normalization for the doi	2021-06-29 18:41:39 +02:00
Miriam Baglioni	50cc21d92e	Added method to normalize doi values (lower case, remove all preceding 10., filtering out doi not starting with 10.)	2021-06-29 18:35:28 +02:00
Claudio Atzori	6d3f960238	Merge pull request 'added the missing indicators files' (#120 ) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids Reviewed-on: D-Net/dnet-hadoop#120	2021-06-29 15:57:39 +02:00
Antonis Lempesis	ae18171212	Merge branch 'stable_ids' into stable_ids	2021-06-29 15:33:39 +02:00
Antonis Lempesis	87f14a3899	added the missing indicators files	2021-06-29 16:31:51 +03:00
Claudio Atzori	986a8011ec	Merge pull request 'copied latest changes from old fork: indicators+monitor institutions' (#119 ) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids Reviewed-on: D-Net/dnet-hadoop#119	2021-06-29 08:49:12 +02:00
Antonis Lempesis	018c4eb52c	copied latest changes from old fork: indicators+monitor institutions	2021-06-28 23:46:52 +03:00
Claudio Atzori	af42377d0e	HttpClient used in metadata collection retries on 502, 503, 504	2021-06-28 09:34:30 +02:00
Claudio Atzori	67afd06cd1	[cleaning] cleaning instance.pid and instance.alternateidentifier using the same procedure used to clean result.pid	2021-06-24 12:10:17 +02:00
Claudio Atzori	2e8fd2c531	cleanup	2021-06-23 14:38:24 +02:00
Claudio Atzori	4dc9ebf217	[raw_all] fixed unit test	2021-06-23 14:38:07 +02:00
Claudio Atzori	50fc5a64a0	[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity	2021-06-23 11:49:42 +02:00
Claudio Atzori	5edcc6832a	applying sonarLint suggestions	2021-06-23 09:53:29 +02:00
Claudio Atzori	2dd5449c13	Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids	2021-06-18 10:08:15 +02:00
Claudio Atzori	fd54ecf7bd	bumped dhp-schemas dependency version	2021-06-18 10:08:07 +02:00
Miriam Baglioni	180d671127	Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids	2021-06-18 09:46:18 +02:00
Miriam Baglioni	13c96622c9	-	2021-06-18 09:45:16 +02:00
Miriam Baglioni	b486ae498f	added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump	2021-06-18 09:43:32 +02:00
Miriam Baglioni	464c2ddde3	changed to split in two steps the generation of the crossref dataset	2021-06-18 09:42:31 +02:00
Miriam Baglioni	6aca0d8ebb	added kryo encoding for input files	2021-06-18 09:42:07 +02:00
Miriam Baglioni	3585e53da3	changed to split in two steps the generation of the crossref dataset	2021-06-18 09:41:23 +02:00
Claudio Atzori	41b551562e	applying PR#115 (DatePicker) on stable_ids	2021-06-17 09:33:50 +02:00
Claudio Atzori	74833d04f1	Merge branch 'pids_beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into stable_ids	2021-06-16 15:54:18 +02:00
Claudio Atzori	7243a40c88	code formatting	2021-06-16 15:03:03 +02:00
Miriam Baglioni	95885bcf12	forces executor Executor memory and driver executor memory to be 7G (trying to avoid OOM)	2021-06-16 10:17:52 +02:00
Miriam Baglioni	2550a73981	-	2021-06-16 10:04:41 +02:00
Miriam Baglioni	1c47c0d786	modified the number of executors trying to avoid OOM exception	2021-06-15 21:05:39 +02:00
Miriam Baglioni	7deac55138	added one option for resume from in the wf	2021-06-15 18:38:20 +02:00
Antonis Lempesis	f7c0b80e35	storing result_instance as parquet	2021-06-15 14:45:48 +03:00
Miriam Baglioni	66e7ef892f	changed the parameter name	2021-06-15 11:08:54 +02:00
Miriam Baglioni	4f47ad0891	no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder	2021-06-15 09:28:31 +02:00
Miriam Baglioni	9f9dd00b94	refactoring	2021-06-15 09:24:46 +02:00
Miriam Baglioni	63d74ee379	refactoring	2021-06-15 09:24:11 +02:00
Miriam Baglioni	6ebc236657	added needed property: outputPath	2021-06-15 09:23:24 +02:00
Miriam Baglioni	f7379255b6	changed the workflow to extract info from the dump	2021-06-15 09:22:54 +02:00
Miriam Baglioni	d6e21bb6ea	creates the crossref dataset used for doiboost together with unpacking part from tar	2021-06-14 17:27:19 +02:00
Miriam Baglioni	4da141bd7c	Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids	2021-06-14 13:41:02 +02:00
Miriam Baglioni	ce0cfd79e0	creates the crossref dataset used for doiboost	2021-06-14 13:40:19 +02:00
Miriam Baglioni	93efe4de82	split the construction of crossref dataset in two parts. This one just unpacks the tar entries	2021-06-14 13:39:40 +02:00

2753 Commits
