Miriam Baglioni
63553a76b3
added code to download gold issn list from unibi
2021-07-22 12:01:48 +02:00
Miriam Baglioni
1a5b114906
DoiBoost AccessRight #4362 - refactoring
2021-07-22 12:00:23 +02:00
Sandro La Bruzzo
31d2d6d41e
Scholexplorer: introduction of dedup openaire
2021-07-21 18:09:32 +02:00
Miriam Baglioni
b226ba4439
merging with branch beta
2021-07-21 09:46:40 +02:00
Alessia Bardi
9069958479
tests for enermaps
2021-07-20 19:31:43 +02:00
Claudio Atzori
10d7b4f0b4
filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure
2021-07-20 11:52:05 +02:00
Claudio Atzori
77e8c6c7f7
filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure
2021-07-20 11:51:33 +02:00
Miriam Baglioni
83fe31c92e
changed the name of the workflows
2021-07-19 18:19:14 +02:00
Miriam Baglioni
dd81c36b60
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-19 18:18:14 +02:00
Miriam Baglioni
54acc5373b
changed the name of the workflows
2021-07-19 18:18:09 +02:00
Miriam Baglioni
b420b11ed3
duplicate the number of partitions in ProcessMag
2021-07-19 18:16:23 +02:00
Claudio Atzori
65934888a1
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:52 +02:00
Claudio Atzori
5947cddafc
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:24 +02:00
Claudio Atzori
0977baf41d
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 17:43:52 +02:00
Claudio Atzori
5e5f65a3c3
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 15:56:55 +02:00
Miriam Baglioni
662c396354
duplicate the number of partitions in ConvertCrossrefToOaf
2021-07-19 12:41:14 +02:00
Miriam Baglioni
59530a14fb
DoiBoost AccessRight #4362 - set BestAccessRight with the usual comparator
2021-07-19 12:34:35 +02:00
Miriam Baglioni
199123b74b
DoiBoost AccessRight #4362 - Fixed issue on date formatting. Added test method and associated resource
2021-07-16 17:30:27 +02:00
Miriam Baglioni
c4b18e6ccb
changed download.sh, added a skip step so that a phase can be left out, and changed the workflow's sequence of steps
2021-07-16 15:01:25 +02:00
Miriam Baglioni
acd6056330
added shell action to automatically download the new dump and put it in a specified hdfs location
2021-07-16 12:47:10 +02:00
Miriam Baglioni
3bc9a05bc9
merging with branch beta
2021-07-16 10:32:27 +02:00
Miriam Baglioni
34506df1b6
DoiBoost AccessRight #4362 - if the journal is open, the OPEN access right is set on all instances and the color is GOLD (overwriting the color if it was already set in one of the previous steps)
2021-07-16 10:29:51 +02:00
Claudio Atzori
bf9e0d2d4f
Merge pull request 'orcid-no-doi' ( #123 ) from enrico.ottonello/dnet-hadoop:orcid-no-doi into beta
...
Reviewed-on: #123
2021-07-15 17:59:41 +02:00
Sandro La Bruzzo
7e2caafe84
Scholexplorer: fixed mapping typologies
2021-07-15 09:53:12 +02:00
Enrico Ottonello
2dc50c0999
added default value to process path
2021-07-14 17:02:22 +02:00
Enrico Ottonello
66604bb2b4
added absolute path to process folder
2021-07-14 16:44:51 +02:00
Enrico Ottonello
7840cc6526
merged with master
2021-07-14 15:33:59 +02:00
Miriam Baglioni
4da46bb62f
merging with branch beta
2021-07-14 15:08:52 +02:00
Enrico Ottonello
a65667d217
added publication to dataset even if no contributors
2021-07-14 15:07:07 +02:00
Sandro La Bruzzo
10068c00ea
Code refactor:
...
- removed old workflows in doiboost
- split workflow of doiboost in preprocess and process
2021-07-14 14:45:50 +02:00
Miriam Baglioni
09ad7b2a9e
DoiBoost AccessRight #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is
2021-07-14 14:45:21 +02:00
Miriam Baglioni
f4f7c6f9d3
DoiBoost AccessRight #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is
2021-07-14 14:44:54 +02:00
Miriam Baglioni
6222adf176
DoiBoost AccessRight #4362 - added resources and test for crossref mapping (licence part included)
2021-07-14 14:42:34 +02:00
Miriam Baglioni
981b1018f6
DoiBoost AccessRight #4362 - decide access right according to licence. Default access right is Unknown
2021-07-14 14:42:06 +02:00
Sandro La Bruzzo
3d8e2aa146
Code refactor:
...
- removed old workflows in doiboost
- split workflow of doiboost in preprocess and process
2021-07-14 14:37:06 +02:00
Miriam Baglioni
441701c85c
DoiBoost AccessRight #4362 - If multiple licenses are available, take the one applied to 'vor'
2021-07-14 14:14:50 +02:00
Sandro La Bruzzo
c35c117601
fixed process doiboost workflow:
...
- split OrcidToOAF into two phases, preprocess and process
- updated workflow used in production
2021-07-14 12:48:01 +02:00
Miriam Baglioni
1cdd09cd8e
Tentative fix for testing of Jenkins
2021-07-14 11:14:59 +02:00
Sandro La Bruzzo
4cb65bc64a
fixed process doiboost workflow:
...
- split OrcidToOAF into two phases, preprocess and process
- updated workflow used in production
2021-07-14 09:44:32 +02:00
Miriam Baglioni
774cdb190e
changes to mirror the last dump of the graph with the old data model.
2021-07-13 18:57:24 +02:00
Miriam Baglioni
886617afd0
One result linked to more than one project is saved just once
2021-07-13 18:15:35 +02:00
Miriam Baglioni
320cf02d96
Changed the way to find results linked to projects. We verify to actually have the project on the graph before selecting the result
2021-07-13 18:13:32 +02:00
Miriam Baglioni
52ce35d57b
-
2021-07-13 18:08:46 +02:00
Miriam Baglioni
970b387b8d
modification to allow dump of a single community
2021-07-13 18:08:10 +02:00
Miriam Baglioni
eae10c5894
modification to allow the dump for a single community
2021-07-13 18:07:25 +02:00
Miriam Baglioni
c028feef4f
workflow for the dump as sub workflows
2021-07-13 18:06:44 +02:00
Miriam Baglioni
d70f8c96fd
funding contains and not starts with h2020
2021-07-13 17:34:53 +02:00
Miriam Baglioni
5e38c7f42d
dumping only communities with status all
2021-07-13 17:32:38 +02:00
Claudio Atzori
734de62474
[doiboost] added workflow for the ActionSet update dedicated to production
2021-07-13 17:26:04 +02:00
Miriam Baglioni
618d2de2da
minor changes and refactoring
2021-07-13 17:10:02 +02:00
Miriam Baglioni
59615da65e
Add test to verify the creation of relation between context and projects
2021-07-13 17:09:15 +02:00
Miriam Baglioni
084b4ef999
added the creation of the openaireId from funder and grant number if the element is not present in the context profile
2021-07-13 17:07:46 +02:00
Claudio Atzori
fa720c1da4
[doiboost] added workflow for the ActionSet update dedicated to production
2021-07-13 16:59:30 +02:00
Miriam Baglioni
8f322a73cb
change because of the renaming of originalId in acronym
2021-07-13 16:22:58 +02:00
Miriam Baglioni
72397ea1ba
Added fix for community of arbitrary name length
2021-07-13 16:18:35 +02:00
Miriam Baglioni
5295d10691
added check not to dump deletedByInference entities
2021-07-13 16:11:46 +02:00
Claudio Atzori
9629569e22
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-07-13 16:04:08 +02:00
Claudio Atzori
f13e11e3f7
[aggregation] datacite wf: defined parameter declaring the path used to store the OAF objects produced by the transformation phase
2021-07-13 16:04:02 +02:00
Miriam Baglioni
e9a17ec899
added check to verify not to add void APC
2021-07-13 15:53:35 +02:00
Miriam Baglioni
8429aed6c6
Added resource for testing selection of valid relations
2021-07-13 15:49:38 +02:00
Miriam Baglioni
39b1a6edf6
added test class for the selection of valid relations and description
2021-07-13 15:23:09 +02:00
Miriam Baglioni
9a58f1b93d
added logic to select only the valid relations: those not deletedbyinference and having both parts of the relation as entities in the graph
2021-07-13 15:20:39 +02:00
Miriam Baglioni
13c66e16be
changed logic to split for communities
2021-07-13 15:15:27 +02:00
Miriam Baglioni
6410ab71d8
added APC in the dump and test method
2021-07-13 15:13:58 +02:00
Miriam Baglioni
65a242646d
added resource for APC dump
2021-07-13 14:45:25 +02:00
Miriam Baglioni
4b432fbee8
extended test class
2021-07-13 14:40:39 +02:00
Miriam Baglioni
87a6e2b967
extended test class
2021-07-13 14:38:28 +02:00
Miriam Baglioni
69fd40fd30
modified code to split the Croatian funder
2021-07-13 14:35:26 +02:00
Miriam Baglioni
86e50f7311
modified code to split the Croatian funder
2021-07-13 14:31:45 +02:00
Miriam Baglioni
da88c850c6
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:44 +02:00
Miriam Baglioni
2f66fedfec
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:23 +02:00
Miriam Baglioni
f5486ffb14
Fixed issues to tests
2021-07-13 14:07:45 +02:00
Claudio Atzori
e0061232e9
[aggregation] datacite wf: conditional creation of links, optional resume from intermediate phases
2021-07-13 13:41:21 +02:00
Sandro La Bruzzo
bbe8193930
merged stable ids
2021-07-12 17:00:43 +02:00
Claudio Atzori
ae2b47b29d
[broker] added coalesce(1) on the stats dataset before storing it on postgres
2021-07-09 15:47:51 +02:00
Sandro La Bruzzo
57c74c73c6
fixed mistakes in oozie workflow
2021-07-09 12:28:09 +02:00
Sandro La Bruzzo
61ccb54fde
removed wrong loop on oozie wf
2021-07-09 12:17:57 +02:00
Sandro La Bruzzo
9f5a0f3ab6
moved wf indexing of Scholexplorer in dhp-graph-provision
2021-07-09 12:06:43 +02:00
Sandro La Bruzzo
09fccf8000
added workflow to serialize scholix and summary in json
2021-07-09 11:01:42 +02:00
Sandro La Bruzzo
0ea576745f
updated CreateInputGraph because generics don't work on Spark Dataset
2021-07-09 10:29:24 +02:00
Sandro La Bruzzo
cd17e19044
implemented branch workflow to import datacite and crossref in scholexplorer
2021-07-08 21:20:19 +02:00
Miriam Baglioni
c30f3ce647
merge doi normalization
2021-07-08 19:20:02 +02:00
Sandro La Bruzzo
8a034e46e1
updated baseline workflow
2021-07-08 11:11:41 +02:00
Claudio Atzori
b7b8e0986e
[raw_all] The claim merge procedure includes the claimed contexts in the merged result
2021-07-08 10:42:31 +02:00
Sandro La Bruzzo
0799ac9fb6
fixed wrong path
2021-07-08 10:36:37 +02:00
Sandro La Bruzzo
4d53402712
extended ebiLinks to create a dataset before generation of OAF
2021-07-08 10:26:21 +02:00
Sandro La Bruzzo
a4a54a3786
code refactor
2021-07-08 09:08:25 +02:00
Sandro La Bruzzo
a01dbe0ab0
completed workflow of generation of scholix and summaries
2021-07-07 23:10:34 +02:00
Claudio Atzori
fdcff42e46
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 19:01:59 +02:00
Claudio Atzori
777536ce91
[aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122)
2021-07-07 11:23:48 +02:00
Claudio Atzori
bc014023c8
Merge pull request 'to solve the scala SI-3623' ( #122 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #122
2021-07-07 11:13:51 +02:00
Claudio Atzori
32bdfdccbc
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 11:08:27 +02:00
Andreas Czerniak
ebf3f47a02
from&until more OAI2.0 compl., adding tfs
2021-07-07 09:29:49 +02:00
Claudio Atzori
f580cb77e1
added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA)
2021-07-06 21:11:11 +02:00
Sandro La Bruzzo
ed684874f2
deleted old scholix project
2021-07-06 17:20:08 +02:00
Sandro La Bruzzo
8535506c22
added scholix generation
2021-07-06 17:18:06 +02:00
Sandro La Bruzzo
4c54bd8742
add test to verify merge scholix on source
2021-07-06 11:32:14 +02:00
Andreas Czerniak
3531802710
to solve the scala SI-3623
2021-07-06 11:30:56 +02:00
Sandro La Bruzzo
7d8db2eb8a
betterRenamingMethod
2021-07-06 09:56:32 +02:00
Sandro La Bruzzo
c952c8d236
generate first side of scholix mapping
2021-07-06 09:53:14 +02:00
Claudio Atzori
70ded407bb
HttpClient used in metadata collection retries also on 404
2021-07-05 18:04:30 +02:00
Miriam Baglioni
7177c25261
added check for null value during doi normalization
2021-07-05 16:22:38 +02:00
Miriam Baglioni
0892cad4e8
the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value
2021-07-05 16:21:42 +02:00
Antonis Lempesis
89e6f46682
using organization ids instead of names in monitor db creation
2021-07-05 12:00:00 +03:00
Sandro La Bruzzo
e4b84ef5d6
fixed mapping OAF to Scholix summary
2021-07-02 16:48:48 +02:00
Sandro La Bruzzo
c6fa8598e1
massive code refactor:
...
removed modules dhp-*-scholexplorer
2021-07-01 22:13:45 +02:00
Antonis Lempesis
829caee4fd
added the missing indicators files
2021-06-30 17:31:33 +02:00
Sandro La Bruzzo
84b834c893
added test dataset test for pangaea
2021-06-30 17:31:09 +02:00
Sandro La Bruzzo
1a6b398968
implemented Creation of Raw Graph and Resolution
2021-06-30 17:27:55 +02:00
Miriam Baglioni
bc34347643
added assertions to verify doi normalization
2021-06-30 14:37:08 +02:00
Miriam Baglioni
86f47afcc7
slight modification of the resource to accommodate also doi normalization tests
2021-06-30 14:36:49 +02:00
Miriam Baglioni
03767ea8e6
slight modification of the resource to accommodate also doi normalization tests
2021-06-30 13:21:24 +02:00
Miriam Baglioni
f8eec0ca9a
added resource to test the normalization of doi during the import of MAG
2021-06-30 13:19:54 +02:00
Miriam Baglioni
149f85ddf5
added tests for the normalization of the dois
2021-06-30 13:00:52 +02:00
Miriam Baglioni
e487b5544c
added tests for the normalization of the dois
2021-06-30 12:57:11 +02:00
Miriam Baglioni
1503ccbbb5
added tests for the normalization of the dois
2021-06-30 12:55:37 +02:00
Miriam Baglioni
1299bfb357
Added class to test the normalization of doi
2021-06-30 12:53:27 +02:00
Sandro La Bruzzo
623a0c4edb
code Refactor, renaming packages
2021-06-30 11:09:30 +02:00
Miriam Baglioni
cf758f4f91
added normalization step for the doi
2021-06-30 10:03:15 +02:00
Miriam Baglioni
801763a0fa
there is no longer any need to lowercase the doi since it is done in the first step. Also changed the creation of the id by using the factory
2021-06-29 19:07:23 +02:00
Miriam Baglioni
a74de1cda2
added normalization step to the doi
2021-06-29 18:51:11 +02:00
Miriam Baglioni
06074ea7d3
added normalization step to the doi
2021-06-29 18:46:08 +02:00
Miriam Baglioni
8b8ffe82dc
added step of normalization for the doi
2021-06-29 18:41:39 +02:00
Miriam Baglioni
50cc21d92e
Added method to normalize doi values (lower case, remove all preceding 10., filtering out doi not starting with 10.)
2021-06-29 18:35:28 +02:00
Antonis Lempesis
87f14a3899
added the missing indicators files
2021-06-29 16:31:51 +03:00
Sandro La Bruzzo
db933ebd21
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-29 14:16:12 +02:00
Sandro La Bruzzo
7e08655e5f
added relation dates in all scholexplorer Datasources
2021-06-29 12:02:03 +02:00
Sandro La Bruzzo
075055eaca
added relation dates in bio mapping
2021-06-29 10:33:09 +02:00
Sandro La Bruzzo
f36f92287d
implemented mapping from Crossref Event Data to Oaf
2021-06-29 10:21:23 +02:00
Antonis Lempesis
018c4eb52c
copied latest changes from old fork: indicators+monitor institutions
2021-06-28 23:46:52 +03:00
Sandro La Bruzzo
511ec14c63
implemented mapping from EBI and Scholix Resolved to OAF
2021-06-28 22:04:22 +02:00
Claudio Atzori
af42377d0e
HttpClient used in metadata collection retries on 502, 503, 504
2021-06-28 09:34:30 +02:00
Sandro La Bruzzo
ad50415167
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-24 17:20:50 +02:00
Sandro La Bruzzo
80e15cc455
implemented mapping from uniprot, pdb and ebi links
2021-06-24 17:20:00 +02:00
Claudio Atzori
2e8fd2c531
cleanup
2021-06-23 14:38:24 +02:00
Claudio Atzori
4dc9ebf217
[raw_all] fixed unit test
2021-06-23 14:38:07 +02:00
Claudio Atzori
50fc5a64a0
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-06-23 11:49:42 +02:00
Claudio Atzori
5edcc6832a
applying sonarLint suggestions
2021-06-23 09:53:29 +02:00
Sandro La Bruzzo
080a280bea
added pdb to Oaf Transformation
2021-06-21 16:23:59 +02:00
Sandro La Bruzzo
1dc0c59e20
merged fix thai dates from stable_ids
2021-06-21 10:39:46 +02:00
Sandro La Bruzzo
dc66cf615b
Merge branch 'stable_id_scholexplorer' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-21 09:38:33 +02:00
Sandro La Bruzzo
507e42102a
added pdb to oaf class
2021-06-21 09:36:40 +02:00
Sandro La Bruzzo
a167543637
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-21 09:14:11 +02:00
Sandro La Bruzzo
4fe7b75644
renamed packages
2021-06-18 16:41:24 +02:00
Sandro La Bruzzo
3990165d05
changed typologies of unresolved relation
2021-06-18 11:43:59 +02:00
Miriam Baglioni
180d671127
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-18 09:46:18 +02:00
Miriam Baglioni
13c96622c9
-
2021-06-18 09:45:16 +02:00
Miriam Baglioni
b486ae498f
added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump
2021-06-18 09:43:32 +02:00
Miriam Baglioni
464c2ddde3
changed to split in two steps the generation of the crossref dataset
2021-06-18 09:42:31 +02:00
Miriam Baglioni
6aca0d8ebb
added kryo encoding for input files
2021-06-18 09:42:07 +02:00
Miriam Baglioni
3585e53da3
changed to split in two steps the generation of the crossref dataset
2021-06-18 09:41:23 +02:00
Claudio Atzori
41b551562e
applying PR#115 (DatePicker) on stable_ids
2021-06-17 09:33:50 +02:00
Sandro La Bruzzo
3100166d29
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-16 16:22:16 +02:00
Claudio Atzori
74833d04f1
Merge branch 'pids_beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into stable_ids
2021-06-16 15:54:18 +02:00
Claudio Atzori
7243a40c88
code formatting
2021-06-16 15:03:03 +02:00
Sandro La Bruzzo
dfcf78cf24
removed wrong code
2021-06-16 14:57:42 +02:00
Sandro La Bruzzo
cc0f2b11fb
Implemented mapping from pubmed baseline to OAF
2021-06-16 14:56:24 +02:00
Miriam Baglioni
95885bcf12
forces executor memory and driver memory to be 7G (trying to avoid OOM)
2021-06-16 10:17:52 +02:00
Miriam Baglioni
2550a73981
-
2021-06-16 10:04:41 +02:00
Miriam Baglioni
1c47c0d786
modified the number of executors trying to avoid OOM exception
2021-06-15 21:05:39 +02:00
Miriam Baglioni
7deac55138
added one option for resume from in the wf
2021-06-15 18:38:20 +02:00
Antonis Lempesis
f7c0b80e35
storing result_instance as parquet
2021-06-15 14:45:48 +03:00
Miriam Baglioni
66e7ef892f
changed the parameter name
2021-06-15 11:08:54 +02:00
Miriam Baglioni
4f47ad0891
no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder
2021-06-15 09:28:31 +02:00
Miriam Baglioni
9f9dd00b94
refactoring
2021-06-15 09:24:46 +02:00
Miriam Baglioni
63d74ee379
refactoring
2021-06-15 09:24:11 +02:00
Miriam Baglioni
6ebc236657
added needed property: outputPath
2021-06-15 09:23:24 +02:00
Miriam Baglioni
f7379255b6
changed the workflow to extract info from the dump
2021-06-15 09:22:54 +02:00
Miriam Baglioni
d6e21bb6ea
creates the crossref dataset used for doiboost together with unpacking part from tar
2021-06-14 17:27:19 +02:00
Miriam Baglioni
4da141bd7c
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-14 13:41:02 +02:00
Miriam Baglioni
ce0cfd79e0
creates the crossref dataset used for doiboost
2021-06-14 13:40:19 +02:00
Miriam Baglioni
93efe4de82
split the construction of crossref dataset in two parts. This one just unpacks the tar entries
2021-06-14 13:39:40 +02:00
Michele Artini
ada063ce70
fixed a problem with empty mdstore list (2)
2021-06-14 12:04:47 +02:00
Michele Artini
83132ee99a
fixed a problem with empty mdstore list
2021-06-14 11:57:00 +02:00
Miriam Baglioni
cf360d7c97
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-14 10:19:49 +02:00
Miriam Baglioni
8873e6b6d1
workflow and parameter
2021-06-14 10:15:57 +02:00
Miriam Baglioni
0f1acdf6b6
workflow and parameter
2021-06-14 10:08:55 +02:00
Sandro La Bruzzo
aeb8132627
Merged branch stable_ids
2021-06-14 10:07:29 +02:00
Sandro La Bruzzo
efbea1e01a
minor fix
2021-06-14 09:45:14 +02:00
Miriam Baglioni
75780fc636
extraction of the tar for the dump of crossref, and creation of the dataset
2021-06-14 09:45:07 +02:00
Claudio Atzori
2039bb9f5f
orcid / orcid_pending cleaning backported from master branch
2021-06-14 09:40:50 +02:00
Claudio Atzori
dd19c4ac5a
Merge pull request 'import_new_mdstores' ( #112 ) from import_new_mdstores into stable_ids
...
Reviewed-on: #112
2021-06-14 09:23:55 +02:00
Claudio Atzori
e9e86a237d
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-11 17:00:02 +02:00
Claudio Atzori
a900bfb874
delegating the date parsing to https://github.com/sisyphsu/dateparser
2021-06-11 16:53:01 +02:00
Sandro La Bruzzo
dd997c49e0
fix wrong relation id
...
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Antonis Lempesis
d413b24611
added instances, orgs for monitor, totalcost for projects, apcs
2021-06-10 02:35:46 +03:00
Claudio Atzori
741077dbca
Merge pull request 'Fix in Affiliation Propagation' ( #113 ) from miriam.baglioni/dnet-hadoop:master into stable_ids
...
Reviewed-on: #113
2021-06-09 18:42:42 +02:00
Miriam Baglioni
32b0c27217
Update 'dhp-workflows/dhp-enrichment/src/main/java/eu/dnetlib/dhp/resulttoorganizationfrominstrepo/PrepareResultInstRepoAssociation.java'
...
fix in SQL query: while writing the blacklist constraint it used d.id to indicate the datasource id, but no alias for the datasource was defined. So I removed the alias
2021-06-09 18:36:11 +02:00
Sandro La Bruzzo
0d1f37302f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-09 09:35:16 +02:00
Miriam Baglioni
dc07f1079b
added check in case the author set to be enriched is null
2021-06-08 12:06:10 +02:00
Miriam Baglioni
8d2e086e48
changes to avoid reassignment to val
2021-06-07 17:50:37 +02:00
Miriam Baglioni
f33521d338
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
to be able to replace the object assigned to author, val has been replaced by var
2021-06-07 17:27:07 +02:00
Miriam Baglioni
bc12e9819e
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
The change fixes the issue that arises when the same work appears more than once on the same ORCID profile. It avoids replicating the association doi -> author when the orcid id is already associated to the doi.
2021-06-07 16:37:01 +02:00
Sandro La Bruzzo
0cdb7ccdaa
added inverse relations to datacite mapping
2021-06-04 15:10:20 +02:00
Sandro La Bruzzo
5b724d9972
added relations to datacite mapping
2021-06-04 10:14:22 +02:00
Sandro La Bruzzo
e57294ac99
implemented changes on PUBMed dataflow
2021-06-03 10:52:09 +02:00
Michele Artini
ede2749822
orcid pid type
2021-06-01 12:42:43 +02:00
Michele Artini
f0fbfdcfae
Merge branch 'stable_ids' into import_new_mdstores
2021-06-01 12:03:00 +02:00
Michele Artini
e950750262
add nodes to import hdfs mdstores
2021-06-01 10:48:50 +02:00
Michele Artini
03a510859a
removed coalesce(1)
2021-05-31 14:10:51 +02:00
Michele Artini
e9f2b6037c
patch of mdstore records
2021-05-31 11:36:26 +02:00
Sandro La Bruzzo
02ef46535f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-05-31 09:50:15 +02:00
Sandro La Bruzzo
aeadc5a366
updated wf Datacite Import to retrieve the block size as parameter
2021-05-31 09:49:53 +02:00
Claudio Atzori
96238152cb
added serialization for alternateIdentifiers and pids within each record instance
2021-05-28 16:57:30 +02:00
Michele Artini
ad56a44fda
save as gzipped sequence file
2021-05-28 14:45:39 +02:00
Claudio Atzori
83722ebc47
pull #111 replied on stable_ids
2021-05-28 14:11:46 +02:00
Claudio Atzori
6e3a4e9237
updated test expectations
2021-05-28 09:37:50 +02:00
Michele Artini
4fa5671d16
first implementation of Hdfs Mdstores Importer
2021-05-27 16:22:07 +02:00
Claudio Atzori
d512062b58
integrating pull #109 , H2020Classification
2021-05-27 12:22:47 +02:00
Claudio Atzori
5e4b91d9ef
more pervasive use of constants from ModelConstants, especially for ORCID
2021-05-26 18:20:23 +02:00
Sandro La Bruzzo
bced804151
updated wf Datacite Import to retrieve the block size as parameter
2021-05-26 17:06:50 +02:00
Miriam Baglioni
abd88f663d
changed test resource to mirror change in the input file
2021-05-21 15:20:47 +02:00
Miriam Baglioni
c844877de2
changed workflow flow to possibly parallelize also the programme and project preparation steps
2021-05-21 14:41:57 +02:00
Miriam Baglioni
073d76864d
refactoring
2021-05-21 14:41:03 +02:00
Miriam Baglioni
4c8b4a774c
removed unneeded code
2021-05-21 14:40:07 +02:00
Enrico Ottonello
abdd0ade1f
added temporary output folder as workflow parameter
2021-05-21 12:08:16 +02:00
Miriam Baglioni
53b9d87fec
new prepareProgramme according to the new file
2021-05-21 11:49:31 +02:00
Miriam Baglioni
1ee8f13580
refactoring and added "left" as join type to be 100% sure to get the whole set of projects
2021-05-21 11:49:05 +02:00
Miriam Baglioni
e07c3ba089
due to a change in the input file the filtering step is no longer needed
2021-05-21 11:47:43 +02:00
Miriam Baglioni
54f6e2f693
changed to get the needed information to build the action set as parallel jobs
2021-05-21 11:47:00 +02:00
Miriam Baglioni
7180505519
removed unneeded variable
2021-05-21 11:46:13 +02:00
Miriam Baglioni
2eb1a8b344
changed because the input file changed
2021-05-21 11:40:20 +02:00
Enrico Ottonello
d0945c3c78
added temporary output folder, because folder access rights are different on beta and prod
2021-05-20 19:14:31 +02:00
Enrico Ottonello
1265dadc90
workflow aligned with stable_ids
2021-05-20 19:01:28 +02:00
Enrico Ottonello
0821d8e97d
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-05-20 18:33:18 +02:00
Enrico Ottonello
ae7bd24d79
removed old workflows
2021-05-20 18:32:22 +02:00
Claudio Atzori
9d725efdc1
reverted implementation of the mdstore client
2021-05-20 18:26:09 +02:00
Miriam Baglioni
9610224671
added param to workflow property
2021-05-20 18:21:12 +02:00
Claudio Atzori
863b56b6ce
using constants from ModelConstants
2021-05-20 16:23:58 +02:00
Claudio Atzori
ae5c28e54f
code formatting
2021-05-20 16:13:06 +02:00
Miriam Baglioni
aa45b4df9b
-
2021-05-20 15:57:40 +02:00
Miriam Baglioni
052c837843
-
2021-05-20 15:54:44 +02:00
Claudio Atzori
b695932ae4
integrated pull#108
2021-05-20 15:34:04 +02:00
Claudio Atzori
ea9b00ce56
adjusted test
2021-05-20 15:31:42 +02:00
Claudio Atzori
b572f56763
Merge branch 'master' into master
2021-05-20 15:22:35 +02:00
Claudio Atzori
2578b7fbb3
code formatting
2021-05-20 14:59:02 +02:00
Miriam Baglioni
dc0ad8d2e0
fixed issue related to change in the file name downloaded. Added sheet name as parameter and also a check if the name should change
2021-05-20 14:53:53 +02:00
Claudio Atzori
232dce83db
fixes #6701 : xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:41:15 +02:00
Claudio Atzori
aef2977ad0
fixes #6701 : xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:40:22 +02:00
Miriam Baglioni
02b80cf24f
resolved conflicts
2021-05-20 10:59:39 +02:00
Claudio Atzori
c4a23c2f4d
fix: preserving the old identifier among the originalIds in the doiboost construction process, trying to avoid UnsupportedOperationException while adding elements to the originalIds
2021-05-19 16:01:52 +02:00
Claudio Atzori
ba03f549d7
fix: preserving the old identifier among the originalIds in the doiboost construction process
2021-05-19 15:43:26 +02:00
Claudio Atzori
239d0f0a9a
ROR actionset import workflow backported from branch stable_ids
2021-05-18 16:12:11 +02:00
Antonis Lempesis
168edcbde3
added the final steps for the observatory promote wf and some cleanup
2021-05-18 15:23:20 +03:00
Michele Artini
e56ccec536
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-05-18 14:00:28 +02:00
Michele Artini
c1e20de7cf
fixed the deserialization of a json property
2021-05-18 14:00:14 +02:00
Claudio Atzori
a9f512103b
using constants from ModelConstants
2021-05-18 11:19:07 +02:00
Claudio Atzori
eeb8bcf075
using constants from ModelConstants
2021-05-18 11:10:07 +02:00
Claudio Atzori
2cbf15f4fb
using ModelConstants
2021-05-17 09:54:45 +02:00
Enrico Ottonello
e13926cdd0
merged with master
2021-05-14 18:10:31 +02:00
Claudio Atzori
f19feceaf0
set the old identifier before switching to the new one
2021-05-14 12:53:40 +02:00
Claudio Atzori
1bd70fa2c6
preserving the old identifier among the originalIds in the doiboost construction process
2021-05-14 11:30:41 +02:00
Claudio Atzori
ca3f3a7687
using ModelConstants
2021-05-14 11:29:49 +02:00
Claudio Atzori
23b8883ab1
applied intellij code cleanup
2021-05-14 10:58:12 +02:00
Claudio Atzori
609eb711b3
IndexRecordTransformerTest for producing a record that can be manually submitted to solr
2021-05-13 16:13:28 +02:00
Claudio Atzori
1517bf7c92
IndexRecordTransformerTest for producing a record that can be manually submitted to solr
2021-05-13 16:11:22 +02:00
Sandro La Bruzzo
d9a0bbda7b
implemented new phase in doiboost to make the dataset Distinct by ID
2021-05-13 12:25:14 +02:00
Sandro La Bruzzo
6424cd9062
Added passing of the following parameters:
...
-varDataSourceId
-varOfficialName
in Each transformation Rule
2021-05-11 15:17:38 +02:00
Sandro La Bruzzo
073dcea2aa
Added passing of the following parameters:
...
-varDataSourceId
-varOfficialName
in Each transformation Rule
2021-05-11 15:05:58 +02:00
Claudio Atzori
d4c3476152
mapping datasource.journal only when an issn is available, null otherwise
2021-05-11 11:08:54 +02:00
Claudio Atzori
da9d6f3887
mapping datasource.journal only when an issn is available, null otherwise
2021-05-11 10:45:30 +02:00
Sandro La Bruzzo
54217d73ff
removed old parameters from oozie workflow
2021-05-11 09:59:02 +02:00
Claudio Atzori
d1cbee8413
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
2021-05-10 16:43:39 +02:00
Claudio Atzori
3797543600
MDStoreManager model classes moved in dhp-schemas
2021-05-10 14:32:05 +02:00
Claudio Atzori
25254885b9
[ActionManagement] reduced number of xqueries used to access ActionSet info
2021-05-07 17:32:03 +02:00
Claudio Atzori
8a0de2fc18
[ActionManagement] reduced number of xqueries used to access ActionSet info
2021-05-07 17:31:32 +02:00
Sandro La Bruzzo
7dc824fc23
imported changes in stable_id into master
2021-05-07 12:53:50 +02:00
Michele Artini
d82071ba6c
originalId with prefix
2021-05-06 15:34:48 +02:00
Claudio Atzori
d4a30fabe3
clean up tests
2021-05-05 17:28:15 +02:00
Claudio Atzori
dccaf173cf
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 16:36:15 +02:00
Claudio Atzori
8c96a82a03
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 15:30:06 +02:00
Claudio Atzori
2e1eb96f9a
code formatting
2021-05-05 11:23:57 +02:00
Sandro La Bruzzo
1adfc41d23
merged manually changes on stable_id for doiboost into master
2021-05-05 10:23:32 +02:00
Claudio Atzori
fb930b84d3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-05-04 18:06:30 +02:00
Claudio Atzori
923d19ea8e
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
2021-05-04 18:06:21 +02:00
Sandro La Bruzzo
714b71bd21
updated pubmed
2021-05-04 14:54:12 +02:00
Claudio Atzori
ba86835951
using common constants from ModelConstants
2021-05-04 11:51:52 +02:00
Michele Artini
f4bd2b5619
revert file SparkDedupTest.java
2021-05-04 10:26:14 +02:00
Michele Artini
b4877da363
Merge branch 'stable_ids' into prepare_ror_actionset
2021-05-03 08:13:55 +02:00
Alessia Bardi
9a20057615
fixed query for organisations' pids
2021-04-29 15:23:39 +02:00
Michele Artini
6692128234
Merge branch 'stable_ids' into prepare_ror_actionset
2021-04-29 13:24:08 +02:00
Alessia Bardi
a801999e75
fixed query for organisations' pids
2021-04-29 12:18:42 +02:00
Michele Artini
a278d67175
parse input file
2021-04-29 11:34:47 +02:00
Claudio Atzori
f6ccd54d87
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-29 10:10:01 +02:00
Claudio Atzori
91e7220f20
cleaned up workflow for actionset migration, adjusted dnet|cnr* dependency versions
2021-04-29 10:09:52 +02:00
Michele Artini
f77ba34126
pid types
2021-04-29 09:50:05 +02:00
Michele Artini
7c5cd86927
annotations and tests
2021-04-29 09:29:19 +02:00
Michele Artini
b5cf505cc6
partial implementation of the ROR->actionset workflow
2021-04-28 16:00:24 +02:00
Enrico Ottonello
c537986b7c
deleted folders with merged data immediately before merge phases
2021-04-28 11:25:25 +02:00
Sandro La Bruzzo
2129e9caa7
updated pangaea transformation to parse directly the xml
2021-04-28 10:21:03 +02:00
Claudio Atzori
5afa7d3e0c
core utilities in dhp-common moved in external module dhp-schemas
2021-04-27 15:44:01 +02:00
Alessia Bardi
e6075bb917
updated json schema for results - added instances and accessright definition
2021-04-27 15:15:08 +02:00
Sandro La Bruzzo
63c0303137
removed unused import, add log
2021-04-27 12:17:23 +02:00
Sandro La Bruzzo
74484d2823
bug fixing
2021-04-27 12:13:44 +02:00
Sandro La Bruzzo
c74b03d59c
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-27 11:31:07 +02:00
Sandro La Bruzzo
7f8848ecdd
added first implementation of Pangaea Mapping
2021-04-27 11:30:37 +02:00
Claudio Atzori
27ab8a704d
adjusted poms to align with the external dhp-schema module
2021-04-27 10:12:27 +02:00
Claudio Atzori
a7cf449b36
cleanup
2021-04-27 10:11:26 +02:00
Claudio Atzori
fa42026590
fixed PersonCleaner extension functions
2021-04-27 10:10:06 +02:00
Claudio Atzori
ef4bfd82e2
code formatting
2021-04-27 10:09:31 +02:00
Claudio Atzori
faa8f6f4e2
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-27 09:57:03 +02:00
miconis
6d5c14e030
assertions updated in entity merger test
2021-04-27 09:47:49 +02:00
Claudio Atzori
c2bb03c8b5
depending on external dhp-schemas module
2021-04-23 17:57:35 +02:00
Claudio Atzori
7ed107be53
depending on external dhp-schemas module
2021-04-23 17:52:36 +02:00
Claudio Atzori
c25238480c
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:34:57 +02:00
Claudio Atzori
99cfb027fa
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:09:36 +02:00
Miriam Baglioni
72e5aa3b42
refactoring
2021-04-23 12:10:30 +02:00
Miriam Baglioni
7d1b8b7f64
merge upstream
2021-04-23 11:55:49 +02:00
miconis
d0e3366c34
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-22 11:45:19 +02:00
miconis
3c12eeadce
bug fix in propagation of relations
2021-04-22 11:44:33 +02:00
Claudio Atzori
e5abbec2ba
[orcid] download of the lambda file defined in a script
2021-04-22 11:22:10 +02:00
Claudio Atzori
55964cbd81
[orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation
2021-04-22 10:18:09 +02:00
Claudio Atzori
8f309b72ff
[dedup] using node names consistently across the workflow
2021-04-21 17:54:51 +02:00
Claudio Atzori
52244f813a
merging from enrico.ottonello/dnet-hadoop:orcid-no-doi
2021-04-21 12:24:09 +02:00
Sandro La Bruzzo
fd29307b84
updated workflow name
2021-04-21 09:21:41 +02:00
Claudio Atzori
815b9f4d56
[openorgs dedup] fixed workflow parameter declarations. Introduced support for resuming the execution from intermediate steps
2021-04-20 17:24:45 +02:00
Claudio Atzori
d0d477cca3
code formatting
2021-04-20 12:50:34 +02:00
miconis
0393cdce42
addition of alternative names in export queries
2021-04-20 12:45:21 +02:00
miconis
cadd0a5de8
modification of the queries for openorgs: they now consider also pending orgs
2021-04-20 12:06:56 +02:00
Sandro La Bruzzo
e06c7f32f6
updated id figshare as described in #6377
2021-04-20 10:18:07 +02:00
Sandro La Bruzzo
dbe0d0378e
resolved ticket #6377
2021-04-20 09:44:44 +02:00
Antonis Lempesis
625d993cd9
added step for observatory db
2021-04-20 02:31:06 +03:00
Antonis Lempesis
25d0512fbd
code cleanup
2021-04-20 01:43:23 +03:00
Sandro La Bruzzo
524e5f3092
Improved parallelization on transformation wf on hadoop
2021-04-19 15:17:25 +02:00
Sandro La Bruzzo
cdfe01bbae
improved parallelization on transformation job
2021-04-19 15:14:52 +02:00
Sandro La Bruzzo
3ae67b7a1d
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 17:36:57 +02:00
Sandro La Bruzzo
a16e5299f9
applied unique function on the final dataset
2021-04-16 17:36:48 +02:00
Claudio Atzori
45057440c1
code formatting
2021-04-16 17:28:25 +02:00
Enrico Ottonello
34ca792a55
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-04-16 17:18:46 +02:00
Enrico Ottonello
27068aacd1
wf to move orcid-no-doi dataset to the folder ready for the import
2021-04-16 17:17:47 +02:00
miconis
7ad573d023
bug fix: changed join in propagaterelations without applying filter on the id
2021-04-16 16:40:42 +02:00
Sandro La Bruzzo
67085da305
fixed NPE
2021-04-16 11:05:58 +02:00
Sandro La Bruzzo
644aa8f40c
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 09:14:26 +02:00
Sandro La Bruzzo
7d6a80e2f2
added new type on MAG mapping
2021-04-16 09:14:15 +02:00
Claudio Atzori
906d50563c
Merge pull request 'properly invalidating impala metadata' ( #105 ) from antonis.lempesis/dnet-hadoop:master into master
...
Reviewed-on: #105
2021-04-15 15:06:22 +02:00
Claudio Atzori
3d58f95522
[stats update] properly invalidating impala metadata
2021-04-15 15:03:05 +02:00
Antonis Lempesis
03d36fadea
properly invalidating impala metadata
2021-04-15 13:34:22 +03:00
miconis
f64e57c112
refactoring of the id generation, sparkcreatemergerels collects entities to create root id after a join
2021-04-15 10:59:24 +02:00
miconis
176a5e493d
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-14 18:06:34 +02:00
miconis
3525a8f504
id generation of representative record moved to the SparkCreateMergeRel job
2021-04-14 18:06:07 +02:00
Sandro La Bruzzo
3f77bfceb0
fixed test failure on jenkins
2021-04-14 10:03:01 +02:00
Claudio Atzori
3125cef545
code formatting
2021-04-14 09:11:54 +02:00
Sandro La Bruzzo
44a0064df6
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-13 17:48:12 +02:00
Sandro La Bruzzo
479abd10cb
Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico
2021-04-13 17:47:43 +02:00
Claudio Atzori
710cd1e8f2
Merge pull request 'add xslt, personname cleaner' ( #104 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #104
LGTM
2021-04-13 14:43:05 +02:00
Claudio Atzori
d1ca025b0b
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
2021-04-13 14:32:41 +02:00
miconis
1542196a33
bug fix: starting node of duplicate scan wf changed
2021-04-13 10:15:43 +02:00
miconis
369ed1cd8a
bug fix: lookupurl parameter added to dedup record job
2021-04-13 09:08:05 +02:00
Andreas Czerniak
3b694074ff
add xslt, personname cleaner
2021-04-13 07:04:27 +02:00
Claudio Atzori
511c0521e5
[dedup] avoiding NPEs handling OpenOrg relations
2021-04-12 17:45:11 +02:00