Claudio Atzori
d4c3476152
mapping datasource.journal only when an issn is available, null otherwise
2021-05-11 11:08:54 +02:00
Claudio Atzori
da9d6f3887
mapping datasource.journal only when an issn is available, null otherwise
2021-05-11 10:45:30 +02:00
Sandro La Bruzzo
54217d73ff
removed old parameters from oozie workflow
2021-05-11 09:59:02 +02:00
Claudio Atzori
d1cbee8413
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
2021-05-10 16:43:39 +02:00
Claudio Atzori
3797543600
MDStoreManager model classes moved in dhp-schemas
2021-05-10 14:32:05 +02:00
Claudio Atzori
25254885b9
[ActionManagement] reduced number of xqueries used to access ActionSet info
2021-05-07 17:32:03 +02:00
Claudio Atzori
8a0de2fc18
[ActionManagement] reduced number of xqueries used to access ActionSet info
2021-05-07 17:31:32 +02:00
Sandro La Bruzzo
7dc824fc23
imported changes in stable_id into master
2021-05-07 12:53:50 +02:00
Michele Artini
d82071ba6c
originalId with prefix
2021-05-06 15:34:48 +02:00
Claudio Atzori
d4a30fabe3
clean up tests
2021-05-05 17:28:15 +02:00
Claudio Atzori
dccaf173cf
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 16:36:15 +02:00
Claudio Atzori
8c96a82a03
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 15:30:06 +02:00
Claudio Atzori
2e1eb96f9a
code formatting
2021-05-05 11:23:57 +02:00
Sandro La Bruzzo
1adfc41d23
merged manually changes on stable_id for doiboost into master
2021-05-05 10:23:32 +02:00
Claudio Atzori
fb930b84d3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-05-04 18:06:30 +02:00
Claudio Atzori
923d19ea8e
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
2021-05-04 18:06:21 +02:00
Sandro La Bruzzo
714b71bd21
updated pubmed
2021-05-04 14:54:12 +02:00
Claudio Atzori
ba86835951
using common constants from ModelConstants
2021-05-04 11:51:52 +02:00
Michele Artini
f4bd2b5619
revert file SparkDedupTest.java
2021-05-04 10:26:14 +02:00
Michele Artini
b4877da363
Merge branch 'stable_ids' into prepare_ror_actionset
2021-05-03 08:13:55 +02:00
Alessia Bardi
9a20057615
fixed query for organisations' pids
2021-04-29 15:23:39 +02:00
Michele Artini
6692128234
Merge branch 'stable_ids' into prepare_ror_actionset
2021-04-29 13:24:08 +02:00
Alessia Bardi
a801999e75
fixed query for organisations' pids
2021-04-29 12:18:42 +02:00
Michele Artini
a278d67175
parse input file
2021-04-29 11:34:47 +02:00
Claudio Atzori
f6ccd54d87
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-29 10:10:01 +02:00
Claudio Atzori
91e7220f20
cleaned up workflow for actionset migration, adjusted dnet|cnr* dependency versions
2021-04-29 10:09:52 +02:00
Michele Artini
f77ba34126
pid types
2021-04-29 09:50:05 +02:00
Michele Artini
7c5cd86927
annotations and tests
2021-04-29 09:29:19 +02:00
Michele Artini
b5cf505cc6
partial implementation of the ROR->actionset workflow
2021-04-28 16:00:24 +02:00
Enrico Ottonello
c537986b7c
deleted folders with merged data immediately before merge phases
2021-04-28 11:25:25 +02:00
Sandro La Bruzzo
2129e9caa7
updated pangaea transformation to parse directly the xml
2021-04-28 10:21:03 +02:00
Claudio Atzori
5afa7d3e0c
core utilities in dhp-common moved in external module dhp-schemas
2021-04-27 15:44:01 +02:00
Alessia Bardi
e6075bb917
updated json schema for results - added instances and accessright definition
2021-04-27 15:15:08 +02:00
Sandro La Bruzzo
63c0303137
removed unused import, add log
2021-04-27 12:17:23 +02:00
Sandro La Bruzzo
74484d2823
bug fixing
2021-04-27 12:13:44 +02:00
Sandro La Bruzzo
c74b03d59c
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-27 11:31:07 +02:00
Sandro La Bruzzo
7f8848ecdd
added first implementation of Pangaea Mapping
2021-04-27 11:30:37 +02:00
Claudio Atzori
27ab8a704d
adjusted poms to align with the external dhp-schema module
2021-04-27 10:12:27 +02:00
Claudio Atzori
a7cf449b36
cleanup
2021-04-27 10:11:26 +02:00
Claudio Atzori
fa42026590
fixed PersonCleaner extension functions
2021-04-27 10:10:06 +02:00
Claudio Atzori
ef4bfd82e2
code formatting
2021-04-27 10:09:31 +02:00
Claudio Atzori
faa8f6f4e2
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-27 09:57:03 +02:00
miconis
6d5c14e030
assertions updated in entity merger test
2021-04-27 09:47:49 +02:00
Claudio Atzori
c2bb03c8b5
depending on external dhp-schemas module
2021-04-23 17:57:35 +02:00
Claudio Atzori
7ed107be53
depending on external dhp-schemas module
2021-04-23 17:52:36 +02:00
Claudio Atzori
c25238480c
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:34:57 +02:00
Claudio Atzori
99cfb027fa
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:09:36 +02:00
Miriam Baglioni
72e5aa3b42
refactoring
2021-04-23 12:10:30 +02:00
Miriam Baglioni
7d1b8b7f64
merge upstream
2021-04-23 11:55:49 +02:00
miconis
d0e3366c34
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-22 11:45:19 +02:00
miconis
3c12eeadce
bug fix in propagation of relations
2021-04-22 11:44:33 +02:00
Claudio Atzori
e5abbec2ba
[orcid] download of the lambda file defined in a script
2021-04-22 11:22:10 +02:00
Claudio Atzori
55964cbd81
[orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation
2021-04-22 10:18:09 +02:00
Claudio Atzori
8f309b72ff
[dedup] using node names consistently across the workflow
2021-04-21 17:54:51 +02:00
Claudio Atzori
52244f813a
merging from enrico.ottonello/dnet-hadoop:orcid-no-doi
2021-04-21 12:24:09 +02:00
Sandro La Bruzzo
fd29307b84
updated workflow name
2021-04-21 09:21:41 +02:00
Claudio Atzori
815b9f4d56
[openorgs dedup] fixed workflow parameter declarations. Introduced support for resuming the execution from intermediate steps
2021-04-20 17:24:45 +02:00
Claudio Atzori
d0d477cca3
code formatting
2021-04-20 12:50:34 +02:00
miconis
0393cdce42
addition of alternative names in export queries
2021-04-20 12:45:21 +02:00
miconis
cadd0a5de8
modification of the queries for openorgs: they now also consider pending orgs
2021-04-20 12:06:56 +02:00
Sandro La Bruzzo
e06c7f32f6
updated id figshare as described in #6377
2021-04-20 10:18:07 +02:00
Sandro La Bruzzo
dbe0d0378e
resolved ticket #6377
2021-04-20 09:44:44 +02:00
Sandro La Bruzzo
524e5f3092
Improved parallelization on transformation wf on hadoop
2021-04-19 15:17:25 +02:00
Sandro La Bruzzo
cdfe01bbae
improved parallelization on transformation job
2021-04-19 15:14:52 +02:00
Sandro La Bruzzo
3ae67b7a1d
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 17:36:57 +02:00
Sandro La Bruzzo
a16e5299f9
applied unique function on the final dataset
2021-04-16 17:36:48 +02:00
Claudio Atzori
45057440c1
code formatting
2021-04-16 17:28:25 +02:00
Enrico Ottonello
34ca792a55
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-04-16 17:18:46 +02:00
Enrico Ottonello
27068aacd1
wf to move the orcid-no-doi dataset to the folder ready for the import
2021-04-16 17:17:47 +02:00
miconis
7ad573d023
bug fix: changed join in propagaterelations without applying filter on the id
2021-04-16 16:40:42 +02:00
Sandro La Bruzzo
67085da305
fixed NPE
2021-04-16 11:05:58 +02:00
Sandro La Bruzzo
644aa8f40c
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 09:14:26 +02:00
Sandro La Bruzzo
7d6a80e2f2
added new type on MAG mapping
2021-04-16 09:14:15 +02:00
Claudio Atzori
906d50563c
Merge pull request 'properly invalidating impala metadata' ( #105 ) from antonis.lempesis/dnet-hadoop:master into master
...
Reviewed-on: #105
2021-04-15 15:06:22 +02:00
Claudio Atzori
3d58f95522
[stats update] properly invalidating impala metadata
2021-04-15 15:03:05 +02:00
Antonis Lempesis
03d36fadea
properly invalidating impala metadata
2021-04-15 13:34:22 +03:00
miconis
f64e57c112
refactoring of the id generation, sparkcreatemergerels collects entities to create root id after a join
2021-04-15 10:59:24 +02:00
miconis
176a5e493d
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-14 18:06:34 +02:00
miconis
3525a8f504
id generation of representative record moved to the SparkCreateMergeRel job
2021-04-14 18:06:07 +02:00
Sandro La Bruzzo
3f77bfceb0
fixed test failure on jenkins
2021-04-14 10:03:01 +02:00
Claudio Atzori
3125cef545
code formatting
2021-04-14 09:11:54 +02:00
Sandro La Bruzzo
44a0064df6
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-13 17:48:12 +02:00
Sandro La Bruzzo
479abd10cb
Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico
2021-04-13 17:47:43 +02:00
Claudio Atzori
710cd1e8f2
Merge pull request 'add xslt, personname cleaner' ( #104 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #104
LGTM
2021-04-13 14:43:05 +02:00
Claudio Atzori
d1ca025b0b
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
2021-04-13 14:32:41 +02:00
miconis
1542196a33
bug fix: starting node of duplicate scan wf changed
2021-04-13 10:15:43 +02:00
miconis
369ed1cd8a
bug fix: lookupurl parameter added to dedup record job
2021-04-13 09:08:05 +02:00
Andreas Czerniak
3b694074ff
add xslt, personname cleaner
2021-04-13 07:04:27 +02:00
Claudio Atzori
511c0521e5
[dedup] avoiding NPEs handling OpenOrg relations
2021-04-12 17:45:11 +02:00
miconis
d442e25cbc
bug fix: ids in self mergerels are not marked deletedbyinference=true
2021-04-12 15:56:22 +02:00
miconis
dcff9cecdf
bug fix: ids in self mergerels are not marked deletedbyinference=true
2021-04-12 15:55:27 +02:00
miconis
11b22b2d23
bug fix in the query, it now exports only relations with non-hidden organizations
2021-04-08 11:51:47 +02:00
miconis
0857100fb8
implementation of the tests for the openorgs integration in the openaire provision
2021-04-07 18:42:16 +02:00
miconis
bf685d849f
addition of pids in the query for the export of openorgs for the provision, addition of ec_fields in the openorgs model
2021-04-07 14:27:43 +02:00
Miriam Baglioni
70e391d427
merge upstream
2021-04-07 10:38:08 +02:00
miconis
eaaefb8b4c
implementation of the procedure to reuse content of different dbs when creating the raw graph
2021-04-06 14:35:51 +02:00
miconis
c39c82dfe9
modification of the jobs for the integration of openorgs in the provision, dedup records are no longer created by merging but simply by taking the results of the openorgs portal
2021-04-06 14:31:00 +02:00
Claudio Atzori
37b65cc3ad
Merge pull request 'updates on stats-update workflow' ( #100 ) from antonis.lempesis/dnet-hadoop:master into master
...
The workflow integrated in the _stable_ids_ branch has been run correctly on the BETA content, thus IMO this PR can be integrated in the master branch.
Reviewed-on: #100
2021-04-02 16:13:35 +02:00
Claudio Atzori
1e7e5180fa
[Graph model] updated definition of ExternalReference: added alternateLabel, removed description ( #6503 )
2021-04-02 12:32:12 +02:00
Claudio Atzori
e686b8de8d
[ORCID-no-doi] integrating PR #98
2021-04-01 17:11:03 +02:00