Miriam Baglioni
100e54e6c8
merging with branch beta
2021-08-03 10:47:11 +02:00
Miriam Baglioni
461b8a29a0
removed not needed class
2021-08-03 10:46:51 +02:00
Miriam Baglioni
327cddde33
Hosted By Map - refactoring
2021-08-03 10:44:13 +02:00
Miriam Baglioni
17292c6641
Hosted By Map - resources for testing purposes
2021-08-02 19:37:08 +02:00
Miriam Baglioni
ee7ccb98dc
Hosted By Map - test class to verify the application of the hbm to results and datasource
2021-08-02 19:36:18 +02:00
Miriam Baglioni
90e91486e2
Hosted By Map - test class to verify each step in the preparation process
2021-08-02 19:35:52 +02:00
Miriam Baglioni
1e859706a3
Hosted By Map - Classes to apply the HBM to results and datasources
2021-08-02 19:35:23 +02:00
Miriam Baglioni
72df8f9232
Hosted By Map - removed the aggregator for the datasource (it is no longer needed) and added a new aggregator for the results. Also changed the hostedBYMap aggregator
2021-08-02 19:34:44 +02:00
Miriam Baglioni
ff1ce75e33
Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already
2021-08-02 19:32:59 +02:00
Claudio Atzori
e826aae848
using constants from ModelConstants
2021-08-02 14:28:59 +02:00
Antonis Lempesis
117c3d5c67
fixed a typo
2021-08-02 12:15:58 +03:00
Miriam Baglioni
1695d45bd4
Hosted By Map - Test class to verify the preparation of the intermediate information
2021-07-30 17:57:01 +02:00
Miriam Baglioni
7c6ea2f4c7
Hosted By Map - first attempt at creating the intermediate information to be used to apply the hosted by map to the graph entities
2021-07-30 17:56:27 +02:00
Miriam Baglioni
d8b9b0553b
Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map
2021-07-30 17:55:39 +02:00
Miriam Baglioni
613bd3bde0
Hosted By Map - refactor of the first attempt to prepare a new hosted by map dependent on the datasources in the graph and on two external sources: the gold list from unibi and the doaj list of open access journals. Both lists are downloaded from the provided url parameter
2021-07-30 17:54:45 +02:00
Miriam Baglioni
d1807781c0
merging with branch beta
2021-07-30 14:34:07 +02:00
Miriam Baglioni
1d6ac3715b
merge branch with beta
2021-07-30 11:58:29 +02:00
Claudio Atzori
19620eed46
applying PR#131, Patch the identifiers (source/target) in the relations, refinements
2021-07-30 11:09:32 +02:00
Claudio Atzori
4f78565c04
fixed implementation of PatchRelationsApplication, refined the related unit test
2021-07-30 11:07:09 +02:00
Claudio Atzori
a6a38cca9e
fixed implementation of PatchRelationsApplication, refined the related unit test
2021-07-30 11:06:11 +02:00
Miriam Baglioni
9bc4fd3b69
Patch FCT relations - fixed issue with join
2021-07-30 10:34:05 +02:00
Miriam Baglioni
2fc89fc9b5
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:20:43 +02:00
Claudio Atzori
081fe92a21
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:13:56 +02:00
Claudio Atzori
576693d782
added unit test for PatchRelationsApplication
2021-07-30 10:13:33 +02:00
Claudio Atzori
55e6470f44
Merge pull request 'added the sprint 2 indicators in monitor db' ( #129 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #129
2021-07-30 10:11:46 +02:00
Sandro La Bruzzo
6358f92c3a
added sleep to solve the problem of lost index creation requests
2021-07-30 08:54:37 +02:00
Antonis Lempesis
26af0320d0
added the sprint 2 indicators in monitor db
2021-07-30 00:31:33 +03:00
Claudio Atzori
7b172e7cd9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-29 13:57:06 +02:00
Claudio Atzori
c53d106e80
[provision] lowercase relation filter
2021-07-29 13:57:00 +02:00
Claudio Atzori
6e3554a45e
[provision] lowercase relation filter
2021-07-29 13:56:37 +02:00
Sandro La Bruzzo
b1b0cc3f15
fixed wrong package name
2021-07-29 13:55:08 +02:00
Miriam Baglioni
baad01cadc
hostedbymap
2021-07-29 13:04:39 +02:00
Claudio Atzori
e725c88ebb
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:43 +02:00
Claudio Atzori
5d08ad86ae
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:16 +02:00
Claudio Atzori
e87e1805c4
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-29 12:13:06 +02:00
Claudio Atzori
5f7330d407
Merge branch 'master' into fct_project_id_replacement
2021-07-29 11:38:22 +02:00
Claudio Atzori
1923c1ce21
replaced full join + filtering with a left join
2021-07-29 11:36:20 +02:00
Claudio Atzori
dc55ed4acd
Merge pull request '[beta] stats update workflow' ( #128 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #128
2021-07-29 11:13:21 +02:00
Claudio Atzori
908f57a475
code formatting
2021-07-29 10:49:39 +02:00
Sandro La Bruzzo
3721df7aa6
refactored the creation of the scholexplorer actionset, moved to package dhp-aggregation
2021-07-29 10:45:35 +02:00
Antonis Lempesis
4afa5215a9
fixed a NPE?
2021-07-28 21:59:12 +03:00
Antonis Lempesis
3d1580fa9b
fixed a typo
2021-07-28 18:50:31 +03:00
Claudio Atzori
4c5a71ba2f
[broker] updated relation descriptors, making use of constant values
2021-07-28 17:11:18 +02:00
Claudio Atzori
a9961a1835
[cleaning] title cleaning based on the me.xuender:unidecode library
2021-07-28 16:36:33 +02:00
Claudio Atzori
e1797c0a42
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-28 16:21:36 +02:00
Claudio Atzori
6dddad86ee
[cleaning] title cleaning based on the me.xuender:unidecode library
2021-07-28 16:21:29 +02:00
Sandro La Bruzzo
3d8f0f629b
implemented workflow for the creation of the action set for scholexplorer
2021-07-28 16:15:34 +02:00
Antonis Lempesis
9b181ffa73
added the h2020 classification scheme for projects
2021-07-28 16:31:29 +03:00
Alessia Bardi
df8715a1ec
format code after mvn compile
2021-07-28 11:58:26 +02:00
Michele Artini
3e2a2d6e71
added new fields in xml
2021-07-28 11:56:55 +02:00
Alessia Bardi
c806387d4b
tests for enermaps
2021-07-28 11:54:36 +02:00
Alessia Bardi
9594343725
code formatting after mvn compile
2021-07-28 11:41:34 +02:00
Claudio Atzori
2fff24df55
code formatting
2021-07-28 11:34:19 +02:00
Michele Artini
9f1c7b8e17
tests
2021-07-28 11:32:34 +02:00
Antonis Lempesis
4a9741825d
added result_orcid, result_project provenance, issn in datasources
2021-07-28 12:28:04 +03:00
Miriam Baglioni
3d2bba3d5d
removing not needed classes
2021-07-28 11:25:43 +02:00
Miriam Baglioni
cc0d3d8a7b
merging with branch beta
2021-07-28 11:24:46 +02:00
Michele Artini
e6f1773d63
mapping of new eosc fields
2021-07-28 11:17:11 +02:00
Miriam Baglioni
80d5b3b4de
DoiBoost AccessRight #4362 - removing commented code
2021-07-28 11:16:49 +02:00
Miriam Baglioni
5fe016dcbc
DoiBoost AccessRight #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4194
2021-07-28 11:14:28 +02:00
Miriam Baglioni
73ed7374a9
merging with branch beta
2021-07-28 11:05:16 +02:00
Miriam Baglioni
43e62fcae9
DoiBoost AccessRight #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4193
2021-07-28 11:04:55 +02:00
Michele Artini
c72c960ffb
added eosc fields
2021-07-28 11:03:15 +02:00
Michele Artini
1fb572a33a
added eosc fields
2021-07-28 10:52:24 +02:00
Miriam Baglioni
708d0ade34
Merge branch 'beta' into hostedbymap
2021-07-28 10:37:22 +02:00
Sandro La Bruzzo
16c91203bd
implemented workflow for the creation of the action set for scholexplorer
2021-07-28 10:30:49 +02:00
Miriam Baglioni
6c936943aa
merging with branch beta
2021-07-28 10:24:48 +02:00
Miriam Baglioni
0424f47494
HostedByMap fixing issues
2021-07-28 10:24:13 +02:00
Michele Artini
52e2315ba2
removed trick for datasourcetypeui
2021-07-28 10:23:00 +02:00
Claudio Atzori
d267dce520
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-27 17:18:29 +02:00
Sandro La Bruzzo
825d9f0289
fixed datacite workflow starting from Importing delta
2021-07-27 16:09:46 +02:00
Claudio Atzori
5aa7d16d1b
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
2021-07-27 15:11:58 +02:00
Claudio Atzori
998b66855a
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
2021-07-27 15:11:37 +02:00
Antonis Lempesis
1a28a69cac
changed the citeee in *_citations to cites
2021-07-27 15:14:09 +03:00
Miriam Baglioni
74f801b689
merging with branch beta
2021-07-27 13:18:31 +02:00
Miriam Baglioni
35e395eae8
merge with master
2021-07-27 12:34:59 +02:00
Miriam Baglioni
eb07f7f40f
Hosted By Map
2021-07-27 12:27:26 +02:00
Antonis Lempesis
ed185fd7ed
added missing colons
2021-07-27 11:42:47 +03:00
Antonis Lempesis
f3b9570354
properly invalidating metadata
2021-07-26 13:00:16 +03:00
Sandro La Bruzzo
848aabbb6c
minor fix
2021-07-25 12:06:41 +02:00
Sandro La Bruzzo
8fac10c91e
fixed definition of the wf for the creation of the final infospace of scholexplorer
2021-07-25 11:15:37 +02:00
Sandro La Bruzzo
3920c69bc8
change implementation of resolve Relation to generate jsonRdd in output
2021-07-25 09:51:36 +02:00
Antonis Lempesis
f9fbb0f261
added indicators second sprint
2021-07-24 16:40:28 +03:00
Claudio Atzori
a0393607a7
mapping funding relations from Datacite should be done according to the actual result identifier
2021-07-23 18:15:08 +02:00
Claudio Atzori
5b6844b969
mapping funding relations from Datacite should be done according to the actual result identifier
2021-07-23 18:14:37 +02:00
Sandro La Bruzzo
d9e3b89937
implemented last part of workflows to generate scholixGraph
2021-07-23 16:38:32 +02:00
Sandro La Bruzzo
cfde63a7c3
fixed resolve relation join
2021-07-23 14:17:29 +02:00
Sandro La Bruzzo
4a439c3863
NPE fixed
2021-07-23 14:17:29 +02:00
Sandro La Bruzzo
ca74e8dd02
create a separate wf for resolving relation
2021-07-23 11:40:06 +02:00
Sandro La Bruzzo
43e9380cd3
update resolve relation to use the same format of openaire graph
2021-07-23 11:25:18 +02:00
Sandro La Bruzzo
058b636d4d
added control to check if the entity exists
2021-07-22 16:08:54 +02:00
Sandro La Bruzzo
62ae36a3d2
fixed NPE
2021-07-22 15:41:38 +02:00
Miriam Baglioni
63553a76b3
added code to download gold issn list from unibi
2021-07-22 12:01:48 +02:00
Miriam Baglioni
1a5b114906
DoiBoost AccessRight #4362 - refactoring
2021-07-22 12:00:23 +02:00
Sandro La Bruzzo
31d2d6d41e
Scholexplorer: introduction of dedup openaire
2021-07-21 18:09:32 +02:00
Miriam Baglioni
b226ba4439
merging with branch beta
2021-07-21 09:46:40 +02:00
Alessia Bardi
9069958479
tests for enermaps
2021-07-20 19:31:43 +02:00
Claudio Atzori
10d7b4f0b4
filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure
2021-07-20 11:52:05 +02:00
Claudio Atzori
77e8c6c7f7
filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure
2021-07-20 11:51:33 +02:00
Miriam Baglioni
83fe31c92e
changed the name of the workflows
2021-07-19 18:19:14 +02:00
Miriam Baglioni
dd81c36b60
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-19 18:18:14 +02:00
Miriam Baglioni
54acc5373b
changed the name of the workflows
2021-07-19 18:18:09 +02:00
Miriam Baglioni
b420b11ed3
duplicate the number of partitions in ProcessMag
2021-07-19 18:16:23 +02:00
Claudio Atzori
65934888a1
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:52 +02:00
Claudio Atzori
5947cddafc
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:24 +02:00
Claudio Atzori
0977baf41d
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 17:43:52 +02:00
Claudio Atzori
5e5f65a3c3
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 15:56:55 +02:00
Miriam Baglioni
662c396354
duplicate the number of partitions in ConvertCrossrefToOaf
2021-07-19 12:41:14 +02:00
Miriam Baglioni
59530a14fb
DoiBoost AccessRight #4362 - set BestAccessRight with the usual comparator
2021-07-19 12:34:35 +02:00
Miriam Baglioni
199123b74b
DoiBoost AccessRight #4362 - Fixed issue on date formatting. Added test method and associated resource
2021-07-16 17:30:27 +02:00
Miriam Baglioni
c4b18e6ccb
changed the download.sh, added skip step to allow to not execute one phase and changed the workflow sequence of steps
2021-07-16 15:01:25 +02:00
Miriam Baglioni
acd6056330
added shell action to automatically download the new dump and put it in a specified hdfs location
2021-07-16 12:47:10 +02:00
Miriam Baglioni
3bc9a05bc9
merging with branch beta
2021-07-16 10:32:27 +02:00
Miriam Baglioni
34506df1b6
DoiBoost AccessRight #4362 - if the journal is open, the OPEN access right is set on all instances and the color is GOLD (overwriting the color if it was already set in one of the previous steps)
2021-07-16 10:29:51 +02:00
Claudio Atzori
bf9e0d2d4f
Merge pull request 'orcid-no-doi' ( #123 ) from enrico.ottonello/dnet-hadoop:orcid-no-doi into beta
...
Reviewed-on: #123
2021-07-15 17:59:41 +02:00
Sandro La Bruzzo
7e2caafe84
Scholexplorer: fixed mapping typologies
2021-07-15 09:53:12 +02:00
Enrico Ottonello
2dc50c0999
added default value to process path
2021-07-14 17:02:22 +02:00
Enrico Ottonello
66604bb2b4
added absolute path to process folder
2021-07-14 16:44:51 +02:00
Enrico Ottonello
7840cc6526
merged with master
2021-07-14 15:33:59 +02:00
Miriam Baglioni
4da46bb62f
merging with branch beta
2021-07-14 15:08:52 +02:00
Enrico Ottonello
a65667d217
added publication to dataset even if no contributors
2021-07-14 15:07:07 +02:00
Sandro La Bruzzo
10068c00ea
Code refactor:
...
- removed old workflows in doiboost
- split workflow of doiboost into preprocess and process
2021-07-14 14:45:50 +02:00
Miriam Baglioni
09ad7b2a9e
DoiBoost AccessRight #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is
2021-07-14 14:45:21 +02:00
Miriam Baglioni
f4f7c6f9d3
DoiBoost AccessRight #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is
2021-07-14 14:44:54 +02:00
Miriam Baglioni
6222adf176
DoiBoost AccessRight #4362 - added resources and test for crossref mapping (licence part included)
2021-07-14 14:42:34 +02:00
Miriam Baglioni
981b1018f6
DoiBoost AccessRight #4362 - decide access right according to licence. Default access right is Unknown
2021-07-14 14:42:06 +02:00
Sandro La Bruzzo
3d8e2aa146
Code refactor:
...
- removed old workflows in doiboost
- split workflow of doiboost into preprocess and process
2021-07-14 14:37:06 +02:00
Miriam Baglioni
441701c85c
DoiBoost AccessRight #4362 - If multiple licenses are available, take the one applied to 'vor'
2021-07-14 14:14:50 +02:00
Sandro La Bruzzo
c35c117601
fixed process doiboost workflow:
...
- split OrcidToOAF into two phases, preprocess and process
- updated workflow used in production
2021-07-14 12:48:01 +02:00
Miriam Baglioni
1cdd09cd8e
Tentative fix for testing of Jenkins
2021-07-14 11:14:59 +02:00
Sandro La Bruzzo
4cb65bc64a
fixed process doiboost workflow:
...
- split OrcidToOAF into two phases, preprocess and process
- updated workflow used in production
2021-07-14 09:44:32 +02:00
Miriam Baglioni
774cdb190e
changes to mirror the last dump of the graph with the ols data model.
2021-07-13 18:57:24 +02:00
Miriam Baglioni
886617afd0
One result linked to more than one project is saved just once
2021-07-13 18:15:35 +02:00
Miriam Baglioni
320cf02d96
Changed the way to find results linked to projects. We verify to actually have the project on the graph before selecting the result
2021-07-13 18:13:32 +02:00
Miriam Baglioni
52ce35d57b
-
2021-07-13 18:08:46 +02:00
Miriam Baglioni
970b387b8d
modification to allow dump of a single community
2021-07-13 18:08:10 +02:00
Miriam Baglioni
eae10c5894
modification to allow the dump for a single community
2021-07-13 18:07:25 +02:00
Miriam Baglioni
c028feef4f
workflow for the dump as sub workflows
2021-07-13 18:06:44 +02:00
Miriam Baglioni
d70f8c96fd
funding contains and not starts with h2020
2021-07-13 17:34:53 +02:00
Miriam Baglioni
5e38c7f42d
dumping only communities with status all
2021-07-13 17:32:38 +02:00
Claudio Atzori
734de62474
[doiboost] added workflow for the ActionSet update dedicated to production
2021-07-13 17:26:04 +02:00
Miriam Baglioni
618d2de2da
minor changes and refactoring
2021-07-13 17:10:02 +02:00
Miriam Baglioni
59615da65e
Add test to verify the creation of relation between context and projects
2021-07-13 17:09:15 +02:00
Miriam Baglioni
084b4ef999
added the creation of the openaireId from funder and grant number if the element is not present in the context profile
2021-07-13 17:07:46 +02:00
Claudio Atzori
fa720c1da4
[doiboost] added workflow for the ActionSet update dedicated to production
2021-07-13 16:59:30 +02:00
Miriam Baglioni
8f322a73cb
change because of the renaming of originalId in acronym
2021-07-13 16:22:58 +02:00
Miriam Baglioni
72397ea1ba
Added fix for community of arbitrary name length
2021-07-13 16:18:35 +02:00
Miriam Baglioni
5295d10691
added check not to dump deletedByInference entities
2021-07-13 16:11:46 +02:00
Claudio Atzori
9629569e22
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-07-13 16:04:08 +02:00
Claudio Atzori
f13e11e3f7
[aggregation] datacite wf: defined parameter declaring the path used to store the OAF objects produced by the transformation phase
2021-07-13 16:04:02 +02:00
Miriam Baglioni
e9a17ec899
added check to verify not to add void APC
2021-07-13 15:53:35 +02:00
Miriam Baglioni
8429aed6c6
Added resource for testing selection of valid relations
2021-07-13 15:49:38 +02:00
Miriam Baglioni
39b1a6edf6
added test class for the selection of valid relations and description
2021-07-13 15:23:09 +02:00
Miriam Baglioni
9a58f1b93d
added logic to select only the valid relations: those not deletedbyinference and having both parts of the relation as entities in the graph
2021-07-13 15:20:39 +02:00
Miriam Baglioni
13c66e16be
changed logic to split for communities
2021-07-13 15:15:27 +02:00
Miriam Baglioni
6410ab71d8
added APC in the dump and test method
2021-07-13 15:13:58 +02:00
Miriam Baglioni
65a242646d
added resource for APC dump
2021-07-13 14:45:25 +02:00
Miriam Baglioni
4b432fbee8
extended test class
2021-07-13 14:40:39 +02:00
Miriam Baglioni
87a6e2b967
extended test class
2021-07-13 14:38:28 +02:00
Miriam Baglioni
69fd40fd30
modified code to split the Croatian funder
2021-07-13 14:35:26 +02:00
Miriam Baglioni
86e50f7311
modified code to split the Croatian funder
2021-07-13 14:31:45 +02:00
Miriam Baglioni
da88c850c6
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:44 +02:00
Miriam Baglioni
2f66fedfec
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:23 +02:00
Miriam Baglioni
f5486ffb14
Fixed issues to tests
2021-07-13 14:07:45 +02:00
Claudio Atzori
e0061232e9
[aggregation] datacite wf: conditional creation of links, optional resume from intermediate phases
2021-07-13 13:41:21 +02:00
Sandro La Bruzzo
bbe8193930
merged stable ids
2021-07-12 17:00:43 +02:00
Claudio Atzori
ae2b47b29d
[broker] added coalesce(1) on the stats dataset before storing it on postgres
2021-07-09 15:47:51 +02:00
Sandro La Bruzzo
57c74c73c6
fixed mistakes in oozie workflow
2021-07-09 12:28:09 +02:00
Sandro La Bruzzo
61ccb54fde
removed wrong loop on oozie wf
2021-07-09 12:17:57 +02:00
Sandro La Bruzzo
9f5a0f3ab6
moved wf indexing of Scholexplorer in dhp-graph-provision
2021-07-09 12:06:43 +02:00
Sandro La Bruzzo
09fccf8000
added workflow to serialize scholix and summary in json
2021-07-09 11:01:42 +02:00
Sandro La Bruzzo
0ea576745f
updated CreateInputGraph because generics don't work on Spark Dataset
2021-07-09 10:29:24 +02:00
Sandro La Bruzzo
cd17e19044
implemented branch workflow to import datacite and crossref in scholexplorer
2021-07-08 21:20:19 +02:00
Miriam Baglioni
c30f3ce647
merge doi normalization
2021-07-08 19:20:02 +02:00
Sandro La Bruzzo
8a034e46e1
updated baseline workflow
2021-07-08 11:11:41 +02:00
Claudio Atzori
b7b8e0986e
[raw_all] The claim merge procedure includes the claimed contexts in the merged result
2021-07-08 10:42:31 +02:00
Sandro La Bruzzo
0799ac9fb6
fixed wrong path
2021-07-08 10:36:37 +02:00
Sandro La Bruzzo
4d53402712
extended ebiLinks to create a dataset before generation of OAF
2021-07-08 10:26:21 +02:00
Sandro La Bruzzo
a4a54a3786
code refactor
2021-07-08 09:08:25 +02:00
Sandro La Bruzzo
a01dbe0ab0
completed workflow of generation of scholix and summaries
2021-07-07 23:10:34 +02:00
Claudio Atzori
fdcff42e46
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 19:01:59 +02:00
Claudio Atzori
777536ce91
[aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122)
2021-07-07 11:23:48 +02:00
Claudio Atzori
bc014023c8
Merge pull request 'to solve the scala SI-3623' ( #122 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #122
2021-07-07 11:13:51 +02:00
Claudio Atzori
32bdfdccbc
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 11:08:27 +02:00
Andreas Czerniak
ebf3f47a02
from&until more OAI2.0 compl., adding tfs
2021-07-07 09:29:49 +02:00
Claudio Atzori
f580cb77e1
added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA)
2021-07-06 21:11:11 +02:00
Sandro La Bruzzo
ed684874f2
deleted old scholix project
2021-07-06 17:20:08 +02:00
Sandro La Bruzzo
8535506c22
added scholix generation
2021-07-06 17:18:06 +02:00
Sandro La Bruzzo
4c54bd8742
add test to verify merge scholix on source
2021-07-06 11:32:14 +02:00
Andreas Czerniak
3531802710
to solve the scala SI-3623
2021-07-06 11:30:56 +02:00
Sandro La Bruzzo
7d8db2eb8a
betterRenamingMethod
2021-07-06 09:56:32 +02:00
Sandro La Bruzzo
c952c8d236
generate first side of scholix mapping
2021-07-06 09:53:14 +02:00
Claudio Atzori
70ded407bb
HttpClient used in metadata collection retries also on 404
2021-07-05 18:04:30 +02:00
Miriam Baglioni
7177c25261
added check for null value during doi normalization
2021-07-05 16:22:38 +02:00
Miriam Baglioni
0892cad4e8
the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value
2021-07-05 16:21:42 +02:00
Antonis Lempesis
89e6f46682
using organization ids instead of names in monitor db creation
2021-07-05 12:00:00 +03:00
Sandro La Bruzzo
e4b84ef5d6
fixed mapping OAF to Scholix summary
2021-07-02 16:48:48 +02:00
Sandro La Bruzzo
c6fa8598e1
massive code refactor:
...
removed modules dhp-*-scholexplorer
2021-07-01 22:13:45 +02:00
Antonis Lempesis
829caee4fd
added the missing indicators files
2021-06-30 17:31:33 +02:00
Sandro La Bruzzo
84b834c893
added test dataset test for pangaea
2021-06-30 17:31:09 +02:00
Sandro La Bruzzo
1a6b398968
implemented Creation of Raw Graph and Resolution
2021-06-30 17:27:55 +02:00
Miriam Baglioni
bc34347643
added assertions to verify doi normalization
2021-06-30 14:37:08 +02:00
Miriam Baglioni
86f47afcc7
slight modification of the resource to accommodate also doi normalization tests
2021-06-30 14:36:49 +02:00
Miriam Baglioni
03767ea8e6
slight modification of the resource to accommodate also doi normalization tests
2021-06-30 13:21:24 +02:00
Miriam Baglioni
f8eec0ca9a
added resource to test the normalization of doi during the import of MAG
2021-06-30 13:19:54 +02:00
Miriam Baglioni
149f85ddf5
added tests for the normalization of the dois
2021-06-30 13:00:52 +02:00
Miriam Baglioni
e487b5544c
added tests for the normalization of the dois
2021-06-30 12:57:11 +02:00
Miriam Baglioni
1503ccbbb5
added tests for the normalization of the dois
2021-06-30 12:55:37 +02:00
Miriam Baglioni
1299bfb357
Added class to test the normalization of doi
2021-06-30 12:53:27 +02:00
Sandro La Bruzzo
623a0c4edb
code Refactor, renaming packages
2021-06-30 11:09:30 +02:00
Miriam Baglioni
cf758f4f91
added normalization step for the doi
2021-06-30 10:03:15 +02:00
Miriam Baglioni
801763a0fa
there is no longer any need to lowercase the doi since it is done in the first step. Also changed the creation of the id by using the factory
2021-06-29 19:07:23 +02:00
Miriam Baglioni
a74de1cda2
added normalization step to the doi
2021-06-29 18:51:11 +02:00
Miriam Baglioni
06074ea7d3
added normalization step to the doi
2021-06-29 18:46:08 +02:00
Miriam Baglioni
8b8ffe82dc
added step of normalization for the doi
2021-06-29 18:41:39 +02:00
Miriam Baglioni
50cc21d92e
Added method to normalize doi values (lower case, remove everything preceding 10., filtering out doi not starting with 10.)
2021-06-29 18:35:28 +02:00
Antonis Lempesis
87f14a3899
added the missing indicators files
2021-06-29 16:31:51 +03:00
Sandro La Bruzzo
db933ebd21
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-29 14:16:12 +02:00
Sandro La Bruzzo
7e08655e5f
added relation dates in all scholexplorer Datasources
2021-06-29 12:02:03 +02:00
Sandro La Bruzzo
075055eaca
added relation dates in bio mapping
2021-06-29 10:33:09 +02:00
Sandro La Bruzzo
f36f92287d
implemented mapping from Crossref Event Data to Oaf
2021-06-29 10:21:23 +02:00
Antonis Lempesis
018c4eb52c
copied latest changes from old fork: indicators+monitor institutions
2021-06-28 23:46:52 +03:00
Sandro La Bruzzo
511ec14c63
implemented mapping from EBI and Scholix Resolved to OAF
2021-06-28 22:04:22 +02:00
Claudio Atzori
af42377d0e
HttpClient used in metadata collection retries on 502, 503, 504
2021-06-28 09:34:30 +02:00
Sandro La Bruzzo
ad50415167
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-24 17:20:50 +02:00
Sandro La Bruzzo
80e15cc455
implemented mapping from uniprot, pdb and ebi links
2021-06-24 17:20:00 +02:00
Claudio Atzori
2e8fd2c531
cleanup
2021-06-23 14:38:24 +02:00
Claudio Atzori
4dc9ebf217
[raw_all] fixed unit test
2021-06-23 14:38:07 +02:00
Claudio Atzori
50fc5a64a0
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-06-23 11:49:42 +02:00
Claudio Atzori
5edcc6832a
applying sonarLint suggestions
2021-06-23 09:53:29 +02:00
Sandro La Bruzzo
080a280bea
added pdb to Oaf Transformation
2021-06-21 16:23:59 +02:00
Sandro La Bruzzo
1dc0c59e20
merged fix thai dates from stable_ids
2021-06-21 10:39:46 +02:00
Sandro La Bruzzo
dc66cf615b
Merge branch 'stable_id_scholexplorer' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-21 09:38:33 +02:00
Sandro La Bruzzo
507e42102a
added pdb to oaf class
2021-06-21 09:36:40 +02:00
Sandro La Bruzzo
a167543637
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-21 09:14:11 +02:00
Sandro La Bruzzo
4fe7b75644
renamed packages
2021-06-18 16:41:24 +02:00
Sandro La Bruzzo
3990165d05
changed typologies of unresolved relation
2021-06-18 11:43:59 +02:00
Miriam Baglioni
180d671127
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-18 09:46:18 +02:00
Miriam Baglioni
13c96622c9
-
2021-06-18 09:45:16 +02:00
Miriam Baglioni
b486ae498f
added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump
2021-06-18 09:43:32 +02:00
Miriam Baglioni
464c2ddde3
changed to split in two steps the generation of the crossref dataset
2021-06-18 09:42:31 +02:00
Miriam Baglioni
6aca0d8ebb
added kryo encoding for input files
2021-06-18 09:42:07 +02:00
Miriam Baglioni
3585e53da3
changed to split in two steps the generation of the crossref dataset
2021-06-18 09:41:23 +02:00
Claudio Atzori
41b551562e
applying PR#115 (DatePicker) on stable_ids
2021-06-17 09:33:50 +02:00
Sandro La Bruzzo
3100166d29
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-16 16:22:16 +02:00
Claudio Atzori
74833d04f1
Merge branch 'pids_beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into stable_ids
2021-06-16 15:54:18 +02:00
Claudio Atzori
7243a40c88
code formatting
2021-06-16 15:03:03 +02:00
Sandro La Bruzzo
dfcf78cf24
removed wrong code
2021-06-16 14:57:42 +02:00
Sandro La Bruzzo
cc0f2b11fb
Implemented mapping from pubmed baseline to OAF
2021-06-16 14:56:24 +02:00
Miriam Baglioni
95885bcf12
forces executor memory and driver executor memory to be 7G (trying to avoid OOM)
2021-06-16 10:17:52 +02:00
Miriam Baglioni
2550a73981
-
2021-06-16 10:04:41 +02:00
Miriam Baglioni
1c47c0d786
modified the number of executors trying to avoid OOM exception
2021-06-15 21:05:39 +02:00
Miriam Baglioni
7deac55138
added one option for resume from in the wf
2021-06-15 18:38:20 +02:00
Antonis Lempesis
f7c0b80e35
storing result_instance as parquet
2021-06-15 14:45:48 +03:00
Miriam Baglioni
66e7ef892f
changed the parameter name
2021-06-15 11:08:54 +02:00
Miriam Baglioni
4f47ad0891
no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder
2021-06-15 09:28:31 +02:00
Miriam Baglioni
9f9dd00b94
refactoring
2021-06-15 09:24:46 +02:00
Miriam Baglioni
63d74ee379
refactoring
2021-06-15 09:24:11 +02:00
Miriam Baglioni
6ebc236657
added needed property: outputPath
2021-06-15 09:23:24 +02:00
Miriam Baglioni
f7379255b6
changed the workflow to extract info from the dump
2021-06-15 09:22:54 +02:00
Miriam Baglioni
d6e21bb6ea
creates the crossref dataset used for doiboost together with unpacking part from tar
2021-06-14 17:27:19 +02:00
Miriam Baglioni
4da141bd7c
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-14 13:41:02 +02:00
Miriam Baglioni
ce0cfd79e0
creates the crossref dataset used for doiboost
2021-06-14 13:40:19 +02:00
Miriam Baglioni
93efe4de82
split the construction of crossref dataset in two parts. This one just unpacks the tar entries
2021-06-14 13:39:40 +02:00
Michele Artini
ada063ce70
fixed a problem with empty mdstore list (2)
2021-06-14 12:04:47 +02:00
Michele Artini
83132ee99a
fixed a problem with empty mdstore list
2021-06-14 11:57:00 +02:00
Miriam Baglioni
cf360d7c97
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-14 10:19:49 +02:00
Miriam Baglioni
8873e6b6d1
workflow and parameter
2021-06-14 10:15:57 +02:00
Miriam Baglioni
0f1acdf6b6
workflow and parameter
2021-06-14 10:08:55 +02:00
Sandro La Bruzzo
aeb8132627
Merged branch stable_ids
2021-06-14 10:07:29 +02:00
Sandro La Bruzzo
efbea1e01a
minor fix
2021-06-14 09:45:14 +02:00
Miriam Baglioni
75780fc636
extraction of the tar for the dump of crossref, and creation of the dataset
2021-06-14 09:45:07 +02:00
Claudio Atzori
2039bb9f5f
orcid / orcid_pending cleaning backported from master branch
2021-06-14 09:40:50 +02:00
Claudio Atzori
dd19c4ac5a
Merge pull request 'import_new_mdstores' ( #112 ) from import_new_mdstores into stable_ids
...
Reviewed-on: #112
2021-06-14 09:23:55 +02:00
Claudio Atzori
e9e86a237d
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-11 17:00:02 +02:00
Claudio Atzori
a900bfb874
delegating the date parsing to https://github.com/sisyphsu/dateparser
2021-06-11 16:53:01 +02:00
Sandro La Bruzzo
dd997c49e0
fix wrong relation id
...
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Antonis Lempesis
d413b24611
added instances, orgs for monitor, totalcost for projects, apcs
2021-06-10 02:35:46 +03:00
Claudio Atzori
741077dbca
Merge pull request 'Fix in Affiliation Propagation' ( #113 ) from miriam.baglioni/dnet-hadoop:master into stable_ids
...
Reviewed-on: #113
2021-06-09 18:42:42 +02:00
Miriam Baglioni
32b0c27217
Update 'dhp-workflows/dhp-enrichment/src/main/java/eu/dnetlib/dhp/resulttoorganizationfrominstrepo/PrepareResultInstRepoAssociation.java'
...
fix in SQL query: the blacklist constraint used d.id to refer to the datasource id, but no alias was defined for the datasource, so I removed the alias
2021-06-09 18:36:11 +02:00
Sandro La Bruzzo
0d1f37302f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-09 09:35:16 +02:00
Miriam Baglioni
dc07f1079b
added check in case the author set to be enriched is null
2021-06-08 12:06:10 +02:00
Miriam Baglioni
8d2e086e48
changes to avoid reassignment to val
2021-06-07 17:50:37 +02:00
Miriam Baglioni
f33521d338
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
to be able to replace the object assigned to author, val has been replaced by var
2021-06-07 17:27:07 +02:00
Miriam Baglioni
bc12e9819e
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
This change fixes the issue that arises when the same work appears more than once on the same ORCID profile. It avoids replicating the association doi -> author when the ORCID id is already associated with the doi.
2021-06-07 16:37:01 +02:00
Sandro La Bruzzo
0cdb7ccdaa
added inverse relations to datacite mapping
2021-06-04 15:10:20 +02:00
Sandro La Bruzzo
5b724d9972
added relations to datacite mapping
2021-06-04 10:14:22 +02:00
Sandro La Bruzzo
e57294ac99
implemented changes on PubMed dataflow
2021-06-03 10:52:09 +02:00
Michele Artini
ede2749822
orcid pid type
2021-06-01 12:42:43 +02:00
Michele Artini
f0fbfdcfae
Merge branch 'stable_ids' into import_new_mdstores
2021-06-01 12:03:00 +02:00
Michele Artini
e950750262
add nodes to import hdfs mdstores
2021-06-01 10:48:50 +02:00
Michele Artini
03a510859a
removed coalesce(1)
2021-05-31 14:10:51 +02:00
Michele Artini
e9f2b6037c
patch of mdstore records
2021-05-31 11:36:26 +02:00
Sandro La Bruzzo
02ef46535f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-05-31 09:50:15 +02:00
Sandro La Bruzzo
aeadc5a366
updated wf Datacite Import to retrieve the block size as parameter
2021-05-31 09:49:53 +02:00
Claudio Atzori
96238152cb
added serialization for alternateIdentifiers and pids within each record instance
2021-05-28 16:57:30 +02:00
Michele Artini
ad56a44fda
save as gzipped sequence file
2021-05-28 14:45:39 +02:00
Claudio Atzori
83722ebc47
pull #111 replied on stable_ids
2021-05-28 14:11:46 +02:00
Claudio Atzori
6e3a4e9237
updated test expectations
2021-05-28 09:37:50 +02:00
Michele Artini
4fa5671d16
first implementation of Hdfs Mdstores Importer
2021-05-27 16:22:07 +02:00
Claudio Atzori
d512062b58
integrating pull #109 , H2020Classification
2021-05-27 12:22:47 +02:00
Claudio Atzori
5e4b91d9ef
more pervasive use of constants from ModelConstants, especially for ORCID
2021-05-26 18:20:23 +02:00
Sandro La Bruzzo
bced804151
updated wf Datacite Import to retrieve the block size as parameter
2021-05-26 17:06:50 +02:00
Miriam Baglioni
abd88f663d
changed test resource to mirror change in the input file
2021-05-21 15:20:47 +02:00
Miriam Baglioni
c844877de2
changed the workflow so that the programme and project preparation steps can also run in parallel
2021-05-21 14:41:57 +02:00
Miriam Baglioni
073d76864d
refactoring
2021-05-21 14:41:03 +02:00
Miriam Baglioni
4c8b4a774c
removed unneeded code
2021-05-21 14:40:07 +02:00
Enrico Ottonello
abdd0ade1f
added temporary output folder as workflow parameter
2021-05-21 12:08:16 +02:00
Miriam Baglioni
53b9d87fec
new prepareProgramme according to the new file
2021-05-21 11:49:31 +02:00
Miriam Baglioni
1ee8f13580
refactoring and added "left" as join type to be 100% sure to get the whole set of projects
2021-05-21 11:49:05 +02:00
Miriam Baglioni
e07c3ba089
due to a change in the input file, the filtering step is no longer needed
2021-05-21 11:47:43 +02:00
Miriam Baglioni
54f6e2f693
changed to get the needed information to build the action set as parallel jobs
2021-05-21 11:47:00 +02:00
Miriam Baglioni
7180505519
removed unneeded variable
2021-05-21 11:46:13 +02:00
Miriam Baglioni
2eb1a8b344
changed because the input file changed
2021-05-21 11:40:20 +02:00
Enrico Ottonello
d0945c3c78
added temporary output folder, because folder access rights differ between beta and prod
2021-05-20 19:14:31 +02:00
Enrico Ottonello
1265dadc90
workflow aligned with stable_ids
2021-05-20 19:01:28 +02:00
Enrico Ottonello
0821d8e97d
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-05-20 18:33:18 +02:00
Enrico Ottonello
ae7bd24d79
removed old workflows
2021-05-20 18:32:22 +02:00
Claudio Atzori
9d725efdc1
reverted implementation of the mdstore client
2021-05-20 18:26:09 +02:00
Miriam Baglioni
9610224671
added param to workflow property
2021-05-20 18:21:12 +02:00
Claudio Atzori
863b56b6ce
using constants from ModelConstants
2021-05-20 16:23:58 +02:00
Claudio Atzori
ae5c28e54f
code formatting
2021-05-20 16:13:06 +02:00
Miriam Baglioni
aa45b4df9b
-
2021-05-20 15:57:40 +02:00
Miriam Baglioni
052c837843
-
2021-05-20 15:54:44 +02:00
Claudio Atzori
b695932ae4
integrated pull#108
2021-05-20 15:34:04 +02:00
Claudio Atzori
ea9b00ce56
adjusted test
2021-05-20 15:31:42 +02:00
Claudio Atzori
b572f56763
Merge branch 'master' into master
2021-05-20 15:22:35 +02:00
Claudio Atzori
2578b7fbb3
code formatting
2021-05-20 14:59:02 +02:00
Miriam Baglioni
dc0ad8d2e0
fixed issue related to a change in the name of the downloaded file. Added the sheet name as a parameter, along with a check in case the name changes
2021-05-20 14:53:53 +02:00
Claudio Atzori
232dce83db
fixes #6701 : xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:41:15 +02:00
Claudio Atzori
aef2977ad0
fixes #6701 : xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:40:22 +02:00
Miriam Baglioni
02b80cf24f
resolved conflicts
2021-05-20 10:59:39 +02:00
Claudio Atzori
c4a23c2f4d
fix: preserving the old identifier among the originalIds in the doiboost construction process, trying to avoid UnsupportedOperationException while adding elements to the originalIds
2021-05-19 16:01:52 +02:00
Claudio Atzori
ba03f549d7
fix: preserving the old identifier among the originalIds in the doiboost construction process
2021-05-19 15:43:26 +02:00
Claudio Atzori
239d0f0a9a
ROR actionset import workflow backported from branch stable_ids
2021-05-18 16:12:11 +02:00
Antonis Lempesis
168edcbde3
added the final steps for the observatory promote wf and some cleanup
2021-05-18 15:23:20 +03:00
Michele Artini
e56ccec536
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-05-18 14:00:28 +02:00
Michele Artini
c1e20de7cf
fixed the deserialization of a json property
2021-05-18 14:00:14 +02:00
Claudio Atzori
a9f512103b
using constants from ModelConstants
2021-05-18 11:19:07 +02:00
Claudio Atzori
eeb8bcf075
using constants from ModelConstants
2021-05-18 11:10:07 +02:00
Claudio Atzori
2cbf15f4fb
using ModelConstants
2021-05-17 09:54:45 +02:00
Enrico Ottonello
e13926cdd0
merged with master
2021-05-14 18:10:31 +02:00
Claudio Atzori
f19feceaf0
set the old identifier before switching to the new one
2021-05-14 12:53:40 +02:00
Claudio Atzori
1bd70fa2c6
preserving the old identifier among the originalIds in the doiboost construction process
2021-05-14 11:30:41 +02:00
Claudio Atzori
ca3f3a7687
using ModelConstants
2021-05-14 11:29:49 +02:00
Claudio Atzori
23b8883ab1
applied intellij code cleanup
2021-05-14 10:58:12 +02:00
Claudio Atzori
609eb711b3
IndexRecordTransformerTest for producing a record that can be manually submitted to solr
2021-05-13 16:13:28 +02:00
Claudio Atzori
1517bf7c92
IndexRecordTransformerTest for producing a record that can be manually submitted to solr
2021-05-13 16:11:22 +02:00
Sandro La Bruzzo
d9a0bbda7b
implemented new phase in doiboost to make the dataset Distinct by ID
2021-05-13 12:25:14 +02:00
Sandro La Bruzzo
6424cd9062
Added passing of the following parameters:
...
-varDataSourceId
-varOfficialName
in Each transformation Rule
2021-05-11 15:17:38 +02:00