Sandro La Bruzzo
7bd224f051
implement first version of scholexplorer integration for the generation of final graph
2021-11-02 15:58:15 +01:00
Claudio Atzori
1225ba0b92
[resolution] increasing number of partitions to avoid OOM
2021-10-28 16:18:17 +02:00
Sandro La Bruzzo
d9cbca83f7
moved filter to the next phase
2021-10-28 16:13:24 +02:00
Sandro La Bruzzo
1be9aa0a5f
Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta
2021-10-26 17:52:20 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
...
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Claudio Atzori
d147295c2f
avoiding java.io.NotSerializableException: java.util.HashMap
2021-10-21 14:15:57 +02:00
Claudio Atzori
3702fe478d
cleanup
2021-10-21 12:05:02 +02:00
Sandro La Bruzzo
ac36aa7d1c
fixed wrong Encoding during a map phase
2021-10-21 11:35:02 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
...
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori
00b78b9c58
cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies
2021-10-20 14:04:45 +02:00
Claudio Atzori
c01dd0c925
registered oaf model classes for the KryoSerializer
2021-10-20 13:55:07 +02:00
Claudio Atzori
515e068a78
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-19 16:46:06 +02:00
Claudio Atzori
512e7b0170
code formatting
2021-10-19 16:19:29 +02:00
Claudio Atzori
e9157c67aa
Merge branch 'beta' into dump
2021-10-19 16:15:03 +02:00
Claudio Atzori
98f37c8d81
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 16:14:40 +02:00
Claudio Atzori
c8850456e9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-19 16:09:54 +02:00
Claudio Atzori
7a73010acd
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 11:59:16 +02:00
Miriam Baglioni
c7f6cd2591
added again the setting for saXReader
2021-10-19 10:15:26 +02:00
miconis
5f780a6ba1
bug fix in migrate entities: parameter name was wrong
2021-10-18 23:30:40 +02:00
Miriam Baglioni
1315952702
merge with branch beta
2021-10-18 14:17:09 +02:00
Sandro La Bruzzo
7b15b88d4c
renamed wrong package, implemented last aggregation workflow for scholexplorer
2021-10-15 15:00:15 +02:00
Sandro La Bruzzo
51a03c0a50
refactor code for EBI from dhp-graph-mapper into dhp-aggregation
2021-10-14 14:23:13 +02:00
miconis
995c1eddaf
minor change
2021-10-13 17:07:10 +02:00
miconis
326bf63775
integration of parent child orgs relations
2021-10-13 12:24:48 +02:00
Miriam Baglioni
63933808d4
added fix for mixing result types, added configuration default to funder subworkflow
2021-10-13 11:28:28 +02:00
Miriam Baglioni
fec40bdd95
merging with branch beta - resolved conflicts
2021-10-12 09:16:36 +02:00
Miriam Baglioni
83f51f1812
refactoring
2021-10-12 09:14:43 +02:00
Sandro La Bruzzo
5606014b17
code refactor see ticket #7065
2021-10-12 08:11:53 +02:00
Sandro La Bruzzo
2557bb41f5
Implemented new method for update baseline inside scala node
2021-10-06 16:41:08 +02:00
Sandro La Bruzzo
b84e0cabeb
Implemented new method for update baseline
2021-10-05 16:34:47 +02:00
Sandro La Bruzzo
991b06bd0b
removed generation of EBI links from old dump, now EBI link dump is created by another wf
2021-10-05 10:21:33 +02:00
Miriam Baglioni
e653756e3d
applied some suggestions from Sonar Lint
2021-10-04 18:40:07 +02:00
Miriam Baglioni
9814c3e700
merging with branch beta
2021-10-01 13:00:03 +02:00
Miriam Baglioni
c4ccd7b32c
-
2021-10-01 12:59:47 +02:00
Miriam Baglioni
c8321ad31a
merge with branch beta
2021-10-01 12:59:08 +02:00
Claudio Atzori
60a6a9a583
[graph2hive] added field 'measures' to the result view
2021-09-30 09:27:26 +02:00
Claudio Atzori
ebf53a1616
added cleaning for relation fields: subRelType & relClass according to dedicated vocabs
2021-09-15 16:10:37 +02:00
Sandro La Bruzzo
e8b3cb9147
Implemented method to download delta updates in EBI Links
2021-08-30 09:32:45 +02:00
Alessia Bardi
ccf4103a25
keep the original url if the decoder fails for any reason
2021-08-25 10:07:58 +02:00
Sandro La Bruzzo
45898c71ac
fixed wrong doi in pubmed
2021-08-24 15:20:04 +02:00
Alessia Bardi
00a28c0080
originalId was renamed to acronym
2021-08-23 15:02:21 +02:00
Alessia Bardi
f19b04d41b
code formatting after mvn compile
2021-08-23 14:33:39 +02:00
Alessia Bardi
931f430129
Merge branch 'beta' into datasource_model_eosc_beta
2021-08-23 11:57:21 +02:00
Alessia Bardi
4c1474e693
Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars encoded because of the original XML representation of data
2021-08-20 17:03:30 +02:00
Miriam Baglioni
e5cf11d088
change open access route to result matching hbm to gold
2021-08-19 10:29:04 +02:00
Claudio Atzori
f74adc4752
added DownloadCSV2 as alternative implementation of the same download procedure
2021-08-13 15:52:15 +02:00
Claudio Atzori
5f0903d50d
fixed CSV downloader & tests
2021-08-13 14:17:54 +02:00
Claudio Atzori
17cefe6a97
[HBM] removed stale replace option
2021-08-13 12:43:59 +02:00
Claudio Atzori
7ee2757fcd
fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy
2021-08-13 12:41:01 +02:00
Claudio Atzori
c3ad4ab701
minor fixes
2021-08-13 12:23:15 +02:00
Claudio Atzori
baed5e3337
test classes moved in specific components
2021-08-13 12:14:47 +02:00
Claudio Atzori
3359f73fcf
cleanup & best practices
2021-08-13 12:00:42 +02:00
Miriam Baglioni
f4ec81c92c
merging with branch beta
2021-08-13 10:31:35 +02:00
Miriam Baglioni
32fd75691f
refactoring
2021-08-13 10:15:42 +02:00
Miriam Baglioni
01db1f8bc4
GetCSV refactoring - removed not needed import
2021-08-13 10:14:17 +02:00
Miriam Baglioni
964a46ca21
GetCSV refactoring - modified due to movement of classes
2021-08-13 10:11:18 +02:00
Miriam Baglioni
eaf077fc34
GetCSV refactoring - removed not needed dependency
2021-08-13 10:08:58 +02:00
Claudio Atzori
9587d4aee8
Merge branch 'beta' into hostedbymap
2021-08-12 17:04:30 +02:00
Claudio Atzori
86d940044c
added test to verify bad records from FWF-E-Book-Library
2021-08-12 11:32:56 +02:00
Claudio Atzori
8cdce59e0e
[graph raw] let the mapping exceptions propagate
2021-08-12 11:32:26 +02:00
Miriam Baglioni
08dd2b2102
moving the dependency version to the external pom file
2021-08-11 18:09:41 +02:00
Miriam Baglioni
785db1d5b2
refactoring
2021-08-11 17:44:07 +02:00
Miriam Baglioni
b966329833
reverting
2021-08-11 17:37:00 +02:00
Miriam Baglioni
8ad7c71417
reverting
2021-08-11 17:36:12 +02:00
Miriam Baglioni
0e1a6bec20
reverting
2021-08-11 17:32:29 +02:00
Miriam Baglioni
c6a2a780a9
reverting
2021-08-11 17:30:17 +02:00
Miriam Baglioni
8229632839
adding assertions to the mapping of the unibi part of gold list
2021-08-11 16:36:01 +02:00
Miriam Baglioni
b1c6140ebf
removed all comments in Italian
2021-08-11 16:23:33 +02:00
Miriam Baglioni
8da3a25cf6
merging with branch beta
2021-08-11 15:55:34 +02:00
Claudio Atzori
9f4db73f30
updated/fixed unit tests
2021-08-11 15:02:51 +02:00
Claudio Atzori
61d811ba53
suggestions from intellij
2021-08-11 12:18:20 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Miriam Baglioni
b954fe9ba8
merging with branch beta
2021-08-11 10:12:46 +02:00
Miriam Baglioni
b688567db5
hostedbymap - modified part of test to check the bestaccessright changed
2021-08-11 10:12:10 +02:00
Miriam Baglioni
9731a6144a
hostedbymap - in case the journal is open access the access may be changed also for the best access right in the result
2021-08-10 17:49:45 +02:00
Miriam Baglioni
a90bac3bc9
Graph Dump - added method to test class to verify addition of validation date in projects for community result
2021-08-09 16:36:54 +02:00
Miriam Baglioni
bd0d7bfba7
Graph Dump - added resources for testing addition of validation date in project for communityresult
2021-08-09 16:36:17 +02:00
Miriam Baglioni
8daaa32e90
Graph Dump - added resources for testing
2021-08-09 15:46:29 +02:00
Miriam Baglioni
bc9e3a06ba
Graph Dump - extended the test class
2021-08-09 15:46:06 +02:00
Miriam Baglioni
2efa5abda5
refactoring
2021-08-09 12:28:36 +02:00
Miriam Baglioni
eff499af9f
added new tests and changed the test example
2021-08-09 11:12:30 +02:00
Miriam Baglioni
c3931557e3
extended the logic of the dump to consider the validation date in the relation (also in the dumped result for communities and funders at the level of the project), the extension on the instance for the APC, the pid, the alternate identifiers, and the extension of the AccessRight to store the OpenAccessRoute. Added new resources for testing and extended the old class to verify the new dump. Also fixed an issue on relation dump: only relations whose source and target are entities in the graph are dumped. The same holds for references to projects
2021-08-06 18:56:18 +02:00
Miriam Baglioni
6bd1eca7e0
merge branch with beta
2021-08-05 15:23:32 +02:00
Miriam Baglioni
73dc082927
added new dumped field (openaccessroute, pid and alternate identifier at the level of the instance) and the bipFinder measure at the level of the result
2021-08-05 15:20:50 +02:00
Miriam Baglioni
ee13da9258
merge branch with master
2021-08-05 11:34:20 +02:00
Claudio Atzori
83c04e5d28
mapping test for dataset records adapted to reflect the delegated pid authority (zenodo)
2021-08-04 10:37:57 +02:00
Miriam Baglioni
c7b71647c6
Hosted By Map - modification of the resource for testing the presence of only one entry per datasource id
2021-08-04 10:20:02 +02:00
Miriam Baglioni
eb8c3f8594
Hosted By Map - test modified because of the application of the new aggregator on datasources
2021-08-04 10:19:17 +02:00
Miriam Baglioni
e94ae0b1de
Hosted By Map - extension of the workflow to consider also the application of the map to publications and datasources
2021-08-04 10:18:11 +02:00
Miriam Baglioni
67ba4c40e0
Hosted By Map - added parameter resources
2021-08-04 10:17:28 +02:00
Miriam Baglioni
eccf3851b0
Hosted By Map - refactoring
2021-08-04 10:16:30 +02:00
Sandro La Bruzzo
74afe43c3a
fixed wrong test file
2021-08-04 10:16:17 +02:00
Miriam Baglioni
1e952cccf6
Hosted By Map - refactoring and deletion of not needed methods
2021-08-04 10:15:43 +02:00
Miriam Baglioni
8ba8c77f92
Hosted By Map - refactoring
2021-08-04 10:14:57 +02:00
Miriam Baglioni
8f7623e77a
Hosted By Map - refactoring and application of the new aggregator
2021-08-04 10:14:20 +02:00
Sandro La Bruzzo
3fc820203b
fixed wrong test file
2021-08-04 10:13:59 +02:00
Miriam Baglioni
a7bf314fd2
Hosted By Map - added new aggregator to get just one result per datasource id
2021-08-04 10:13:30 +02:00
Miriam Baglioni
100e54e6c8
merging with branch beta
2021-08-03 10:47:11 +02:00
Miriam Baglioni
461b8a29a0
removed not needed class
2021-08-03 10:46:51 +02:00
Miriam Baglioni
327cddde33
Hosted By Map - refactoring
2021-08-03 10:44:13 +02:00
Miriam Baglioni
17292c6641
Hosted By Map - resources for testing purposes
2021-08-02 19:37:08 +02:00
Miriam Baglioni
ee7ccb98dc
Hosted By Map - test class to verify the application of the hbm to results and datasource
2021-08-02 19:36:18 +02:00
Miriam Baglioni
90e91486e2
Hosted By Map - test class to verify each step in the preparation process
2021-08-02 19:35:52 +02:00
Miriam Baglioni
1e859706a3
Hosted By Map - Classes to apply the HBM to results and datasources
2021-08-02 19:35:23 +02:00
Miriam Baglioni
72df8f9232
Hosted By Map - removed the aggregator for the datasource (it is no longer needed) and added a new aggregator for the results. Changed also the hostedBYMap aggregator
2021-08-02 19:34:44 +02:00
Miriam Baglioni
ff1ce75e33
Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already
2021-08-02 19:32:59 +02:00
Claudio Atzori
e826aae848
using constants from ModelConstants
2021-08-02 14:28:59 +02:00
Miriam Baglioni
1695d45bd4
Hosted By Map - Test class to verify the preparation of the intermediate information
2021-07-30 17:57:01 +02:00
Miriam Baglioni
7c6ea2f4c7
Hosted By Map - first attempt for the creation of intermediate information to be used to apply the hosted by map on the graph entities
2021-07-30 17:56:27 +02:00
Miriam Baglioni
d8b9b0553b
Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map
2021-07-30 17:55:39 +02:00
Miriam Baglioni
613bd3bde0
Hosted By Map - refactor of the first attempt to prepare a new hosted by map dependent on the datasource in the graph and on two external sources: the gold list from unibi and the doaj list of open access journals. Both lists are downloaded from the provided url parameter
2021-07-30 17:54:45 +02:00
Miriam Baglioni
d1807781c0
merging with branch beta
2021-07-30 14:34:07 +02:00
Miriam Baglioni
1d6ac3715b
merge branch with beta
2021-07-30 11:58:29 +02:00
Claudio Atzori
19620eed46
applying PR#131, Patch the identifiers (source/target) in the relations, refinements
2021-07-30 11:09:32 +02:00
Claudio Atzori
4f78565c04
fixed implementation of PatchRelationsApplication, refined the relative unit test
2021-07-30 11:07:09 +02:00
Claudio Atzori
a6a38cca9e
fixed implementation of PatchRelationsApplication, refined the relative unit test
2021-07-30 11:06:11 +02:00
Miriam Baglioni
9bc4fd3b69
Patch FCT relations - fixed issue with join
2021-07-30 10:34:05 +02:00
Miriam Baglioni
2fc89fc9b5
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:20:43 +02:00
Claudio Atzori
081fe92a21
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:13:56 +02:00
Claudio Atzori
576693d782
added unit test for PatchRelationsApplication
2021-07-30 10:13:33 +02:00
Miriam Baglioni
baad01cadc
hostedbymap
2021-07-29 13:04:39 +02:00
Claudio Atzori
e725c88ebb
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:43 +02:00
Claudio Atzori
5d08ad86ae
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:16 +02:00
Claudio Atzori
e87e1805c4
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-29 12:13:06 +02:00
Claudio Atzori
5f7330d407
Merge branch 'master' into fct_project_id_replacement
2021-07-29 11:38:22 +02:00
Claudio Atzori
1923c1ce21
replaced full join + filtering with a left join
2021-07-29 11:36:20 +02:00
Claudio Atzori
a9961a1835
[cleaning] title cleaning based on the me.xuender:unidecode library
2021-07-28 16:36:33 +02:00
Claudio Atzori
e1797c0a42
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-28 16:21:36 +02:00
Claudio Atzori
6dddad86ee
[cleaning] title cleaning based on the me.xuender:unidecode library
2021-07-28 16:21:29 +02:00
Alessia Bardi
c806387d4b
tests for enermaps
2021-07-28 11:54:36 +02:00
Claudio Atzori
2fff24df55
code formatting
2021-07-28 11:34:19 +02:00
Michele Artini
9f1c7b8e17
tests
2021-07-28 11:32:34 +02:00
Michele Artini
e6f1773d63
mapping of new eosc fields
2021-07-28 11:17:11 +02:00
Michele Artini
c72c960ffb
added eosc fields
2021-07-28 11:03:15 +02:00
Michele Artini
1fb572a33a
added eosc fields
2021-07-28 10:52:24 +02:00
Miriam Baglioni
708d0ade34
Merge branch 'beta' into hostedbymap
2021-07-28 10:37:22 +02:00
Miriam Baglioni
0424f47494
HostedByMap fixing issues
2021-07-28 10:24:13 +02:00
Claudio Atzori
d267dce520
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-27 17:18:29 +02:00
Claudio Atzori
5aa7d16d1b
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
2021-07-27 15:11:58 +02:00
Claudio Atzori
998b66855a
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
2021-07-27 15:11:37 +02:00
Miriam Baglioni
74f801b689
merging with branch beta
2021-07-27 13:18:31 +02:00
Miriam Baglioni
35e395eae8
merge with master
2021-07-27 12:34:59 +02:00
Miriam Baglioni
eb07f7f40f
Hosted By Map
2021-07-27 12:27:26 +02:00
Sandro La Bruzzo
848aabbb6c
minor fix
2021-07-25 12:06:41 +02:00
Sandro La Bruzzo
8fac10c91e
fixed definition of the wf creating the final infospace of scholexplorer
2021-07-25 11:15:37 +02:00
Sandro La Bruzzo
3920c69bc8
change implementation of resolve Relation to generate jsonRdd in output
2021-07-25 09:51:36 +02:00
Sandro La Bruzzo
d9e3b89937
implemented last part of workflows to generate scholixGraph
2021-07-23 16:38:32 +02:00
Sandro La Bruzzo
cfde63a7c3
fixed resolve relation join
2021-07-23 14:17:29 +02:00
Sandro La Bruzzo
4a439c3863
NPE fixed
2021-07-23 14:17:29 +02:00
Sandro La Bruzzo
ca74e8dd02
create a separate wf for resolving relation
2021-07-23 11:40:06 +02:00
Sandro La Bruzzo
43e9380cd3
update resolve relation to use the same format of openaire graph
2021-07-23 11:25:18 +02:00
Sandro La Bruzzo
62ae36a3d2
fixed NPE
2021-07-22 15:41:38 +02:00
Sandro La Bruzzo
31d2d6d41e
Scholexplorer: introduction of dedup openaire
2021-07-21 18:09:32 +02:00
Alessia Bardi
9069958479
tests for enermaps
2021-07-20 19:31:43 +02:00
Claudio Atzori
65934888a1
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:52 +02:00
Claudio Atzori
5947cddafc
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:24 +02:00
Claudio Atzori
0977baf41d
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 17:43:52 +02:00
Claudio Atzori
5e5f65a3c3
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 15:56:55 +02:00
Sandro La Bruzzo
7e2caafe84
Scholexplorer: fixed mapping typologies
2021-07-15 09:53:12 +02:00
Miriam Baglioni
774cdb190e
changes to mirror the last dump of the graph with the ols data model.
2021-07-13 18:57:24 +02:00
Miriam Baglioni
886617afd0
One result linked to more than one project is saved just once
2021-07-13 18:15:35 +02:00
Miriam Baglioni
320cf02d96
Changed the way to find results linked to projects. We verify to actually have the project on the graph before selecting the result
2021-07-13 18:13:32 +02:00
Miriam Baglioni
52ce35d57b
-
2021-07-13 18:08:46 +02:00
Miriam Baglioni
970b387b8d
modification to allow dump of a single community
2021-07-13 18:08:10 +02:00
Miriam Baglioni
eae10c5894
modification to allow the dump for a single community
2021-07-13 18:07:25 +02:00
Miriam Baglioni
c028feef4f
workflow for the dump as sub workflows
2021-07-13 18:06:44 +02:00
Miriam Baglioni
d70f8c96fd
funding contains and not starts with h2020
2021-07-13 17:34:53 +02:00
Miriam Baglioni
5e38c7f42d
dumping only communities with status all
2021-07-13 17:32:38 +02:00
Miriam Baglioni
618d2de2da
minor changes and refactoring
2021-07-13 17:10:02 +02:00
Miriam Baglioni
59615da65e
Add test to verify the creation of relation between context and projects
2021-07-13 17:09:15 +02:00
Miriam Baglioni
084b4ef999
added the creation of the openaireId from funder and grant number if the element is not present in the context profile
2021-07-13 17:07:46 +02:00
Miriam Baglioni
8f322a73cb
change because of the renaming of originalId in acronym
2021-07-13 16:22:58 +02:00
Miriam Baglioni
72397ea1ba
Added fix for community of arbitrary name length
2021-07-13 16:18:35 +02:00
Miriam Baglioni
5295d10691
added check not to dump deletedByInference entities
2021-07-13 16:11:46 +02:00
Miriam Baglioni
e9a17ec899
added check to verify not to add void APC
2021-07-13 15:53:35 +02:00
Miriam Baglioni
8429aed6c6
Added resource for testing selection of valid relations
2021-07-13 15:49:38 +02:00
Miriam Baglioni
39b1a6edf6
added test class for the selection of valid relations and description
2021-07-13 15:23:09 +02:00
Miriam Baglioni
9a58f1b93d
added logic to select only the valid relations: those not deletedbyinference and having both part of the relation as entities in the graph
2021-07-13 15:20:39 +02:00
Miriam Baglioni
13c66e16be
changed logic to split for communities
2021-07-13 15:15:27 +02:00
Miriam Baglioni
6410ab71d8
added APC in the dump and test method
2021-07-13 15:13:58 +02:00
Miriam Baglioni
65a242646d
added resource for APC dump
2021-07-13 14:45:25 +02:00
Miriam Baglioni
4b432fbee8
extended test class
2021-07-13 14:40:39 +02:00
Miriam Baglioni
87a6e2b967
extended test class
2021-07-13 14:38:28 +02:00
Miriam Baglioni
69fd40fd30
modified code to split the Croatian funder
2021-07-13 14:35:26 +02:00
Miriam Baglioni
86e50f7311
modified code to split the Croatian funder
2021-07-13 14:31:45 +02:00
Miriam Baglioni
da88c850c6
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:44 +02:00
Miriam Baglioni
2f66fedfec
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:23 +02:00
Sandro La Bruzzo
bbe8193930
merged stable ids
2021-07-12 17:00:43 +02:00
Sandro La Bruzzo
09fccf8000
added workflow to serialize scholix and summary in json
2021-07-09 11:01:42 +02:00
Sandro La Bruzzo
0ea576745f
updated CreateInputGraph because generics don't work on Spark Dataset
2021-07-09 10:29:24 +02:00
Sandro La Bruzzo
cd17e19044
implemented branch workflow to import datacite and crossref in scholexplorer
2021-07-08 21:20:19 +02:00
Sandro La Bruzzo
8a034e46e1
updated baseline workflow
2021-07-08 11:11:41 +02:00
Claudio Atzori
b7b8e0986e
[raw_all] The claim merge procedure includes the claimed contexts in the merged result
2021-07-08 10:42:31 +02:00
Sandro La Bruzzo
0799ac9fb6
fixed wrong path
2021-07-08 10:36:37 +02:00
Sandro La Bruzzo
4d53402712
extended ebiLinks to create a dataset before generation of OAF
2021-07-08 10:26:21 +02:00
Sandro La Bruzzo
a4a54a3786
code refactor
2021-07-08 09:08:25 +02:00
Sandro La Bruzzo
a01dbe0ab0
completed workflow of generation of scholix and summaries
2021-07-07 23:10:34 +02:00
Claudio Atzori
fdcff42e46
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 19:01:59 +02:00
Claudio Atzori
bc014023c8
Merge pull request 'to solve the scala SI-3623' ( #122 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: D-Net/dnet-hadoop#122
2021-07-07 11:13:51 +02:00
Claudio Atzori
32bdfdccbc
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 11:08:27 +02:00
Claudio Atzori
f580cb77e1
added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA)
2021-07-06 21:11:11 +02:00
Sandro La Bruzzo
8535506c22
added scholix generation
2021-07-06 17:18:06 +02:00
Sandro La Bruzzo
4c54bd8742
add test to verify merge scholix on source
2021-07-06 11:32:14 +02:00
Andreas Czerniak
3531802710
to solve the scala SI-3623
2021-07-06 11:30:56 +02:00
Sandro La Bruzzo
7d8db2eb8a
betterRenamingMethod
2021-07-06 09:56:32 +02:00
Sandro La Bruzzo
c952c8d236
generate first side of scholix mapping
2021-07-06 09:53:14 +02:00
Sandro La Bruzzo
e4b84ef5d6
fixed mapping OAF to Scholix summary
2021-07-02 16:48:48 +02:00
Sandro La Bruzzo
c6fa8598e1
massive code refactor:
...
removed modules dhp-*-scholexplorer
2021-07-01 22:13:45 +02:00
Sandro La Bruzzo
84b834c893
added test dataset test for pangaea
2021-06-30 17:31:09 +02:00
Sandro La Bruzzo
1a6b398968
implemented Creation of Raw Graph and Resolution
2021-06-30 17:27:55 +02:00
Sandro La Bruzzo
623a0c4edb
code Refactor, renaming packages
2021-06-30 11:09:30 +02:00
Sandro La Bruzzo
7e08655e5f
added relation dates in all scholexplorer Datasources
2021-06-29 12:02:03 +02:00
Sandro La Bruzzo
075055eaca
added relation dates in bio mapping
2021-06-29 10:33:09 +02:00
Sandro La Bruzzo
f36f92287d
implemented mapping from Crossref Event Data to Oaf
2021-06-29 10:21:23 +02:00
Sandro La Bruzzo
511ec14c63
implemented mapping from EBI and Scholix Resolved to OAF
2021-06-28 22:04:22 +02:00
Sandro La Bruzzo
ad50415167
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-24 17:20:50 +02:00
Sandro La Bruzzo
80e15cc455
implemented mapping from uniprot, pdb and ebi links
2021-06-24 17:20:00 +02:00
Claudio Atzori
2e8fd2c531
cleanup
2021-06-23 14:38:24 +02:00
Claudio Atzori
4dc9ebf217
[raw_all] fixed unit test
2021-06-23 14:38:07 +02:00
Claudio Atzori
50fc5a64a0
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-06-23 11:49:42 +02:00
Sandro La Bruzzo
080a280bea
added pdb to Oaf Transformation
2021-06-21 16:23:59 +02:00
Sandro La Bruzzo
507e42102a
added pdb to oaf class
2021-06-21 09:36:40 +02:00
Sandro La Bruzzo
4fe7b75644
renamed packages
2021-06-18 16:41:24 +02:00
Sandro La Bruzzo
3100166d29
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-16 16:22:16 +02:00
Claudio Atzori
7243a40c88
code formatting
2021-06-16 15:03:03 +02:00
Sandro La Bruzzo
dfcf78cf24
removed wrong code
2021-06-16 14:57:42 +02:00
Sandro La Bruzzo
cc0f2b11fb
Implemented mapping from pubmed baseline to OAF
2021-06-16 14:56:24 +02:00
Michele Artini
ada063ce70
fixed a problem with empty mdstore list (2)
2021-06-14 12:04:47 +02:00
Michele Artini
83132ee99a
fixed a problem with empty mdstore list
2021-06-14 11:57:00 +02:00
Sandro La Bruzzo
aeb8132627
Merged branch stable_ids
2021-06-14 10:07:29 +02:00
Claudio Atzori
2039bb9f5f
orcid / orcid_pending cleaning backported from master branch
2021-06-14 09:40:50 +02:00
Claudio Atzori
dd19c4ac5a
Merge pull request 'import_new_mdstores' ( #112 ) from import_new_mdstores into stable_ids
...
Reviewed-on: D-Net/dnet-hadoop#112
2021-06-14 09:23:55 +02:00
Claudio Atzori
a900bfb874
delegating the date parsing to https://github.com/sisyphsu/dateparser
2021-06-11 16:53:01 +02:00
Sandro La Bruzzo
5b724d9972
added relations to datacite mapping
2021-06-04 10:14:22 +02:00
Sandro La Bruzzo
e57294ac99
implemented changes on PUBMed dataflow
2021-06-03 10:52:09 +02:00
Michele Artini
ede2749822
orcid pid type
2021-06-01 12:42:43 +02:00
Michele Artini
f0fbfdcfae
Merge branch 'stable_ids' into import_new_mdstores
2021-06-01 12:03:00 +02:00
Michele Artini
e950750262
add nodes to import hdfs mdstores
2021-06-01 10:48:50 +02:00
Michele Artini
03a510859a
removed coalesce(1)
2021-05-31 14:10:51 +02:00
Michele Artini
e9f2b6037c
patch of mdstore records
2021-05-31 11:36:26 +02:00
Michele Artini
ad56a44fda
save as gzipped sequence file
2021-05-28 14:45:39 +02:00
Claudio Atzori
6e3a4e9237
updated test expectations
2021-05-28 09:37:50 +02:00
Michele Artini
4fa5671d16
first implementation of Hdfs Mdstores Importer
2021-05-27 16:22:07 +02:00
Claudio Atzori
5e4b91d9ef
more pervasive use of constants from ModelConstants, especially for ORCID
2021-05-26 18:20:23 +02:00
Claudio Atzori
9d725efdc1
reverted implementation of the mdstore client
2021-05-20 18:26:09 +02:00
Claudio Atzori
ae5c28e54f
code formatting
2021-05-20 16:13:06 +02:00
Claudio Atzori
232dce83db
fixes #6701 : xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:41:15 +02:00
Claudio Atzori
23b8883ab1
applied intellij code cleanup
2021-05-14 10:58:12 +02:00
Claudio Atzori
d4c3476152
mapping datasource.journal only when an issn is available, null otherwise
2021-05-11 11:08:54 +02:00
Claudio Atzori
d1cbee8413
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
2021-05-10 16:43:39 +02:00
Claudio Atzori
d4a30fabe3
clean up tests
2021-05-05 17:28:15 +02:00
Claudio Atzori
dccaf173cf
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 16:36:15 +02:00
Claudio Atzori
2e1eb96f9a
code formatting
2021-05-05 11:23:57 +02:00
Claudio Atzori
fb930b84d3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-05-04 18:06:30 +02:00
Claudio Atzori
923d19ea8e
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
2021-05-04 18:06:21 +02:00
Sandro La Bruzzo
714b71bd21
updated pubmed
2021-05-04 14:54:12 +02:00
Alessia Bardi
9a20057615
fixed query for organisations' pids
2021-04-29 15:23:39 +02:00
Sandro La Bruzzo
2129e9caa7
updated pangaea transformation to parse directly the xml
2021-04-28 10:21:03 +02:00
Claudio Atzori
5afa7d3e0c
core utilities in dhp-common moved in external module dhp-schemas
2021-04-27 15:44:01 +02:00
Sandro La Bruzzo
74484d2823
bug fixing
2021-04-27 12:13:44 +02:00
Sandro La Bruzzo
c74b03d59c
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-27 11:31:07 +02:00
Sandro La Bruzzo
7f8848ecdd
added first implementation of Pangaea Mapping
2021-04-27 11:30:37 +02:00
Claudio Atzori
27ab8a704d
adjusted poms to align with the external dhp-schema module
2021-04-27 10:12:27 +02:00
Claudio Atzori
c2bb03c8b5
depending on external dhp-schemas module
2021-04-23 17:57:35 +02:00
Claudio Atzori
c25238480c
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:34:57 +02:00
Claudio Atzori
d0d477cca3
code formatting
2021-04-20 12:50:34 +02:00
miconis
0393cdce42
addition of alternative names in export queries
2021-04-20 12:45:21 +02:00
miconis
cadd0a5de8
modification of the queries for openorgs: they now consider also pending orgs
2021-04-20 12:06:56 +02:00
Claudio Atzori
d1ca025b0b
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
2021-04-13 14:32:41 +02:00
miconis
11b22b2d23
bug fix in the query, it now exports only relations with non-hidden organizations
2021-04-08 11:51:47 +02:00
miconis
0857100fb8
implementation of the tests for the openorgs integration in the openaire provision
2021-04-07 18:42:16 +02:00
miconis
bf685d849f
addition of pids in the query for the export of openorgs for the provision, addition of ec_fields in the openorgs model
2021-04-07 14:27:43 +02:00
miconis
eaaefb8b4c
implementation of the procedure to reuse content of different dbs when creating the raw graph
2021-04-06 14:35:51 +02:00
miconis
c39c82dfe9
modification of the jobs for the integration of openorgs in the provision, dedup records are no longer created by merging but simply taking results of openorgs portal
2021-04-06 14:31:00 +02:00
Claudio Atzori
7941d7be29
WIP: using common definitions from ModelConstants
2021-03-31 18:33:57 +02:00
Claudio Atzori
72ce741ea6
WIP: using common definitions from ModelConstants
2021-03-31 17:07:13 +02:00
Claudio Atzori
9237d55d7f
[OpenOrgsWf] cleanup
2021-03-29 17:40:34 +02:00
Claudio Atzori
7f4e9479ec
[OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false)
2021-03-29 16:59:16 +02:00
miconis
2709d08fc2
Merge branch 'stable_ids' into openorgswf
2021-03-29 16:39:07 +02:00
miconis
f446580e9f
code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup
2021-03-29 16:10:46 +02:00
miconis
2355cc4e9b
minor changes and bug fix
2021-03-29 10:07:12 +02:00
Claudio Atzori
827e7e37db
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
2021-03-25 11:07:59 +01:00
miconis
28c1cdd132
merged stable_ids into openorgswf
2021-03-25 10:44:49 +01:00
miconis
348b0ef921
bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db
2021-03-24 15:51:27 +01:00
Claudio Atzori
751125fdf9
[Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target
2021-03-23 17:34:32 +01:00
Claudio Atzori
b4febed138
updated mapping tests as a consequence of the special treatment reserved for Handle PIDs
2021-03-23 09:37:48 +01:00
Claudio Atzori
431cbe9955
handle missing instance.pid during bulk cleaning
2021-03-23 09:28:58 +01:00
Sandro La Bruzzo
c73072079d
fix conflicts
2021-03-22 16:36:31 +01:00
Claudio Atzori
5a043e95ea
code formatting
2021-03-19 11:37:27 +01:00
Claudio Atzori
a4e82a65aa
integrated filter applied when merging BETA & PROD graphs to rule out records from Datacite
2021-03-19 11:34:44 +01:00
Claudio Atzori
8257f9a2bc
result.pid: adjusted the mapping applied to the contents from the aggregator
2021-03-17 12:45:38 +01:00
Claudio Atzori
640b885706
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
2021-03-16 14:19:32 +01:00
Claudio Atzori
01630f638d
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
2021-03-09 17:11:50 +01:00
Claudio Atzori
59532b0919
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
2021-03-09 11:14:45 +01:00
Claudio Atzori
d525785497
[ #6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
2021-03-09 11:12:55 +01:00
Claudio Atzori
f468c7f0d7
merged from master
2021-03-09 09:12:41 +01:00
Claudio Atzori
8d2bb24512
merged from master
2021-03-08 15:44:34 +01:00
Claudio Atzori
fa7930d2e2
merging contributions from PR#97
2021-03-05 15:45:28 +01:00
miconis
1a85020572
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
2021-02-26 10:19:28 +01:00
Claudio Atzori
b830e33392
mdstore collector plugin
2021-02-25 12:30:30 +01:00
Claudio Atzori
fc3fa5e343
implemented mdstore collector plugin
2021-02-24 15:07:24 +01:00
miconis
4b2124a18e
implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities
2021-02-10 11:51:50 +01:00
Alessia Bardi
c4d1feca74
mapper test with validated link to project
2021-02-10 11:22:54 +01:00
Claudio Atzori
72c57b28fa
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
2021-02-04 14:08:18 +01:00
Alessia Bardi
c67329d3ad
updated test for EU Open Data portal datasets
2021-02-03 17:06:48 +01:00
Alessia Bardi
fd705404a1
tests for EU Open Data portal dataset mapping
2021-02-03 10:28:17 +01:00
Sandro La Bruzzo
686e7b507c
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
2021-01-28 10:02:13 +01:00
Sandro La Bruzzo
98b9498b57
Removed old, largely unused messaging system from collection and Transformation workflow
...
code refactor
2021-01-28 09:51:17 +01:00
Sandro La Bruzzo
150a617bd1
Merge pull request 'aggregation_on_hadoop' ( #90 ) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
...
Wonderful code... You're the Best Sandro
2021-01-26 16:00:47 +01:00
Claudio Atzori
885e0dd926
[Cleaning] filter authors not providing word characters in the fullname
2021-01-26 09:48:53 +01:00
Claudio Atzori
2890511613
[Cleaning] normalise missing Result.country
2021-01-26 09:41:44 +01:00
Claudio Atzori
4eb9ed35b1
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-01-25 18:12:24 +01:00
Claudio Atzori
cd379eb5e3
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
2021-01-25 18:11:49 +01:00
Alessia Bardi
505477f36f
format code
2021-01-25 18:02:49 +01:00
Alessia Bardi
ded6ed8d7d
no ',' author if there are no authors in ODF records
2021-01-25 17:57:51 +01:00
Claudio Atzori
3465c8ccee
[Cleaning] trying to avoid NPEs
2021-01-25 16:54:53 +01:00
Sandro La Bruzzo
a54848a59c
Moved Vocabulary stuff to common module
2021-01-25 15:43:04 +01:00
Claudio Atzori
07a0ccfc96
[Cleaning] trying to avoid NPEs
2021-01-25 13:36:01 +01:00
Claudio Atzori
34d653de41
[Cleaning] updated cleaning rule for DOIs
2021-01-22 14:16:33 +01:00
Claudio Atzori
26e9d55c13
code formatting
2021-01-05 09:59:26 +01:00
Claudio Atzori
7185158942
ignore missing properties
2020-12-29 11:06:28 +01:00
Claudio Atzori
28460c2cd1
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
2020-12-23 16:59:52 +01:00
Claudio Atzori
723b01f9e9
trivial: the fewer magic numbers and values around, the better
2020-12-23 12:22:48 +01:00
Claudio Atzori
6cb0dc3f43
extended ORCID cleaning procedure
2020-12-21 11:40:17 +01:00
Claudio Atzori
47270d9af5
lenient mock can be lenient
2020-12-18 15:38:59 +01:00
Alessia Bardi
f9a8fd8bbd
updated test record for textgrid
2020-12-17 11:59:45 +01:00
Michele Artini
991e675dc6
validation in claim rels
2020-12-14 15:41:25 +01:00
Claudio Atzori
12e2f930c8
resolved conflicts
2020-12-10 10:57:39 +01:00
Alessia Bardi
112da6d76a
in theory, just auto-formatting after mvn compile
2020-12-09 20:00:27 +01:00
Alessia Bardi
bece04b330
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-09 19:54:43 +01:00
Alessia Bardi
426b76ee8e
more asserts for TextGrid record
2020-12-09 19:46:11 +01:00
Claudio Atzori
4705144918
Merge pull request 'rel_project_validation' ( #69 ) from rel_project_validation into master
...
LGTM
2020-12-09 19:01:20 +01:00
Claudio Atzori
ada21ad920
Merge pull request 'dump of the results related to at least one project' ( #61 ) from miriam.baglioni/dnet-hadoop:dump into master
...
LGTM
2020-12-09 17:22:56 +01:00
Michele Artini
1bc9adc10d
default trust for validated rels
2020-12-09 16:18:37 +01:00
Michele Artini
5f21a356fd
reindent
2020-12-09 11:24:30 +01:00
Michele Artini
370a5e650b
validation attributes in resultProject relations
2020-12-09 11:18:26 +01:00
Claudio Atzori
a104a632df
cleanup
2020-12-04 16:32:47 +01:00
Miriam Baglioni
5fb65ffc4a
merge branch with master
2020-12-03 11:24:35 +01:00
Miriam Baglioni
ea88dc3401
fixed issue in property name
2020-12-03 11:24:23 +01:00
Claudio Atzori
cfb55effd9
code formatting
2020-12-02 11:23:49 +01:00
Claudio Atzori
57f448b7a4
graph cleaning workflow separates orcid_pending from orcid, depending on the author pid provenance
2020-12-02 10:44:05 +01:00
Alessia Bardi
a417624670
tests for raw graph mapping
2020-12-02 10:15:26 +01:00
Claudio Atzori
893ac4a77b
GenerateEntitiesApplication can be configured to hash the id value or not
2020-12-02 09:30:06 +01:00
Claudio Atzori
2c407e775e
GenerateEntitiesApplication can be configured to hash the id value or not
2020-11-30 12:00:38 +01:00
Claudio Atzori
e731a7658d
cleaning texts to remove tab characters too
2020-11-27 09:00:04 +01:00
Claudio Atzori
c1b9a4045a
grouping of records will be performed by the dedup workflow
2020-11-26 10:59:10 +01:00
Miriam Baglioni
124591a7f3
refactoring
2020-11-25 18:23:28 +01:00
Miriam Baglioni
1a89f8211c
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:12:40 +01:00
Miriam Baglioni
5fbe54ef54
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:10:28 +01:00