Miriam Baglioni
|
461b8a29a0
|
removed not needed class
|
2021-08-03 10:46:51 +02:00 |
Miriam Baglioni
|
327cddde33
|
Hosted By Map - refactoring
|
2021-08-03 10:44:13 +02:00 |
Miriam Baglioni
|
17292c6641
|
Hosted By Map - resources for testing purposes
|
2021-08-02 19:37:08 +02:00 |
Miriam Baglioni
|
ee7ccb98dc
|
Hosted By Map - test class to verify the application of the hbm to results and datasource
|
2021-08-02 19:36:18 +02:00 |
Miriam Baglioni
|
90e91486e2
|
Hosted By Map - test class to verify each step in the preparation process
|
2021-08-02 19:35:52 +02:00 |
Miriam Baglioni
|
1e859706a3
|
Hosted By Map - Classes to apply the HBM to results and datasources
|
2021-08-02 19:35:23 +02:00 |
Miriam Baglioni
|
72df8f9232
|
Hosted By Map - removed the aggregator for the datasource (it is no longer needed) and added a new aggregator for the results. Also changed the hostedBYMap aggregator
|
2021-08-02 19:34:44 +02:00 |
Miriam Baglioni
|
ff1ce75e33
|
Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already
|
2021-08-02 19:32:59 +02:00 |
Claudio Atzori
|
e826aae848
|
using constants from ModelConstants
|
2021-08-02 14:28:59 +02:00 |
Miriam Baglioni
|
1695d45bd4
|
Hosted By Map - Test class to verify the preparation of the intermediate information
|
2021-07-30 17:57:01 +02:00 |
Miriam Baglioni
|
7c6ea2f4c7
|
Hosted By Map - first attempt for the creation of intermediate information to be used to apply the hosted by map on the graph entities
|
2021-07-30 17:56:27 +02:00 |
Miriam Baglioni
|
d8b9b0553b
|
Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map
|
2021-07-30 17:55:39 +02:00 |
Miriam Baglioni
|
613bd3bde0
|
Hosted By Map - refactor of the first attempt to prepare a new hosted by map dependent on the datasource in the graph and on two external sources: the gold list from unibi and the doaj list of open access journals. Both lists are downloaded from the provided url parameter
|
2021-07-30 17:54:45 +02:00 |
Miriam Baglioni
|
d1807781c0
|
merging with branch beta
|
2021-07-30 14:34:07 +02:00 |
Miriam Baglioni
|
1d6ac3715b
|
merge branch with beta
|
2021-07-30 11:58:29 +02:00 |
Claudio Atzori
|
19620eed46
|
applying PR#131, Patch the identifiers (source/target) in the relations, refinements
|
2021-07-30 11:09:32 +02:00 |
Claudio Atzori
|
4f78565c04
|
fixed implementation of PatchRelationsApplication, refined the relative unit test
|
2021-07-30 11:07:09 +02:00 |
Claudio Atzori
|
a6a38cca9e
|
fixed implementation of PatchRelationsApplication, refined the relative unit test
|
2021-07-30 11:06:11 +02:00 |
Miriam Baglioni
|
9bc4fd3b69
|
Patch FCT relations - fixed issue with join
|
2021-07-30 10:34:05 +02:00 |
Miriam Baglioni
|
2fc89fc9b5
|
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
|
2021-07-30 10:20:43 +02:00 |
Claudio Atzori
|
081fe92a21
|
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
|
2021-07-30 10:13:56 +02:00 |
Claudio Atzori
|
576693d782
|
added unit test for PatchRelationsApplication
|
2021-07-30 10:13:33 +02:00 |
Miriam Baglioni
|
baad01cadc
|
hostedbymap
|
2021-07-29 13:04:39 +02:00 |
Claudio Atzori
|
e725c88ebb
|
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
|
2021-07-29 13:03:43 +02:00 |
Claudio Atzori
|
5d08ad86ae
|
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
|
2021-07-29 13:03:16 +02:00 |
Claudio Atzori
|
e87e1805c4
|
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
|
2021-07-29 12:13:06 +02:00 |
Claudio Atzori
|
5f7330d407
|
Merge branch 'master' into fct_project_id_replacement
|
2021-07-29 11:38:22 +02:00 |
Claudio Atzori
|
1923c1ce21
|
replaced full join + filtering with a left join
|
2021-07-29 11:36:20 +02:00 |
Claudio Atzori
|
a9961a1835
|
[cleaning] title cleaning based on the me.xuender:unidecode library
|
2021-07-28 16:36:33 +02:00 |
Claudio Atzori
|
e1797c0a42
|
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
|
2021-07-28 16:21:36 +02:00 |
Claudio Atzori
|
6dddad86ee
|
[cleaning] title cleaning based on the me.xuender:unidecode library
|
2021-07-28 16:21:29 +02:00 |
Alessia Bardi
|
c806387d4b
|
tests for enermaps
|
2021-07-28 11:54:36 +02:00 |
Claudio Atzori
|
2fff24df55
|
code formatting
|
2021-07-28 11:34:19 +02:00 |
Michele Artini
|
9f1c7b8e17
|
tests
|
2021-07-28 11:32:34 +02:00 |
Michele Artini
|
e6f1773d63
|
mapping of new eosc fields
|
2021-07-28 11:17:11 +02:00 |
Michele Artini
|
c72c960ffb
|
added eosc fields
|
2021-07-28 11:03:15 +02:00 |
Michele Artini
|
1fb572a33a
|
added eosc fields
|
2021-07-28 10:52:24 +02:00 |
Miriam Baglioni
|
708d0ade34
|
Merge branch 'beta' into hostedbymap
|
2021-07-28 10:37:22 +02:00 |
Miriam Baglioni
|
0424f47494
|
HostedByMap fixing issues
|
2021-07-28 10:24:13 +02:00 |
Claudio Atzori
|
d267dce520
|
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
|
2021-07-27 17:18:29 +02:00 |
Claudio Atzori
|
5aa7d16d1b
|
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
|
2021-07-27 15:11:58 +02:00 |
Claudio Atzori
|
998b66855a
|
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
|
2021-07-27 15:11:37 +02:00 |
Miriam Baglioni
|
74f801b689
|
merging with branch beta
|
2021-07-27 13:18:31 +02:00 |
Miriam Baglioni
|
35e395eae8
|
merge with master
|
2021-07-27 12:34:59 +02:00 |
Miriam Baglioni
|
eb07f7f40f
|
Hosted By Map
|
2021-07-27 12:27:26 +02:00 |
Sandro La Bruzzo
|
848aabbb6c
|
minor fix
|
2021-07-25 12:06:41 +02:00 |
Sandro La Bruzzo
|
8fac10c91e
|
fixed definition of the wf for the creation of the final infospace of scholexplorer
|
2021-07-25 11:15:37 +02:00 |
Sandro La Bruzzo
|
3920c69bc8
|
change implementation of resolve Relation to generate jsonRdd in output
|
2021-07-25 09:51:36 +02:00 |
Sandro La Bruzzo
|
d9e3b89937
|
implemented last part of workflows to generate scholixGraph
|
2021-07-23 16:38:32 +02:00 |
Sandro La Bruzzo
|
cfde63a7c3
|
fixed resolve relation join
|
2021-07-23 14:17:29 +02:00 |
Sandro La Bruzzo
|
4a439c3863
|
NPE fixed
|
2021-07-23 14:17:29 +02:00 |
Sandro La Bruzzo
|
ca74e8dd02
|
create a separate wf for resolving relation
|
2021-07-23 11:40:06 +02:00 |
Sandro La Bruzzo
|
43e9380cd3
|
update resolve relation to use the same format of openaire graph
|
2021-07-23 11:25:18 +02:00 |
Sandro La Bruzzo
|
62ae36a3d2
|
fixed NPE
|
2021-07-22 15:41:38 +02:00 |
Sandro La Bruzzo
|
31d2d6d41e
|
Scholexplorer: introduction of dedup openaire
|
2021-07-21 18:09:32 +02:00 |
Alessia Bardi
|
9069958479
|
tests for enermaps
|
2021-07-20 19:31:43 +02:00 |
Claudio Atzori
|
65934888a1
|
adding record identifier among the originalIds regardless of what IdentifierFactory produces
|
2021-07-19 17:52:52 +02:00 |
Claudio Atzori
|
5947cddafc
|
adding record identifier among the originalIds regardless of what IdentifierFactory produces
|
2021-07-19 17:52:24 +02:00 |
Claudio Atzori
|
0977baf41d
|
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
|
2021-07-19 17:43:52 +02:00 |
Claudio Atzori
|
5e5f65a3c3
|
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
|
2021-07-19 15:56:55 +02:00 |
Sandro La Bruzzo
|
7e2caafe84
|
Scholexplorer: fixed mapping typologies
|
2021-07-15 09:53:12 +02:00 |
Miriam Baglioni
|
774cdb190e
|
changes to mirror the last dump of the graph with the old data model.
|
2021-07-13 18:57:24 +02:00 |
Miriam Baglioni
|
886617afd0
|
One result linked to more than one project is saved just once
|
2021-07-13 18:15:35 +02:00 |
Miriam Baglioni
|
320cf02d96
|
Changed the way to find results linked to projects. We verify to actually have the project on the graph before selecting the result
|
2021-07-13 18:13:32 +02:00 |
Miriam Baglioni
|
52ce35d57b
|
-
|
2021-07-13 18:08:46 +02:00 |
Miriam Baglioni
|
970b387b8d
|
modification to allow dump of a single community
|
2021-07-13 18:08:10 +02:00 |
Miriam Baglioni
|
eae10c5894
|
modification to allow the dump for a single community
|
2021-07-13 18:07:25 +02:00 |
Miriam Baglioni
|
c028feef4f
|
workflow for the dump as sub workflows
|
2021-07-13 18:06:44 +02:00 |
Miriam Baglioni
|
d70f8c96fd
|
funding contains and not starts with h2020
|
2021-07-13 17:34:53 +02:00 |
Miriam Baglioni
|
5e38c7f42d
|
dumping only communities with status all
|
2021-07-13 17:32:38 +02:00 |
Miriam Baglioni
|
618d2de2da
|
minor changes and refactoring
|
2021-07-13 17:10:02 +02:00 |
Miriam Baglioni
|
59615da65e
|
Add test to verify the creation of relation between context and projects
|
2021-07-13 17:09:15 +02:00 |
Miriam Baglioni
|
084b4ef999
|
added the creation of the openaireId from funder and grant number if the element is not present in the context profile
|
2021-07-13 17:07:46 +02:00 |
Miriam Baglioni
|
8f322a73cb
|
change because of the renaming of originalId in acronym
|
2021-07-13 16:22:58 +02:00 |
Miriam Baglioni
|
72397ea1ba
|
Added fix for community of arbitrary name length
|
2021-07-13 16:18:35 +02:00 |
Miriam Baglioni
|
5295d10691
|
added check not to dump deletedByInference entities
|
2021-07-13 16:11:46 +02:00 |
Miriam Baglioni
|
e9a17ec899
|
added check to verify not to add void APC
|
2021-07-13 15:53:35 +02:00 |
Miriam Baglioni
|
8429aed6c6
|
Added resource for testing selection of valid relations
|
2021-07-13 15:49:38 +02:00 |
Miriam Baglioni
|
39b1a6edf6
|
added test class for the selection of valid relations and description
|
2021-07-13 15:23:09 +02:00 |
Miriam Baglioni
|
9a58f1b93d
|
added logic to select only the valid relations: those not deletedbyinference and having both parts of the relation as entities in the graph
|
2021-07-13 15:20:39 +02:00 |
Miriam Baglioni
|
13c66e16be
|
changed logic to split for communities
|
2021-07-13 15:15:27 +02:00 |
Miriam Baglioni
|
6410ab71d8
|
added APC in the dump and test method
|
2021-07-13 15:13:58 +02:00 |
Miriam Baglioni
|
65a242646d
|
added resource for APC dump
|
2021-07-13 14:45:25 +02:00 |
Miriam Baglioni
|
4b432fbee8
|
extended test class
|
2021-07-13 14:40:39 +02:00 |
Miriam Baglioni
|
87a6e2b967
|
extended test class
|
2021-07-13 14:38:28 +02:00 |
Miriam Baglioni
|
69fd40fd30
|
modified code to split the Croatian funder
|
2021-07-13 14:35:26 +02:00 |
Miriam Baglioni
|
86e50f7311
|
modified code to split the Croatian funder
|
2021-07-13 14:31:45 +02:00 |
Miriam Baglioni
|
da88c850c6
|
changed the logic to verify if a community is contained in the list of context of a result
|
2021-07-13 14:22:44 +02:00 |
Miriam Baglioni
|
2f66fedfec
|
changed the logic to verify if a community is contained in the list of context of a result
|
2021-07-13 14:22:23 +02:00 |
Sandro La Bruzzo
|
bbe8193930
|
merged stable ids
|
2021-07-12 17:00:43 +02:00 |
Sandro La Bruzzo
|
09fccf8000
|
added workflow to serialize scholix and summary in json
|
2021-07-09 11:01:42 +02:00 |
Sandro La Bruzzo
|
0ea576745f
|
updated CreateInputGraph because generics don't work on Spark Dataset
|
2021-07-09 10:29:24 +02:00 |
Sandro La Bruzzo
|
cd17e19044
|
implemented branch workflow to import datacite and crossref in scholexplorer
|
2021-07-08 21:20:19 +02:00 |
Sandro La Bruzzo
|
8a034e46e1
|
updated baseline workflow
|
2021-07-08 11:11:41 +02:00 |
Claudio Atzori
|
b7b8e0986e
|
[raw_all] The claim merge procedure includes the claimed contexts in the merged result
|
2021-07-08 10:42:31 +02:00 |
Sandro La Bruzzo
|
0799ac9fb6
|
fixed wrong path
|
2021-07-08 10:36:37 +02:00 |
Sandro La Bruzzo
|
4d53402712
|
extended ebiLinks to create a dataset before generation of OAF
|
2021-07-08 10:26:21 +02:00 |
Sandro La Bruzzo
|
a4a54a3786
|
code refactor
|
2021-07-08 09:08:25 +02:00 |
Sandro La Bruzzo
|
a01dbe0ab0
|
completed workflow of generation of scholix and summaries
|
2021-07-07 23:10:34 +02:00 |
Claudio Atzori
|
fdcff42e46
|
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
|
2021-07-07 19:01:59 +02:00 |
Claudio Atzori
|
bc014023c8
|
Merge pull request 'to solve the scala SI-3623' (#122) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: D-Net/dnet-hadoop#122
|
2021-07-07 11:13:51 +02:00 |
Claudio Atzori
|
32bdfdccbc
|
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
|
2021-07-07 11:08:27 +02:00 |
Claudio Atzori
|
f580cb77e1
|
added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA)
|
2021-07-06 21:11:11 +02:00 |
Sandro La Bruzzo
|
8535506c22
|
added scholix generation
|
2021-07-06 17:18:06 +02:00 |
Sandro La Bruzzo
|
4c54bd8742
|
add test to verify merge scholix on source
|
2021-07-06 11:32:14 +02:00 |
Andreas Czerniak
|
3531802710
|
to solve the scala SI-3623
|
2021-07-06 11:30:56 +02:00 |
Sandro La Bruzzo
|
7d8db2eb8a
|
betterRenamingMethod
|
2021-07-06 09:56:32 +02:00 |
Sandro La Bruzzo
|
c952c8d236
|
generate first side of scholix mapping
|
2021-07-06 09:53:14 +02:00 |
Sandro La Bruzzo
|
e4b84ef5d6
|
fixed mapping OAF to Scholix summary
|
2021-07-02 16:48:48 +02:00 |
Sandro La Bruzzo
|
c6fa8598e1
|
massive code refactor:
removed modules dhp-*-scholexplorer
|
2021-07-01 22:13:45 +02:00 |
Sandro La Bruzzo
|
84b834c893
|
added test dataset test for pangaea
|
2021-06-30 17:31:09 +02:00 |
Sandro La Bruzzo
|
1a6b398968
|
implemented Creation of Raw Graph and Resolution
|
2021-06-30 17:27:55 +02:00 |
Sandro La Bruzzo
|
623a0c4edb
|
code Refactor, renaming packages
|
2021-06-30 11:09:30 +02:00 |
Sandro La Bruzzo
|
7e08655e5f
|
added relation dates in all scholexplorer Datasources
|
2021-06-29 12:02:03 +02:00 |
Sandro La Bruzzo
|
075055eaca
|
added relation dates in bio mapping
|
2021-06-29 10:33:09 +02:00 |
Sandro La Bruzzo
|
f36f92287d
|
implemented mapping from Crossref Event Data to Oaf
|
2021-06-29 10:21:23 +02:00 |
Sandro La Bruzzo
|
511ec14c63
|
implemented mapping from EBI and Scholix Resolved to OAF
|
2021-06-28 22:04:22 +02:00 |
Sandro La Bruzzo
|
ad50415167
|
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
|
2021-06-24 17:20:50 +02:00 |
Sandro La Bruzzo
|
80e15cc455
|
implemented mapping from uniprot, pdb and ebi links
|
2021-06-24 17:20:00 +02:00 |
Claudio Atzori
|
2e8fd2c531
|
cleanup
|
2021-06-23 14:38:24 +02:00 |
Claudio Atzori
|
4dc9ebf217
|
[raw_all] fixed unit test
|
2021-06-23 14:38:07 +02:00 |
Claudio Atzori
|
50fc5a64a0
|
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
|
2021-06-23 11:49:42 +02:00 |
Sandro La Bruzzo
|
080a280bea
|
added pdb to Oaf Transformation
|
2021-06-21 16:23:59 +02:00 |
Sandro La Bruzzo
|
507e42102a
|
added pdb to oaf class
|
2021-06-21 09:36:40 +02:00 |
Sandro La Bruzzo
|
4fe7b75644
|
renamed packages
|
2021-06-18 16:41:24 +02:00 |
Sandro La Bruzzo
|
3100166d29
|
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
|
2021-06-16 16:22:16 +02:00 |
Claudio Atzori
|
7243a40c88
|
code formatting
|
2021-06-16 15:03:03 +02:00 |
Sandro La Bruzzo
|
dfcf78cf24
|
removed wrong code
|
2021-06-16 14:57:42 +02:00 |
Sandro La Bruzzo
|
cc0f2b11fb
|
Implemented mapping from pubmed baseline to OAF
|
2021-06-16 14:56:24 +02:00 |
Michele Artini
|
ada063ce70
|
fixed a problem with empty mdstore list (2)
|
2021-06-14 12:04:47 +02:00 |
Michele Artini
|
83132ee99a
|
fixed a problem with empty mdstore list
|
2021-06-14 11:57:00 +02:00 |
Sandro La Bruzzo
|
aeb8132627
|
Merged branch stable_ids
|
2021-06-14 10:07:29 +02:00 |
Claudio Atzori
|
2039bb9f5f
|
orcid / orcid_pending cleaning backported from master branch
|
2021-06-14 09:40:50 +02:00 |
Claudio Atzori
|
dd19c4ac5a
|
Merge pull request 'import_new_mdstores' (#112) from import_new_mdstores into stable_ids
Reviewed-on: D-Net/dnet-hadoop#112
|
2021-06-14 09:23:55 +02:00 |
Claudio Atzori
|
a900bfb874
|
delegating the date parsing to https://github.com/sisyphsu/dateparser
|
2021-06-11 16:53:01 +02:00 |
Sandro La Bruzzo
|
5b724d9972
|
added relations to datacite mapping
|
2021-06-04 10:14:22 +02:00 |
Sandro La Bruzzo
|
e57294ac99
|
implemented changes on PUBMed dataflow
|
2021-06-03 10:52:09 +02:00 |
Michele Artini
|
ede2749822
|
orcid pid type
|
2021-06-01 12:42:43 +02:00 |
Michele Artini
|
f0fbfdcfae
|
Merge branch 'stable_ids' into import_new_mdstores
|
2021-06-01 12:03:00 +02:00 |
Michele Artini
|
e950750262
|
add nodes to import hdfs mdstores
|
2021-06-01 10:48:50 +02:00 |
Michele Artini
|
03a510859a
|
removed coalesce(1)
|
2021-05-31 14:10:51 +02:00 |
Michele Artini
|
e9f2b6037c
|
patch of mdstore records
|
2021-05-31 11:36:26 +02:00 |
Michele Artini
|
ad56a44fda
|
save as gzipped sequence file
|
2021-05-28 14:45:39 +02:00 |
Claudio Atzori
|
6e3a4e9237
|
updated test expectations
|
2021-05-28 09:37:50 +02:00 |
Michele Artini
|
4fa5671d16
|
first implementation of Hdfs Mdstores Importer
|
2021-05-27 16:22:07 +02:00 |
Claudio Atzori
|
5e4b91d9ef
|
more pervasive use of constants from ModelConstants, especially for ORCID
|
2021-05-26 18:20:23 +02:00 |
Claudio Atzori
|
9d725efdc1
|
reverted implementation of the mdstore client
|
2021-05-20 18:26:09 +02:00 |
Claudio Atzori
|
ae5c28e54f
|
code formatting
|
2021-05-20 16:13:06 +02:00 |
Claudio Atzori
|
232dce83db
|
fixes #6701: xpath for titles to support both datacite and Guidelines v4 mapping
|
2021-05-20 14:41:15 +02:00 |
Claudio Atzori
|
23b8883ab1
|
applied intellij code cleanup
|
2021-05-14 10:58:12 +02:00 |
Claudio Atzori
|
d4c3476152
|
mapping datasource.journal only when an issn is available, null otherwise
|
2021-05-11 11:08:54 +02:00 |
Claudio Atzori
|
d1cbee8413
|
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
|
2021-05-10 16:43:39 +02:00 |
Claudio Atzori
|
d4a30fabe3
|
clean up tests
|
2021-05-05 17:28:15 +02:00 |
Claudio Atzori
|
dccaf173cf
|
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
|
2021-05-05 16:36:15 +02:00 |
Claudio Atzori
|
2e1eb96f9a
|
code formatting
|
2021-05-05 11:23:57 +02:00 |
Claudio Atzori
|
fb930b84d3
|
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
|
2021-05-04 18:06:30 +02:00 |
Claudio Atzori
|
923d19ea8e
|
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
|
2021-05-04 18:06:21 +02:00 |
Sandro La Bruzzo
|
714b71bd21
|
updated pubmed
|
2021-05-04 14:54:12 +02:00 |
Alessia Bardi
|
9a20057615
|
fixed query for organisations' pids
|
2021-04-29 15:23:39 +02:00 |
Sandro La Bruzzo
|
2129e9caa7
|
updated pangaea transformation to parse directly the xml
|
2021-04-28 10:21:03 +02:00 |
Claudio Atzori
|
5afa7d3e0c
|
core utilities in dhp-common moved in external module dhp-schemas
|
2021-04-27 15:44:01 +02:00 |
Sandro La Bruzzo
|
74484d2823
|
bug fixing
|
2021-04-27 12:13:44 +02:00 |
Sandro La Bruzzo
|
c74b03d59c
|
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
|
2021-04-27 11:31:07 +02:00 |
Sandro La Bruzzo
|
7f8848ecdd
|
added first implementation of Pangaea Mapping
|
2021-04-27 11:30:37 +02:00 |
Claudio Atzori
|
27ab8a704d
|
adjusted poms to align with the external dhp-schema module
|
2021-04-27 10:12:27 +02:00 |
Claudio Atzori
|
c2bb03c8b5
|
depending on external dhp-schemas module
|
2021-04-23 17:57:35 +02:00 |
Claudio Atzori
|
c25238480c
|
making ODF record parsing namespace unaware (#6629)
|
2021-04-23 17:34:57 +02:00 |
Claudio Atzori
|
d0d477cca3
|
code formatting
|
2021-04-20 12:50:34 +02:00 |
miconis
|
0393cdce42
|
addition of alternative names in export queries
|
2021-04-20 12:45:21 +02:00 |
miconis
|
cadd0a5de8
|
modification of the queries for openorgs: they now consider also pending orgs
|
2021-04-20 12:06:56 +02:00 |
Claudio Atzori
|
d1ca025b0b
|
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
|
2021-04-13 14:32:41 +02:00 |
miconis
|
11b22b2d23
|
bug fix in the query, it now exports only relations with non-hidden organizations
|
2021-04-08 11:51:47 +02:00 |
miconis
|
0857100fb8
|
implementation of the tests for the openorgs integration in the openaire provision
|
2021-04-07 18:42:16 +02:00 |
miconis
|
bf685d849f
|
addition of pids in the query for the export of openorgs for the provision, addition of ec_fields in the openorgs model
|
2021-04-07 14:27:43 +02:00 |
miconis
|
eaaefb8b4c
|
implementation of the procedure to reuse content of different dbs when creating the raw graph
|
2021-04-06 14:35:51 +02:00 |
miconis
|
c39c82dfe9
|
modification of the jobs for the integration of openorgs in the provision, dedup records are no longer created by merging but simply by taking results of openorgs portal
|
2021-04-06 14:31:00 +02:00 |
Claudio Atzori
|
7941d7be29
|
WIP: using common definitions from ModelConstants
|
2021-03-31 18:33:57 +02:00 |
Claudio Atzori
|
72ce741ea6
|
WIP: using common definitions from ModelConstants
|
2021-03-31 17:07:13 +02:00 |
Claudio Atzori
|
9237d55d7f
|
[OpenOrgsWf] cleanup
|
2021-03-29 17:40:34 +02:00 |
Claudio Atzori
|
7f4e9479ec
|
[OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false)
|
2021-03-29 16:59:16 +02:00 |
miconis
|
2709d08fc2
|
Merge branch 'stable_ids' into openorgswf
|
2021-03-29 16:39:07 +02:00 |
miconis
|
f446580e9f
|
code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup
|
2021-03-29 16:10:46 +02:00 |
miconis
|
2355cc4e9b
|
minor changes and bug fix
|
2021-03-29 10:07:12 +02:00 |
Claudio Atzori
|
827e7e37db
|
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
|
2021-03-25 11:07:59 +01:00 |
miconis
|
28c1cdd132
|
merged stable_ids into openorgswf
|
2021-03-25 10:44:49 +01:00 |
miconis
|
348b0ef921
|
bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db
|
2021-03-24 15:51:27 +01:00 |
Claudio Atzori
|
751125fdf9
|
[Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target
|
2021-03-23 17:34:32 +01:00 |
Claudio Atzori
|
b4febed138
|
updated mapping tests as consequence of the special treatment reserved to Handle PIDs
|
2021-03-23 09:37:48 +01:00 |
Claudio Atzori
|
431cbe9955
|
handle missing instance.pid during bulk cleaning
|
2021-03-23 09:28:58 +01:00 |
Sandro La Bruzzo
|
c73072079d
|
fix conflicts
|
2021-03-22 16:36:31 +01:00 |
Claudio Atzori
|
5a043e95ea
|
code formatting
|
2021-03-19 11:37:27 +01:00 |
Claudio Atzori
|
a4e82a65aa
|
integrated filter applied when merging BETA & PROD graphs to rule out records from Datacite
|
2021-03-19 11:34:44 +01:00 |
Claudio Atzori
|
8257f9a2bc
|
result.pid: adjusted the mapping applied to the contents from the aggregator
|
2021-03-17 12:45:38 +01:00 |
Claudio Atzori
|
640b885706
|
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
|
2021-03-16 14:19:32 +01:00 |
Claudio Atzori
|
01630f638d
|
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
|
2021-03-09 17:11:50 +01:00 |
Claudio Atzori
|
59532b0919
|
[#6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
|
2021-03-09 11:14:45 +01:00 |
Claudio Atzori
|
d525785497
|
[#6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
|
2021-03-09 11:12:55 +01:00 |
Claudio Atzori
|
f468c7f0d7
|
merged from master
|
2021-03-09 09:12:41 +01:00 |
Claudio Atzori
|
8d2bb24512
|
merged from master
|
2021-03-08 15:44:34 +01:00 |
Claudio Atzori
|
fa7930d2e2
|
merging contributions from PR#97
|
2021-03-05 15:45:28 +01:00 |
miconis
|
1a85020572
|
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
|
2021-02-26 10:19:28 +01:00 |
Claudio Atzori
|
b830e33392
|
mdstore collector plugin
|
2021-02-25 12:30:30 +01:00 |
Claudio Atzori
|
fc3fa5e343
|
implemented mdstore collector plugin
|
2021-02-24 15:07:24 +01:00 |
miconis
|
4b2124a18e
|
implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities
|
2021-02-10 11:51:50 +01:00 |
Alessia Bardi
|
c4d1feca74
|
mapper test with validated link to project
|
2021-02-10 11:22:54 +01:00 |
Claudio Atzori
|
72c57b28fa
|
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
|
2021-02-04 14:08:18 +01:00 |
Alessia Bardi
|
c67329d3ad
|
updated test for EU Open Data portal datasets
|
2021-02-03 17:06:48 +01:00 |
Alessia Bardi
|
fd705404a1
|
tests for EU Open Data portal dataset mapping
|
2021-02-03 10:28:17 +01:00 |
Sandro La Bruzzo
|
686e7b507c
|
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
|
2021-01-28 10:02:13 +01:00 |
Sandro La Bruzzo
|
98b9498b57
|
Removed old messaging system not quite used from collection and Transformation workflow
code refactor
|
2021-01-28 09:51:17 +01:00 |
Sandro La Bruzzo
|
150a617bd1
|
Merge pull request 'aggregation_on_hadoop' (#90) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
Wonderful code... You're the Best Sandro
|
2021-01-26 16:00:47 +01:00 |
Claudio Atzori
|
885e0dd926
|
[Cleaning] filter authors not providing word characters in the fullname
|
2021-01-26 09:48:53 +01:00 |
Claudio Atzori
|
2890511613
|
[Cleaning] normalise missing Result.country
|
2021-01-26 09:41:44 +01:00 |
Claudio Atzori
|
4eb9ed35b1
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2021-01-25 18:12:24 +01:00 |
Claudio Atzori
|
cd379eb5e3
|
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
|
2021-01-25 18:11:49 +01:00 |
Alessia Bardi
|
505477f36f
|
format code
|
2021-01-25 18:02:49 +01:00 |
Alessia Bardi
|
ded6ed8d7d
|
no ',' author, if there are no authors in ODF records
|
2021-01-25 17:57:51 +01:00 |
Claudio Atzori
|
3465c8ccee
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 16:54:53 +01:00 |
Sandro La Bruzzo
|
a54848a59c
|
Moved Vocabulary stuff to common module
|
2021-01-25 15:43:04 +01:00 |
Claudio Atzori
|
07a0ccfc96
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 13:36:01 +01:00 |
Claudio Atzori
|
34d653de41
|
[Cleaning] updated cleaning rule for DOIs
|
2021-01-22 14:16:33 +01:00 |
Claudio Atzori
|
26e9d55c13
|
code formatting
|
2021-01-05 09:59:26 +01:00 |
Claudio Atzori
|
7185158942
|
ignore missing properties
|
2020-12-29 11:06:28 +01:00 |
Claudio Atzori
|
28460c2cd1
|
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
|
2020-12-23 16:59:52 +01:00 |
Claudio Atzori
|
723b01f9e9
|
trivial: the less magic numbers and values around, the better
|
2020-12-23 12:22:48 +01:00 |
Claudio Atzori
|
6cb0dc3f43
|
extended ORCID cleaning procedure
|
2020-12-21 11:40:17 +01:00 |
Claudio Atzori
|
47270d9af5
|
lenient mock can be lenient
|
2020-12-18 15:38:59 +01:00 |
Alessia Bardi
|
f9a8fd8bbd
|
updated test record for textgrid
|
2020-12-17 11:59:45 +01:00 |
Michele Artini
|
991e675dc6
|
validation in claim rels
|
2020-12-14 15:41:25 +01:00 |
Claudio Atzori
|
12e2f930c8
|
resolved conflicts
|
2020-12-10 10:57:39 +01:00 |
Alessia Bardi
|
112da6d76a
|
in theory, just auto-formatting after mvn compile
|
2020-12-09 20:00:27 +01:00 |
Alessia Bardi
|
bece04b330
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-12-09 19:54:43 +01:00 |
Alessia Bardi
|
426b76ee8e
|
more asserts for TextGrid record
|
2020-12-09 19:46:11 +01:00 |
Claudio Atzori
|
4705144918
|
Merge pull request 'rel_project_validation' (#69) from rel_project_validation into master
LGTM
|
2020-12-09 19:01:20 +01:00 |
Claudio Atzori
|
ada21ad920
|
Merge pull request 'dump of the results related to at least one project' (#61) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
|
2020-12-09 17:22:56 +01:00 |
Michele Artini
|
1bc9adc10d
|
default trust for validated rels
|
2020-12-09 16:18:37 +01:00 |
Michele Artini
|
5f21a356fd
|
reindent
|
2020-12-09 11:24:30 +01:00 |
Michele Artini
|
370a5e650b
|
validation attributes in resultProject relations
|
2020-12-09 11:18:26 +01:00 |
Claudio Atzori
|
a104a632df
|
cleanup
|
2020-12-04 16:32:47 +01:00 |
Miriam Baglioni
|
5fb65ffc4a
|
merge branch with master
|
2020-12-03 11:24:35 +01:00 |
Miriam Baglioni
|
ea88dc3401
|
fixed issue in property name
|
2020-12-03 11:24:23 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Claudio Atzori
|
57f448b7a4
|
graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance
|
2020-12-02 10:44:05 +01:00 |
Alessia Bardi
|
a417624670
|
tests for raw graph mapping
|
2020-12-02 10:15:26 +01:00 |
Claudio Atzori
|
893ac4a77b
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-12-02 09:30:06 +01:00 |
Claudio Atzori
|
2c407e775e
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-11-30 12:00:38 +01:00 |
Claudio Atzori
|
e731a7658d
|
cleaning texts to remove tab characters too
|
2020-11-27 09:00:04 +01:00 |
Claudio Atzori
|
c1b9a4045a
|
grouping of records will be performed by the dedup workflow
|
2020-11-26 10:59:10 +01:00 |
Miriam Baglioni
|
124591a7f3
|
refactoring
|
2020-11-25 18:23:28 +01:00 |
Miriam Baglioni
|
1a89f8211c
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:12:40 +01:00 |
Miriam Baglioni
|
5fbe54ef54
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:10:28 +01:00 |
Miriam Baglioni
|
ed01e5a5e1
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:09:34 +01:00 |
Miriam Baglioni
|
d4ddde2ef2
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:01:01 +01:00 |
Miriam Baglioni
|
f5e5e92a10
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 17:58:53 +01:00 |
Miriam Baglioni
|
1df94b85b4
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 17:57:43 +01:00 |
Claudio Atzori
|
dfd6205b95
|
Consistency graph workflow merges all the entities by ID
|
2020-11-25 14:55:32 +01:00 |
Miriam Baglioni
|
90d4369fd2
|
added test to verify the compression in writing community info on hdfs
|
2020-11-25 14:34:58 +01:00 |
Miriam Baglioni
|
6750e33d69
|
merge branch with master
|
2020-11-25 14:09:01 +01:00 |
Miriam Baglioni
|
b2c455f883
|
added java doc
|
2020-11-25 14:08:09 +01:00 |
Miriam Baglioni
|
1f130cdf92
|
changed the relation (produces -> isProducedBy) due to the change in the code
|
2020-11-25 14:04:26 +01:00 |
Miriam Baglioni
|
e758d5d9b4
|
refactoring
|
2020-11-25 13:46:39 +01:00 |
Miriam Baglioni
|
87a9f616ae
|
refactoring and addition of the funder nsp first part as name for the dump instead of the whole nsp
|
2020-11-25 13:45:41 +01:00 |
Miriam Baglioni
|
e7e418e444
|
added decision node to verify if to upload in Zenodo
|
2020-11-25 13:44:10 +01:00 |
Miriam Baglioni
|
305e3d0c9c
|
added resource file for relation with relClass = isProducedBy
|
2020-11-25 13:43:41 +01:00 |
Miriam Baglioni
|
21ce175d17
|
added FilterFunction specification in filter operation
|
2020-11-25 13:42:31 +01:00 |
Miriam Baglioni
|
bde6d337dd
|
test classes for dump of results related to funders
|
2020-11-25 13:42:01 +01:00 |
Miriam Baglioni
|
b37b9352d7
|
added constant value for semantic relationship between projects and results
|
2020-11-25 13:41:08 +01:00 |
Claudio Atzori
|
36173c13a5
|
reverted filters in the cleaning process
|
2020-11-25 10:24:42 +01:00 |
Claudio Atzori
|
eeebd5a920
|
Cleaning workflow: remove newlines from titles, descriptions, subjects
|
2020-11-24 18:40:25 +01:00 |
Claudio Atzori
|
e1a1bb3ee4
|
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
|
2020-11-24 18:34:03 +01:00 |
Miriam Baglioni
|
72bb0fe360
|
changed directory name
|
2020-11-24 16:47:07 +01:00 |
Miriam Baglioni
|
39f4a20873
|
changed the path and the name for saving the communities_infrastructures dump file
|
2020-11-24 14:47:32 +01:00 |
Miriam Baglioni
|
7e14452a87
|
final version of the wf to get the dump of results associated to at least one funder per funder
|
2020-11-24 14:46:34 +01:00 |
Miriam Baglioni
|
c167a18057
|
added new parameter for the dumpType
|
2020-11-24 14:45:50 +01:00 |
Miriam Baglioni
|
54a309bb6b
|
refactoring
|
2020-11-24 14:45:30 +01:00 |
Miriam Baglioni
|
35ecea8842
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:45:15 +01:00 |
Miriam Baglioni
|
b9b6bdb2e6
|
fixing issue on previous implementation
|
2020-11-24 14:44:53 +01:00 |
Miriam Baglioni
|
7e940f1991
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:43:34 +01:00 |
Miriam Baglioni
|
62928ef7a5
|
changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file
|
2020-11-24 14:42:41 +01:00 |
Claudio Atzori
|
33bae02451
|
reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently
|
2020-11-24 14:42:33 +01:00 |
Miriam Baglioni
|
3319440c53
|
changed the direction of the relation between projects and result considered to select the results linked to projects
|
2020-11-24 14:41:09 +01:00 |
Miriam Baglioni
|
00c377dac2
|
added specification of MapFunction types in map
|
2020-11-24 14:40:22 +01:00 |
Miriam Baglioni
|
44db258dc4
|
added enumerated for the dump type
|
2020-11-24 14:38:06 +01:00 |
Miriam Baglioni
|
1832708c42
|
replaced boolean variable with a string one which specifies the type of dump we are performing: complete, community or funder
|
2020-11-24 14:37:36 +01:00 |
Miriam Baglioni
|
259c67ce36
|
fixed issue in path name
|
2020-11-20 12:32:23 +01:00 |
Miriam Baglioni
|
0a9db67eec
|
-
|
2020-11-20 12:21:33 +01:00 |
Miriam Baglioni
|
d362f2637d
|
merge branch with master
|
2020-11-19 19:17:20 +01:00 |
Miriam Baglioni
|
cf3f47563f
|
new parameter files
|
2020-11-19 19:16:05 +01:00 |
Miriam Baglioni
|
24c56fa7a3
|
new logic and workflow for dump of results with link to projects. In this implementation the results match the model of the communityresult.
|
2020-11-19 19:15:39 +01:00 |
Claudio Atzori
|
fcbb05eb21
|
cleanup
|
2020-11-19 15:14:33 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Miriam Baglioni
|
fafb688887
|
-
|
2020-11-18 18:56:48 +01:00 |
Miriam Baglioni
|
906db690d2
|
-
|
2020-11-18 17:43:08 +01:00 |
Claudio Atzori
|
ede7fae6c8
|
Merge pull request 'XML record indexing test' (#58) from provision_indexing into master
|
2020-11-18 17:04:34 +01:00 |
Miriam Baglioni
|
5402062ff5
|
changed parameter file with the one associated to the job
|
2020-11-18 16:58:20 +01:00 |
Miriam Baglioni
|
a172a37ad1
|
fixed typo
|
2020-11-18 16:55:07 +01:00 |
Miriam Baglioni
|
46ba3793f6
|
code, workflow and parameters for the dump of the results associated to funders
|
2020-11-18 16:47:31 +01:00 |
Miriam Baglioni
|
57cac36898
|
changed the workflow name
|
2020-11-18 13:38:03 +01:00 |
Claudio Atzori
|
8177ce7939
|
test for XmlIndexingJob based on a local miniSolrCluster
|
2020-11-18 10:58:05 +01:00 |
Alessia Bardi
|
10e673660f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-18 10:01:23 +01:00 |
Alessia Bardi
|
be7b310cef
|
rel semantics ignore case
|
2020-11-18 10:01:20 +01:00 |
Michele Artini
|
33da2e3d6c
|
xpaths for dateOfCollection and dateOfTransformation
|
2020-11-18 09:26:20 +01:00 |
Alessia Bardi
|
8f87020a50
|
#56: map relevantDates from aggregated ODF records
|
2020-11-17 18:42:09 +01:00 |
Alessia Bardi
|
7e0a76a8ac
|
test for TextGrid
|
2020-11-17 18:39:25 +01:00 |
Claudio Atzori
|
cfc01f136e
|
PID filtering based on a blacklist
|
2020-11-17 12:27:06 +01:00 |
Claudio Atzori
|
6ab1ce53c9
|
fixed condition in result pid cleaning; cleanup
|
2020-11-16 10:09:17 +01:00 |
Claudio Atzori
|
4de8c8b237
|
fixed workflow variable name
|
2020-11-16 10:03:11 +01:00 |
Claudio Atzori
|
331d621800
|
added test resource
|
2020-11-14 12:16:15 +01:00 |
Claudio Atzori
|
5d4e34e26a
|
fixed typo in variable name
|
2020-11-14 10:32:26 +01:00 |
Claudio Atzori
|
768bc5304c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-13 15:40:34 +01:00 |
Claudio Atzori
|
93f7b7974f
|
Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master
LGTM
|
2020-11-13 15:40:02 +01:00 |
Claudio Atzori
|
528231a287
|
grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow
|
2020-11-13 15:37:48 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Claudio Atzori
|
13e36a4da0
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:02 +01:00 |
Claudio Atzori
|
9b0fb9e958
|
merged from master
|
2020-11-12 09:27:12 +01:00 |
Michele Artini
|
40160d171f
|
organizations pids
|
2020-11-09 12:58:36 +01:00 |
Sandro La Bruzzo
|
027ef2326c
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-06 17:12:42 +01:00 |
Sandro La Bruzzo
|
cd27df91a1
|
fixed bug on missing relation in ANDS
|
2020-11-06 17:12:31 +01:00 |
Claudio Atzori
|
d10447e747
|
re-packaged graph dump workflow sources
|
2020-11-05 17:38:18 +01:00 |
Claudio Atzori
|
2d76497488
|
cleanup
|
2020-11-05 17:10:24 +01:00 |
Miriam Baglioni
|
f8e9bda24c
|
merge branch with master
|
2020-11-05 16:31:18 +01:00 |
Miriam Baglioni
|
be5ed8f554
|
added check to avoid sending empty metadata.
|
2020-11-05 16:10:17 +01:00 |
Claudio Atzori
|
2148a51fae
|
minor changes
|
2020-11-05 11:24:12 +01:00 |
Claudio Atzori
|
4625b7486e
|
code formatting
|
2020-11-04 18:12:43 +01:00 |
Miriam Baglioni
|
e9ac471ae9
|
removed dependency from classes for the pid graph dump
|
2020-11-04 18:04:42 +01:00 |
Miriam Baglioni
|
b90a945c49
|
removed property files for pid graph dump
|
2020-11-04 17:28:33 +01:00 |
Miriam Baglioni
|
bac307155a
|
removed properties specific for pid graph dump
|
2020-11-04 17:28:04 +01:00 |
Miriam Baglioni
|
9c9d50f486
|
removed code specific for pid graph dump
|
2020-11-04 17:26:22 +01:00 |
Miriam Baglioni
|
5669890934
|
removed commented lines
|
2020-11-04 17:15:21 +01:00 |
Miriam Baglioni
|
6a89f59be9
|
removed commented lines
|
2020-11-04 17:13:59 +01:00 |
Miriam Baglioni
|
56150d7e5e
|
removed all code related to the dump of pids graph
|
2020-11-04 17:13:12 +01:00 |
Miriam Baglioni
|
16c54a96f8
|
removed pid dump
|
2020-11-04 17:11:32 +01:00 |
Miriam Baglioni
|
0cac5436ff
|
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
|
2020-11-04 13:21:11 +01:00 |
Alessia Bardi
|
51808b5afd
|
Updated descriptions
|
2020-11-04 12:29:48 +01:00 |
Alessia Bardi
|
e6becf8659
|
Updated descriptions
|
2020-11-04 12:17:57 +01:00 |
Alessia Bardi
|
0abe0eee33
|
Updated descriptions
|
2020-11-04 12:15:30 +01:00 |
Alessia Bardi
|
f6ab238f5d
|
Updated descriptions
|
2020-11-04 11:50:47 +01:00 |
Miriam Baglioni
|
c010a8442f
|
fixed issue on test code
|
2020-11-03 17:26:51 +01:00 |
Miriam Baglioni
|
8ec7a61188
|
merge branch with master
|
2020-11-03 16:59:08 +01:00 |
Miriam Baglioni
|
c209284ca7
|
new schemas for the entities in the dump with added descriptions
|
2020-11-03 16:58:08 +01:00 |
Miriam Baglioni
|
08806deddf
|
added the splitSize non mandatory parameter. Default size 10G
|
2020-11-03 16:57:34 +01:00 |
Miriam Baglioni
|
7d2eda43ca
|
added new non mandatory property publish to determine whether to publish the upload or leave it pending. Default value false
|
2020-11-03 16:57:01 +01:00 |
Miriam Baglioni
|
cbbb1bdc54
|
moved business logic to new class in common for handling the zip of the archives
|
2020-11-03 16:55:50 +01:00 |
Miriam Baglioni
|
d4382b54df
|
moved the tar archive with max size on common module
|
2020-11-03 16:54:50 +01:00 |
Claudio Atzori
|
86d6fbe95b
|
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
|
2020-11-03 12:19:46 +01:00 |
Claudio Atzori
|
8471888ad3
|
Merge branch 'graph_cleaning' into stable_ids
|
2020-11-03 11:52:47 +01:00 |
Claudio Atzori
|
5310e56dba
|
remove empty PIDs
|
2020-11-03 11:52:10 +01:00 |
Claudio Atzori
|
3fcd669e99
|
result merge operation leverages a custom ResultTypeComparator in the aggregator graph construction
|
2020-11-03 10:53:23 +01:00 |
Claudio Atzori
|
09e44dabff
|
Merge branch 'master' into stable_ids
|
2020-11-02 12:16:01 +01:00 |
Sandro La Bruzzo
|
754c86f33e
|
fixed test to work on jenkins
|
2020-11-02 09:35:01 +01:00 |
Miriam Baglioni
|
dabb33e018
|
changed the discriminant for which split the file
|
2020-10-30 17:52:22 +01:00 |
Miriam Baglioni
|
0fba08eae4
|
max allowed size per file 10 Gb
|
2020-10-30 16:05:55 +01:00 |
Claudio Atzori
|
4ca75d6951
|
Merge pull request 'Dedup ID creation policy' (#48) from deduptesting into stable_ids
|
2020-10-30 15:15:32 +01:00 |
Miriam Baglioni
|
b828587252
|
prevent the code from cycling indefinitely
|
2020-10-30 15:01:25 +01:00 |
Miriam Baglioni
|
f747e303ac
|
classes for dumping of the graph as ttl file
|
2020-10-30 14:13:45 +01:00 |
Miriam Baglioni
|
16baf5b69e
|
formatting
|
2020-10-30 14:13:14 +01:00 |
Miriam Baglioni
|
a9eef9c852
|
added check for possible Optional value in relation dataInfo
|
2020-10-30 14:12:28 +01:00 |
Miriam Baglioni
|
5f4de9a962
|
formatting
|
2020-10-30 14:11:40 +01:00 |
Miriam Baglioni
|
14bf2e7238
|
added option to split dumps bigger than 40Gb on different files
|
2020-10-30 14:09:04 +01:00 |
Claudio Atzori
|
58f28296ea
|
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
|
2020-10-30 10:56:42 +01:00 |
Miriam Baglioni
|
78fdb11c3f
|
merge branch with master
|
2020-10-29 12:55:22 +01:00 |
Sandro La Bruzzo
|
1d9fdb7367
|
fixed spark memory issue in SparkSplitOafTODLIEntities
|
2020-10-28 12:30:32 +01:00 |
Miriam Baglioni
|
d2374e3b9e
|
added code to handle cases where the funding tree is not existing
|
2020-10-27 16:15:21 +01:00 |
Miriam Baglioni
|
5d3012eeb4
|
changed code to dump only the programme list and not the classification list
|
2020-10-27 16:14:18 +01:00 |
Miriam Baglioni
|
3241ec1777
|
added connection timeout and socket timeout 600 sec
|
2020-10-27 16:12:11 +01:00 |
Alessia Bardi
|
1425d810a8
|
testing mapping
|
2020-10-19 17:46:14 +02:00 |
Claudio Atzori
|
266bf1a221
|
common IdentifierFactory in use on the mapping from the aggregator data; merge the entities sharing the same id; code formatting
|
2020-10-16 17:02:10 +02:00 |
Claudio Atzori
|
34f1d0904b
|
common IdentifierFactory in use on the mapping from the aggregator data
|
2020-10-16 16:00:19 +02:00 |
Sandro La Bruzzo
|
fed711da80
|
Merge remote-tracking branch 'origin/master' into merge_record_to_common
|
2020-10-13 15:32:45 +02:00 |
Alessia Bardi
|
8775a64bc1
|
Merge pull request 'Merging different compatibility levels (pinocchio operator)' (#47) from merge_graph into master
|
2020-10-09 14:44:52 +02:00 |
Claudio Atzori
|
e751c1402f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-10-09 13:53:21 +02:00 |
Claudio Atzori
|
b961dc7d1e
|
added originalid to the fields in the result graph view
|
2020-10-09 13:53:15 +02:00 |
Sandro La Bruzzo
|
eec418cd26
|
moved AuthorMerger into dhp-common
|
2020-10-08 10:33:55 +02:00 |
Sandro La Bruzzo
|
fe0a7870e6
|
Added test to check if merge authors works
|
2020-10-08 10:33:12 +02:00 |
Sandro La Bruzzo
|
cd9c377d18
|
adapted scholexplorer Dump generation to the new Dataset definition
|
2020-10-08 10:10:13 +02:00 |
Claudio Atzori
|
a3f37a9414
|
javadoc
|
2020-10-07 16:44:22 +02:00 |
Claudio Atzori
|
8d85a2fced
|
[BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two
|
2020-10-07 16:28:52 +02:00 |
Miriam Baglioni
|
ae08b3c0dd
|
merge branch with master
|
2020-10-05 11:35:55 +02:00 |
Miriam Baglioni
|
11b7eaae09
|
changed the name of the folder where to store the context entity from context to communities_infrastructures
|
2020-10-05 11:24:54 +02:00 |
Miriam Baglioni
|
32bffb0134
|
changed the name from communities_infrastructures to communities_infrastuctures.json
|
2020-10-05 11:24:17 +02:00 |
Miriam Baglioni
|
25cbcf6114
|
changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar
|
2020-10-02 12:17:46 +02:00 |
Claudio Atzori
|
49ae3450a9
|
code formatting
|
2020-10-02 09:43:24 +02:00 |
Claudio Atzori
|
c2a6e2a9bf
|
fixed mapping for datasource journal info (ISSNs)
|
2020-10-02 09:37:08 +02:00 |
Miriam Baglioni
|
01117a46e1
|
whole workflow activated
|
2020-10-01 17:19:21 +02:00 |
Miriam Baglioni
|
cfb5766c6b
|
removed double json.gz from names of files in the tar
|
2020-10-01 17:18:34 +02:00 |
Miriam Baglioni
|
fcaedac980
|
merge branch with master
|
2020-10-01 16:46:59 +02:00 |
Miriam Baglioni
|
c6e6ed1bd8
|
merge branch with master
|
2020-10-01 16:24:41 +02:00 |
Claudio Atzori
|
2e9e13444d
|
author pids made unique by value
|
2020-10-01 12:50:40 +02:00 |
Claudio Atzori
|
e265c3e125
|
cleaning functions factored out in a dedicated class
|
2020-10-01 10:50:15 +02:00 |
Claudio Atzori
|
4287164aba
|
include relevantdate field in the result view
|
2020-10-01 10:28:55 +02:00 |
Miriam Baglioni
|
7b6a7333e6
|
merge branch with master
|
2020-09-25 16:42:07 +02:00 |
Miriam Baglioni
|
983a12ed15
|
temporary modification to allow the upload of files in the sandbox without the need to recreate the mapping from scratch
|
2020-09-25 16:41:51 +02:00 |
Miriam Baglioni
|
8b36d19182
|
added property depositionId and changed property newVersion, which became string from boolean to handle the three possible distinct values
|
2020-09-25 16:41:15 +02:00 |
Miriam Baglioni
|
ed5239f9ec
|
added new code to handle the new possibility to upload files to an already open deposition
|
2020-09-25 16:34:32 +02:00 |
Miriam Baglioni
|
3a8c524fce
|
refactor
|
2020-09-25 16:34:02 +02:00 |
Miriam Baglioni
|
54800fb9b0
|
enabled only the step to upload in zenodo
|
2020-09-25 14:40:22 +02:00 |
Miriam Baglioni
|
de6c4d46d8
|
fixed conflicts
|
2020-09-24 15:35:01 +02:00 |
Claudio Atzori
|
044d3a0214
|
fixed query used to load datasources in the Graph
|
2020-09-24 13:48:58 +02:00 |
Claudio Atzori
|
27df1cea6d
|
code formatting
|
2020-09-24 12:16:00 +02:00 |
Claudio Atzori
|
fb22f4d70b
|
included values for projects fundedamount and totalcost fields in the mapping tests. Swapped expected and actual values in junit test assertions
|
2020-09-24 12:10:59 +02:00 |
Claudio Atzori
|
42f55395c8
|
fixed order of the ISSNs returned by the SQL query
|
2020-09-24 12:09:58 +02:00 |
Claudio Atzori
|
9a7e72d528
|
using concat_ws to join textual columns from PSQL. When using || to perform the concatenation, Null columns make the operation result Null
|
2020-09-24 10:42:47 +02:00 |
Claudio Atzori
|
9e3e93c6b6
|
setting the correct issn type in the datasource.journal element
|
2020-09-24 10:39:16 +02:00 |
Miriam Baglioni
|
39eb8ab25b
|
changed the dump to move from h2020programme to h2020classification
|
2020-09-23 17:33:00 +02:00 |
Miriam Baglioni
|
c2b5c780ff
|
-
|
2020-09-14 14:34:03 +02:00 |
Miriam Baglioni
|
e2ceefe9be
|
-
|
2020-09-14 14:33:28 +02:00 |
Miriam Baglioni
|
1f893e63dc
|
-
|
2020-09-14 14:33:10 +02:00 |
Claudio Atzori
|
8a523474b7
|
code formatting
|
2020-09-07 11:40:16 +02:00 |
Miriam Baglioni
|
b72a7dad46
|
resource for pid graph dump
|
2020-08-24 17:09:01 +02:00 |
Miriam Baglioni
|
8694bb9b31
|
refactoring due to compilation
|
2020-08-24 17:07:34 +02:00 |
Miriam Baglioni
|
8a069a4fea
|
-
|
2020-08-24 17:01:30 +02:00 |
Miriam Baglioni
|
34fa96f3b1
|
-
|
2020-08-24 17:00:20 +02:00 |
Miriam Baglioni
|
5fb2949cb8
|
added utils methods
|
2020-08-24 17:00:09 +02:00 |
Miriam Baglioni
|
2a540b6c01
|
added constants for the pid graph dump
|
2020-08-24 16:55:35 +02:00 |
Miriam Baglioni
|
da103c399a
|
resources for the pid graph dump test
|
2020-08-24 16:52:07 +02:00 |
Miriam Baglioni
|
630a6a1fe7
|
first tests for the pid graph dump
|
2020-08-24 16:51:26 +02:00 |
Miriam Baglioni
|
40c8d2de7b
|
test resources for the dump of the pids graph
|
2020-08-24 16:50:39 +02:00 |
Miriam Baglioni
|
bef79d3bdf
|
first attempt at the dump of pids graph
|
2020-08-24 16:49:38 +02:00 |
Miriam Baglioni
|
85203c16e3
|
merge branch with master
|
2020-08-19 11:49:03 +02:00 |
Miriam Baglioni
|
2c783793ba
|
removed the affiliation from the author to mirror the changes in the model
|
2020-08-19 11:48:12 +02:00 |
Miriam Baglioni
|
f6bf888016
|
removed affiliation from author to mirror the changes in the model
|
2020-08-19 11:41:41 +02:00 |
Miriam Baglioni
|
66d0e0d3f2
|
-
|
2020-08-19 11:31:50 +02:00 |
Miriam Baglioni
|
1c593a9cfe
|
-
|
2020-08-19 11:29:51 +02:00 |
Miriam Baglioni
|
e42b2f5ae2
|
-
|
2020-08-19 11:29:09 +02:00 |
Miriam Baglioni
|
f81ee22418
|
changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult)
|
2020-08-19 11:28:26 +02:00 |
Miriam Baglioni
|
387be43fd4
|
changed to discriminate if dumping all the result types together or each one in its own archive
|
2020-08-19 11:25:27 +02:00 |
Miriam Baglioni
|
c5858afb88
|
added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dumped together, false otherwise.
|
2020-08-19 11:24:14 +02:00 |
Miriam Baglioni
|
d407852ac2
|
changed to reflect the changes in the model
|
2020-08-19 11:15:05 +02:00 |
Miriam Baglioni
|
47c21a8961
|
refactoring due to compilation
|
2020-08-19 11:11:57 +02:00 |
Miriam Baglioni
|
5570678c65
|
changed parameter name from hdfsNameNode to nameNode
|
2020-08-19 10:59:26 +02:00 |
Miriam Baglioni
|
dc5096a327
|
refactoring due to compilation
|
2020-08-19 10:57:36 +02:00 |
Miriam Baglioni
|
96600ed04a
|
modified test resource for mirroring the deletion of affiliation from author parameters
|
2020-08-14 20:41:49 +02:00 |
Miriam Baglioni
|
09f5b92763
|
added specific reference to class
|
2020-08-14 20:00:09 +02:00 |
Miriam Baglioni
|
37e7c43652
|
changed parameter name from hdfsNameNode to nameNode
|
2020-08-14 18:18:25 +02:00 |
Miriam Baglioni
|
d2a8a4961a
|
refactoring
|
2020-08-13 18:50:33 +02:00 |
Miriam Baglioni
|
a5043de5da
|
added method to get the mapped instance
|
2020-08-13 18:45:50 +02:00 |
Miriam Baglioni
|
fcd10f452c
|
changed because of D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:55:32 +02:00 |
Miriam Baglioni
|
fd48ae3b85
|
changed because of D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:19:15 +02:00 |
Miriam Baglioni
|
04a3e1ab38
|
disabled tests
|
2020-08-13 12:18:13 +02:00 |
Miriam Baglioni
|
2ede397933
|
Apply change because of D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:16:39 +02:00 |
Miriam Baglioni
|
bfd1fcde6d
|
removed not useful method and changed because of D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:14:37 +02:00 |
Miriam Baglioni
|
7fd8397123
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:13:15 +02:00 |
Miriam Baglioni
|
753d448cc9
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:12:58 +02:00 |
Miriam Baglioni
|
c0e071fa26
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:12:40 +02:00 |
Miriam Baglioni
|
526db915bc
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:12:16 +02:00 |
Miriam Baglioni
|
b0fab0d138
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:11:57 +02:00 |
Miriam Baglioni
|
1b6320b251
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:11:41 +02:00 |
Miriam Baglioni
|
743d31be22
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:11:22 +02:00 |
Miriam Baglioni
|
65b48df652
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:11:06 +02:00 |
Miriam Baglioni
|
90b54d3efb
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:08:24 +02:00 |