Michele Artini
|
f0fbfdcfae
|
Merge branch 'stable_ids' into import_new_mdstores
|
2021-06-01 12:03:00 +02:00 |
Michele Artini
|
e950750262
|
add nodes to import hdfs mdstores
|
2021-06-01 10:48:50 +02:00 |
Michele Artini
|
03a510859a
|
removed coalesce(1)
|
2021-05-31 14:10:51 +02:00 |
Michele Artini
|
e9f2b6037c
|
patch of mdstore records
|
2021-05-31 11:36:26 +02:00 |
Michele Artini
|
ad56a44fda
|
save as gzipped sequence file
|
2021-05-28 14:45:39 +02:00 |
Claudio Atzori
|
6e3a4e9237
|
updated test expectations
|
2021-05-28 09:37:50 +02:00 |
Michele Artini
|
4fa5671d16
|
first implementation of Hdfs Mdstores Importer
|
2021-05-27 16:22:07 +02:00 |
Claudio Atzori
|
5e4b91d9ef
|
more pervasive use of constants from ModelConstants, especially for ORCID
|
2021-05-26 18:20:23 +02:00 |
Claudio Atzori
|
9d725efdc1
|
reverted implementation of the mdstore client
|
2021-05-20 18:26:09 +02:00 |
Claudio Atzori
|
ae5c28e54f
|
code formatting
|
2021-05-20 16:13:06 +02:00 |
Claudio Atzori
|
232dce83db
|
fixes #6701: xpath for titles to support both datacite and Guidelines v4 mapping
|
2021-05-20 14:41:15 +02:00 |
Claudio Atzori
|
23b8883ab1
|
applied intellij code cleanup
|
2021-05-14 10:58:12 +02:00 |
Claudio Atzori
|
d4c3476152
|
mapping datasource.journal only when an issn is available, null otherwise
|
2021-05-11 11:08:54 +02:00 |
Claudio Atzori
|
d1cbee8413
|
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
|
2021-05-10 16:43:39 +02:00 |
Claudio Atzori
|
d4a30fabe3
|
clean up tests
|
2021-05-05 17:28:15 +02:00 |
Claudio Atzori
|
dccaf173cf
|
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
|
2021-05-05 16:36:15 +02:00 |
Claudio Atzori
|
2e1eb96f9a
|
code formatting
|
2021-05-05 11:23:57 +02:00 |
Claudio Atzori
|
fb930b84d3
|
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
|
2021-05-04 18:06:30 +02:00 |
Claudio Atzori
|
923d19ea8e
|
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
|
2021-05-04 18:06:21 +02:00 |
Sandro La Bruzzo
|
714b71bd21
|
updated pubmed
|
2021-05-04 14:54:12 +02:00 |
Alessia Bardi
|
9a20057615
|
fixed query for organisations' pids
|
2021-04-29 15:23:39 +02:00 |
Sandro La Bruzzo
|
2129e9caa7
|
updated pangaea transformation to parse directly the xml
|
2021-04-28 10:21:03 +02:00 |
Claudio Atzori
|
5afa7d3e0c
|
core utilities in dhp-common moved in external module dhp-schemas
|
2021-04-27 15:44:01 +02:00 |
Sandro La Bruzzo
|
74484d2823
|
bug fixing
|
2021-04-27 12:13:44 +02:00 |
Sandro La Bruzzo
|
c74b03d59c
|
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
|
2021-04-27 11:31:07 +02:00 |
Sandro La Bruzzo
|
7f8848ecdd
|
added first implementation of Pangaea Mapping
|
2021-04-27 11:30:37 +02:00 |
Claudio Atzori
|
27ab8a704d
|
adjusted poms to align with the external dhp-schema module
|
2021-04-27 10:12:27 +02:00 |
Claudio Atzori
|
c2bb03c8b5
|
depending on external dhp-schemas module
|
2021-04-23 17:57:35 +02:00 |
Claudio Atzori
|
c25238480c
|
making ODF record parsing namespace unaware (#6629)
|
2021-04-23 17:34:57 +02:00 |
Claudio Atzori
|
d0d477cca3
|
code formatting
|
2021-04-20 12:50:34 +02:00 |
miconis
|
0393cdce42
|
addition of alternative names in export queries
|
2021-04-20 12:45:21 +02:00 |
miconis
|
cadd0a5de8
|
modification of the queries for openorgs: they now consider also pending orgs
|
2021-04-20 12:06:56 +02:00 |
Claudio Atzori
|
d1ca025b0b
|
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
|
2021-04-13 14:32:41 +02:00 |
miconis
|
11b22b2d23
|
bug fix in the query, it now exports only relations with non-hidden organizations
|
2021-04-08 11:51:47 +02:00 |
miconis
|
0857100fb8
|
implementation of the tests for the openorgs integration in the openaire provision
|
2021-04-07 18:42:16 +02:00 |
miconis
|
bf685d849f
|
addition of pids in the query for the export of openorgs for the provision, addition of ec_fields in the openorgs model
|
2021-04-07 14:27:43 +02:00 |
miconis
|
eaaefb8b4c
|
implementation of the procedure to reuse content of different dbs when creating the raw graph
|
2021-04-06 14:35:51 +02:00 |
miconis
|
c39c82dfe9
|
modification of the jobs for the integration of openorgs in the provision, dedup records are no more created by merging but simply taking results of openorgs portal
|
2021-04-06 14:31:00 +02:00 |
Claudio Atzori
|
7941d7be29
|
WIP: using common definitions from ModelConstants
|
2021-03-31 18:33:57 +02:00 |
Claudio Atzori
|
72ce741ea6
|
WIP: using common definitions from ModelConstants
|
2021-03-31 17:07:13 +02:00 |
Claudio Atzori
|
9237d55d7f
|
[OpenOrgsWf] cleanup
|
2021-03-29 17:40:34 +02:00 |
Claudio Atzori
|
7f4e9479ec
|
[OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false)
|
2021-03-29 16:59:16 +02:00 |
miconis
|
2709d08fc2
|
Merge branch 'stable_ids' into openorgswf
|
2021-03-29 16:39:07 +02:00 |
miconis
|
f446580e9f
|
code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup
|
2021-03-29 16:10:46 +02:00 |
miconis
|
2355cc4e9b
|
minor changes and bug fix
|
2021-03-29 10:07:12 +02:00 |
Claudio Atzori
|
827e7e37db
|
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
|
2021-03-25 11:07:59 +01:00 |
miconis
|
28c1cdd132
|
merged stable_ids into openorgswf
|
2021-03-25 10:44:49 +01:00 |
miconis
|
348b0ef921
|
bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db
|
2021-03-24 15:51:27 +01:00 |
Claudio Atzori
|
751125fdf9
|
[Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target
|
2021-03-23 17:34:32 +01:00 |
Claudio Atzori
|
b4febed138
|
updated mapping tests as consequence of the special treatment reserved to Handle PIDs
|
2021-03-23 09:37:48 +01:00 |
Claudio Atzori
|
431cbe9955
|
handle missing instance.pid during bulk cleaning
|
2021-03-23 09:28:58 +01:00 |
Sandro La Bruzzo
|
c73072079d
|
fix conflicts
|
2021-03-22 16:36:31 +01:00 |
Claudio Atzori
|
5a043e95ea
|
code formatting
|
2021-03-19 11:37:27 +01:00 |
Claudio Atzori
|
a4e82a65aa
|
integrated filter applied when merging BETA & PROD graphs to rule out records from Datacite
|
2021-03-19 11:34:44 +01:00 |
Claudio Atzori
|
8257f9a2bc
|
result.pid: adjusted the mapping applied to the contents from the aggregator
|
2021-03-17 12:45:38 +01:00 |
Claudio Atzori
|
640b885706
|
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
|
2021-03-16 14:19:32 +01:00 |
Claudio Atzori
|
01630f638d
|
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
|
2021-03-09 17:11:50 +01:00 |
Claudio Atzori
|
59532b0919
|
[#6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
|
2021-03-09 11:14:45 +01:00 |
Claudio Atzori
|
d525785497
|
[#6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
|
2021-03-09 11:12:55 +01:00 |
Claudio Atzori
|
f468c7f0d7
|
merged from master
|
2021-03-09 09:12:41 +01:00 |
Claudio Atzori
|
8d2bb24512
|
merged from master
|
2021-03-08 15:44:34 +01:00 |
Claudio Atzori
|
fa7930d2e2
|
merging contributions from PR#97
|
2021-03-05 15:45:28 +01:00 |
miconis
|
1a85020572
|
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
|
2021-02-26 10:19:28 +01:00 |
Claudio Atzori
|
b830e33392
|
mdstore collector plugin
|
2021-02-25 12:30:30 +01:00 |
Claudio Atzori
|
fc3fa5e343
|
implemented mdstore collector plugin
|
2021-02-24 15:07:24 +01:00 |
miconis
|
4b2124a18e
|
implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities
|
2021-02-10 11:51:50 +01:00 |
Alessia Bardi
|
c4d1feca74
|
mapper test with validated link to project
|
2021-02-10 11:22:54 +01:00 |
Claudio Atzori
|
72c57b28fa
|
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
|
2021-02-04 14:08:18 +01:00 |
Alessia Bardi
|
c67329d3ad
|
updated test for EU Open Data portal datasets
|
2021-02-03 17:06:48 +01:00 |
Alessia Bardi
|
fd705404a1
|
tests for EU Open Data portal dataset mapping
|
2021-02-03 10:28:17 +01:00 |
Sandro La Bruzzo
|
686e7b507c
|
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
|
2021-01-28 10:02:13 +01:00 |
Sandro La Bruzzo
|
98b9498b57
|
Removed old messaging system not quite used from collection and Transformation workflow
code refactor
|
2021-01-28 09:51:17 +01:00 |
Sandro La Bruzzo
|
150a617bd1
|
Merge pull request 'aggregation_on_hadoop' (#90) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
Wonderful code... You're the Best Sandro
|
2021-01-26 16:00:47 +01:00 |
Claudio Atzori
|
885e0dd926
|
[Cleaning] filter authors not providing word characters in the fullname
|
2021-01-26 09:48:53 +01:00 |
Claudio Atzori
|
2890511613
|
[Cleaning] normalise missing Result.country
|
2021-01-26 09:41:44 +01:00 |
Claudio Atzori
|
4eb9ed35b1
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2021-01-25 18:12:24 +01:00 |
Claudio Atzori
|
cd379eb5e3
|
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
|
2021-01-25 18:11:49 +01:00 |
Alessia Bardi
|
505477f36f
|
format code
|
2021-01-25 18:02:49 +01:00 |
Alessia Bardi
|
ded6ed8d7d
|
no ',' author, if there are no authors in ODF records
|
2021-01-25 17:57:51 +01:00 |
Claudio Atzori
|
3465c8ccee
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 16:54:53 +01:00 |
Sandro La Bruzzo
|
a54848a59c
|
Moved Vocabulary stuff to common module
|
2021-01-25 15:43:04 +01:00 |
Claudio Atzori
|
07a0ccfc96
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 13:36:01 +01:00 |
Claudio Atzori
|
34d653de41
|
[Cleaning] updated cleaning rule for DOIs
|
2021-01-22 14:16:33 +01:00 |
Claudio Atzori
|
26e9d55c13
|
code formatting
|
2021-01-05 09:59:26 +01:00 |
Claudio Atzori
|
7185158942
|
ignore missing properties
|
2020-12-29 11:06:28 +01:00 |
Claudio Atzori
|
28460c2cd1
|
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
|
2020-12-23 16:59:52 +01:00 |
Claudio Atzori
|
723b01f9e9
|
trivial: the less magic numbers and values around, the better
|
2020-12-23 12:22:48 +01:00 |
Claudio Atzori
|
6cb0dc3f43
|
extended ORCID cleaning procedure
|
2020-12-21 11:40:17 +01:00 |
Claudio Atzori
|
47270d9af5
|
lenient mock can be lenient
|
2020-12-18 15:38:59 +01:00 |
Alessia Bardi
|
f9a8fd8bbd
|
updated test record for textgrid
|
2020-12-17 11:59:45 +01:00 |
Michele Artini
|
991e675dc6
|
validation in claim rels
|
2020-12-14 15:41:25 +01:00 |
Claudio Atzori
|
12e2f930c8
|
resolved conflicts
|
2020-12-10 10:57:39 +01:00 |
Alessia Bardi
|
112da6d76a
|
in theory, just auto-formatting after mvn compile
|
2020-12-09 20:00:27 +01:00 |
Alessia Bardi
|
bece04b330
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-12-09 19:54:43 +01:00 |
Alessia Bardi
|
426b76ee8e
|
more asserts for TextGrid record
|
2020-12-09 19:46:11 +01:00 |
Claudio Atzori
|
4705144918
|
Merge pull request 'rel_project_validation' (#69) from rel_project_validation into master
LGTM
|
2020-12-09 19:01:20 +01:00 |
Claudio Atzori
|
ada21ad920
|
Merge pull request 'dump of the results related to at least one project' (#61) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
|
2020-12-09 17:22:56 +01:00 |
Michele Artini
|
1bc9adc10d
|
default trust for validated rels
|
2020-12-09 16:18:37 +01:00 |
Michele Artini
|
5f21a356fd
|
reindent
|
2020-12-09 11:24:30 +01:00 |
Michele Artini
|
370a5e650b
|
validation attributes in resultProject relations
|
2020-12-09 11:18:26 +01:00 |
Claudio Atzori
|
a104a632df
|
cleanup
|
2020-12-04 16:32:47 +01:00 |
Miriam Baglioni
|
5fb65ffc4a
|
merge branch with master
|
2020-12-03 11:24:35 +01:00 |
Miriam Baglioni
|
ea88dc3401
|
fixed issue in property name
|
2020-12-03 11:24:23 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Claudio Atzori
|
57f448b7a4
|
graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance
|
2020-12-02 10:44:05 +01:00 |
Alessia Bardi
|
a417624670
|
tests for raw graph mapping
|
2020-12-02 10:15:26 +01:00 |
Claudio Atzori
|
893ac4a77b
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-12-02 09:30:06 +01:00 |
Claudio Atzori
|
2c407e775e
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-11-30 12:00:38 +01:00 |
Claudio Atzori
|
e731a7658d
|
cleaning texts to remove tab characters too
|
2020-11-27 09:00:04 +01:00 |
Claudio Atzori
|
c1b9a4045a
|
grouping of records will be performed by the dedup workflow
|
2020-11-26 10:59:10 +01:00 |
Miriam Baglioni
|
124591a7f3
|
refactoring
|
2020-11-25 18:23:28 +01:00 |
Miriam Baglioni
|
1a89f8211c
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:12:40 +01:00 |
Miriam Baglioni
|
5fbe54ef54
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:10:28 +01:00 |
Miriam Baglioni
|
ed01e5a5e1
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:09:34 +01:00 |
Miriam Baglioni
|
d4ddde2ef2
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:01:01 +01:00 |
Miriam Baglioni
|
f5e5e92a10
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 17:58:53 +01:00 |
Miriam Baglioni
|
1df94b85b4
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 17:57:43 +01:00 |
Claudio Atzori
|
dfd6205b95
|
Consistency graph workflow merges all the entities by ID
|
2020-11-25 14:55:32 +01:00 |
Miriam Baglioni
|
90d4369fd2
|
added test to verify the compression in writing community info on hdfs
|
2020-11-25 14:34:58 +01:00 |
Miriam Baglioni
|
6750e33d69
|
merge branch with master
|
2020-11-25 14:09:01 +01:00 |
Miriam Baglioni
|
b2c455f883
|
added java doc
|
2020-11-25 14:08:09 +01:00 |
Miriam Baglioni
|
1f130cdf92
|
changed the relation (produces -> isProducedBy) due to the change in the code
|
2020-11-25 14:04:26 +01:00 |
Miriam Baglioni
|
e758d5d9b4
|
refactoring
|
2020-11-25 13:46:39 +01:00 |
Miriam Baglioni
|
87a9f616ae
|
refactoring and addition of the funder nsp first part as name for the dump instead of the whole nsp
|
2020-11-25 13:45:41 +01:00 |
Miriam Baglioni
|
e7e418e444
|
added decision node to verify if to upload in Zenodo
|
2020-11-25 13:44:10 +01:00 |
Miriam Baglioni
|
305e3d0c9c
|
added resource file for relation with relClass = isProducedBy
|
2020-11-25 13:43:41 +01:00 |
Miriam Baglioni
|
21ce175d17
|
added FilterFunction specification in filter operation
|
2020-11-25 13:42:31 +01:00 |
Miriam Baglioni
|
bde6d337dd
|
test classes for dump of results related to funders
|
2020-11-25 13:42:01 +01:00 |
Miriam Baglioni
|
b37b9352d7
|
added constant value for semantic relationship between projects and results
|
2020-11-25 13:41:08 +01:00 |
Claudio Atzori
|
36173c13a5
|
reverted filters in the cleaning process
|
2020-11-25 10:24:42 +01:00 |
Claudio Atzori
|
eeebd5a920
|
Cleaning workflow: remove newlines from titles, descriptions, subjects
|
2020-11-24 18:40:25 +01:00 |
Claudio Atzori
|
e1a1bb3ee4
|
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
|
2020-11-24 18:34:03 +01:00 |
Miriam Baglioni
|
72bb0fe360
|
changed directory name
|
2020-11-24 16:47:07 +01:00 |
Miriam Baglioni
|
39f4a20873
|
changed the path and the name for saving the communities_infrastructures dump file
|
2020-11-24 14:47:32 +01:00 |
Miriam Baglioni
|
7e14452a87
|
final version of the wf to get the dump of results associated to at least one funder per funder
|
2020-11-24 14:46:34 +01:00 |
Miriam Baglioni
|
c167a18057
|
added new parameter for the dumpType
|
2020-11-24 14:45:50 +01:00 |
Miriam Baglioni
|
54a309bb6b
|
refactoring
|
2020-11-24 14:45:30 +01:00 |
Miriam Baglioni
|
35ecea8842
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:45:15 +01:00 |
Miriam Baglioni
|
b9b6bdb2e6
|
fixing issue on previous implementation
|
2020-11-24 14:44:53 +01:00 |
Miriam Baglioni
|
7e940f1991
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:43:34 +01:00 |
Miriam Baglioni
|
62928ef7a5
|
changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file
|
2020-11-24 14:42:41 +01:00 |
Claudio Atzori
|
33bae02451
|
reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently
|
2020-11-24 14:42:33 +01:00 |
Miriam Baglioni
|
3319440c53
|
changed the direction of the relation between projects and result considered to select the results linked to projects
|
2020-11-24 14:41:09 +01:00 |
Miriam Baglioni
|
00c377dac2
|
added specification of MapFunction types in map
|
2020-11-24 14:40:22 +01:00 |
Miriam Baglioni
|
44db258dc4
|
added enumerated for the dump type
|
2020-11-24 14:38:06 +01:00 |
Miriam Baglioni
|
1832708c42
|
replaced boolean variable with a string one which specifies the type of dump we are performing: complete, community or funder
|
2020-11-24 14:37:36 +01:00 |
Miriam Baglioni
|
259c67ce36
|
fixed issue in path name
|
2020-11-20 12:32:23 +01:00 |
Miriam Baglioni
|
0a9db67eec
|
-
|
2020-11-20 12:21:33 +01:00 |
Miriam Baglioni
|
d362f2637d
|
merge branch with master
|
2020-11-19 19:17:20 +01:00 |
Miriam Baglioni
|
cf3f47563f
|
new parameter files
|
2020-11-19 19:16:05 +01:00 |
Miriam Baglioni
|
24c56fa7a3
|
new logic and workflow for dump of results with link to projects. In this implementation the result match the model of the communityresult.
|
2020-11-19 19:15:39 +01:00 |
Claudio Atzori
|
fcbb05eb21
|
cleanup
|
2020-11-19 15:14:33 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Miriam Baglioni
|
fafb688887
|
-
|
2020-11-18 18:56:48 +01:00 |
Miriam Baglioni
|
906db690d2
|
-
|
2020-11-18 17:43:08 +01:00 |
Claudio Atzori
|
ede7fae6c8
|
Merge pull request 'XML record indexing test' (#58) from provision_indexing into master
|
2020-11-18 17:04:34 +01:00 |
Miriam Baglioni
|
5402062ff5
|
changed parameter file with the one associated to the job
|
2020-11-18 16:58:20 +01:00 |
Miriam Baglioni
|
a172a37ad1
|
fixed typo
|
2020-11-18 16:55:07 +01:00 |
Miriam Baglioni
|
46ba3793f6
|
code, workflow and parameters for the dump of the results associated to funders
|
2020-11-18 16:47:31 +01:00 |
Miriam Baglioni
|
57cac36898
|
changed the workflow name
|
2020-11-18 13:38:03 +01:00 |
Claudio Atzori
|
8177ce7939
|
test for XmlIndexingJob based on a local miniSolrCluster
|
2020-11-18 10:58:05 +01:00 |
Alessia Bardi
|
10e673660f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-18 10:01:23 +01:00 |
Alessia Bardi
|
be7b310cef
|
rel semantics ignore case
|
2020-11-18 10:01:20 +01:00 |
Michele Artini
|
33da2e3d6c
|
xpaths for dateOfCollection and dateOfTransformation
|
2020-11-18 09:26:20 +01:00 |
Alessia Bardi
|
8f87020a50
|
#56: map relevantDates from aggregated ODF records
|
2020-11-17 18:42:09 +01:00 |
Alessia Bardi
|
7e0a76a8ac
|
test for TextGrid
|
2020-11-17 18:39:25 +01:00 |
Claudio Atzori
|
cfc01f136e
|
PID filtering based on a blacklist
|
2020-11-17 12:27:06 +01:00 |
Claudio Atzori
|
6ab1ce53c9
|
fixed condition in result pid cleaning; cleanup
|
2020-11-16 10:09:17 +01:00 |
Claudio Atzori
|
4de8c8b237
|
fixed workflow variable name
|
2020-11-16 10:03:11 +01:00 |
Claudio Atzori
|
331d621800
|
added test resource
|
2020-11-14 12:16:15 +01:00 |
Claudio Atzori
|
5d4e34e26a
|
fixed typo in variable name
|
2020-11-14 10:32:26 +01:00 |
Claudio Atzori
|
768bc5304c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-13 15:40:34 +01:00 |
Claudio Atzori
|
93f7b7974f
|
Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master
LGTM
|
2020-11-13 15:40:02 +01:00 |
Claudio Atzori
|
528231a287
|
grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow
|
2020-11-13 15:37:48 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Claudio Atzori
|
13e36a4da0
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:02 +01:00 |
Claudio Atzori
|
9b0fb9e958
|
merged from master
|
2020-11-12 09:27:12 +01:00 |
Michele Artini
|
40160d171f
|
organizations pids
|
2020-11-09 12:58:36 +01:00 |
Sandro La Bruzzo
|
027ef2326c
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-06 17:12:42 +01:00 |
Sandro La Bruzzo
|
cd27df91a1
|
fixed bug on missing relation in ANDS
|
2020-11-06 17:12:31 +01:00 |
Claudio Atzori
|
d10447e747
|
re-packaged graph dump workflow sources
|
2020-11-05 17:38:18 +01:00 |
Claudio Atzori
|
2d76497488
|
cleanup
|
2020-11-05 17:10:24 +01:00 |
Miriam Baglioni
|
f8e9bda24c
|
merge branch with master
|
2020-11-05 16:31:18 +01:00 |
Miriam Baglioni
|
be5ed8f554
|
added check to avoid sending empty metadata.
|
2020-11-05 16:10:17 +01:00 |
Claudio Atzori
|
2148a51fae
|
minor changes
|
2020-11-05 11:24:12 +01:00 |
Claudio Atzori
|
4625b7486e
|
code formatting
|
2020-11-04 18:12:43 +01:00 |
Miriam Baglioni
|
e9ac471ae9
|
removed dependency from classes for the pid graph dump
|
2020-11-04 18:04:42 +01:00 |
Miriam Baglioni
|
b90a945c49
|
removed property files for pid graph dump
|
2020-11-04 17:28:33 +01:00 |
Miriam Baglioni
|
bac307155a
|
removed properties specific for pid graph dump
|
2020-11-04 17:28:04 +01:00 |
Miriam Baglioni
|
9c9d50f486
|
removed code specific for pid graph dump
|
2020-11-04 17:26:22 +01:00 |
Miriam Baglioni
|
5669890934
|
removed commented lines
|
2020-11-04 17:15:21 +01:00 |
Miriam Baglioni
|
6a89f59be9
|
removed commented lines
|
2020-11-04 17:13:59 +01:00 |
Miriam Baglioni
|
56150d7e5e
|
removed all code related to the dump of pids graph
|
2020-11-04 17:13:12 +01:00 |
Miriam Baglioni
|
16c54a96f8
|
removed pid dump
|
2020-11-04 17:11:32 +01:00 |
Miriam Baglioni
|
0cac5436ff
|
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
|
2020-11-04 13:21:11 +01:00 |
Alessia Bardi
|
51808b5afd
|
Updated descriptions
|
2020-11-04 12:29:48 +01:00 |
Alessia Bardi
|
e6becf8659
|
Updated descriptions
|
2020-11-04 12:17:57 +01:00 |
Alessia Bardi
|
0abe0eee33
|
Updated descriptions
|
2020-11-04 12:15:30 +01:00 |
Alessia Bardi
|
f6ab238f5d
|
Updated descriptions
|
2020-11-04 11:50:47 +01:00 |
Miriam Baglioni
|
c010a8442f
|
fixed issue on test code
|
2020-11-03 17:26:51 +01:00 |
Miriam Baglioni
|
8ec7a61188
|
merge branch with master
|
2020-11-03 16:59:08 +01:00 |
Miriam Baglioni
|
c209284ca7
|
new schemas for the entities in the dump with added descriptions
|
2020-11-03 16:58:08 +01:00 |
Miriam Baglioni
|
08806deddf
|
added the splitSize non mandatory parameter. Default size 10G
|
2020-11-03 16:57:34 +01:00 |
Miriam Baglioni
|
7d2eda43ca
|
added new non mandatory property publish to determine whether to publish the upload or leave it pending. Default value false
|
2020-11-03 16:57:01 +01:00 |
Miriam Baglioni
|
cbbb1bdc54
|
moved business logic to new class in common for handling the zip of the archives
|
2020-11-03 16:55:50 +01:00 |
Miriam Baglioni
|
d4382b54df
|
moved the tar archive with max size on common module
|
2020-11-03 16:54:50 +01:00 |
Claudio Atzori
|
86d6fbe95b
|
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
|
2020-11-03 12:19:46 +01:00 |
Claudio Atzori
|
8471888ad3
|
Merge branch 'graph_cleaning' into stable_ids
|
2020-11-03 11:52:47 +01:00 |
Claudio Atzori
|
5310e56dba
|
remove empty PIDs
|
2020-11-03 11:52:10 +01:00 |
Claudio Atzori
|
3fcd669e99
|
result merge operation leverage on custom ResultTypeComparator in the aggregator graph construction
|
2020-11-03 10:53:23 +01:00 |
Claudio Atzori
|
09e44dabff
|
Merge branch 'master' into stable_ids
|
2020-11-02 12:16:01 +01:00 |
Sandro La Bruzzo
|
754c86f33e
|
fixed test to work on jenkins
|
2020-11-02 09:35:01 +01:00 |
Miriam Baglioni
|
dabb33e018
|
changed the discriminant for which split the file
|
2020-10-30 17:52:22 +01:00 |
Miriam Baglioni
|
0fba08eae4
|
max allowed size per file 10 Gb
|
2020-10-30 16:05:55 +01:00 |
Claudio Atzori
|
4ca75d6951
|
Merge pull request 'Dedup ID creation policy' (#48) from deduptesting into stable_ids
|
2020-10-30 15:15:32 +01:00 |
Miriam Baglioni
|
b828587252
|
prevent the code from cycling indefinitely
|
2020-10-30 15:01:25 +01:00 |
Miriam Baglioni
|
f747e303ac
|
classes for dumping of the graph as ttl file
|
2020-10-30 14:13:45 +01:00 |
Miriam Baglioni
|
16baf5b69e
|
formatting
|
2020-10-30 14:13:14 +01:00 |
Miriam Baglioni
|
a9eef9c852
|
added check for possible Optional value in relation dataInfo
|
2020-10-30 14:12:28 +01:00 |
Miriam Baglioni
|
5f4de9a962
|
formatting
|
2020-10-30 14:11:40 +01:00 |
Miriam Baglioni
|
14bf2e7238
|
added option to split dumps bigger than 40Gb on different files
|
2020-10-30 14:09:04 +01:00 |
Claudio Atzori
|
58f28296ea
|
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
|
2020-10-30 10:56:42 +01:00 |
Miriam Baglioni
|
78fdb11c3f
|
merge branch with master
|
2020-10-29 12:55:22 +01:00 |
Sandro La Bruzzo
|
1d9fdb7367
|
fixed spark memory issue in SparkSplitOafTODLIEntities
|
2020-10-28 12:30:32 +01:00 |
Miriam Baglioni
|
d2374e3b9e
|
added code to handle cases where the funding tree is not existing
|
2020-10-27 16:15:21 +01:00 |
Miriam Baglioni
|
5d3012eeb4
|
changed code to dump only the programme list and not the classification list
|
2020-10-27 16:14:18 +01:00 |
Miriam Baglioni
|
3241ec1777
|
added connection timeout and socket timeout 600 sec
|
2020-10-27 16:12:11 +01:00 |
Alessia Bardi
|
1425d810a8
|
testing mapping
|
2020-10-19 17:46:14 +02:00 |
Claudio Atzori
|
266bf1a221
|
common IdentifierFactory in use on the mapping from the aggregator data; merge the entities sharing the same id; code formatting
|
2020-10-16 17:02:10 +02:00 |
Claudio Atzori
|
34f1d0904b
|
common IdentifierFactory in use on the mapping from the aggregator data
|
2020-10-16 16:00:19 +02:00 |
Sandro La Bruzzo
|
fed711da80
|
Merge remote-tracking branch 'origin/master' into merge_record_to_common
|
2020-10-13 15:32:45 +02:00 |
Alessia Bardi
|
8775a64bc1
|
Merge pull request 'Merging different compatibility levels (pinocchio operator)' (#47) from merge_graph into master
|
2020-10-09 14:44:52 +02:00 |
Claudio Atzori
|
e751c1402f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-10-09 13:53:21 +02:00 |
Claudio Atzori
|
b961dc7d1e
|
added originalid to the fields in the result graph view
|
2020-10-09 13:53:15 +02:00 |
Sandro La Bruzzo
|
eec418cd26
|
moved AuthorMerger into dhp-common
|
2020-10-08 10:33:55 +02:00 |
Sandro La Bruzzo
|
fe0a7870e6
|
Added test to check if merge authors works
|
2020-10-08 10:33:12 +02:00 |
Sandro La Bruzzo
|
cd9c377d18
|
adapted scholexplorer Dump generation to the new Dataset definition
|
2020-10-08 10:10:13 +02:00 |
Claudio Atzori
|
a3f37a9414
|
javadoc
|
2020-10-07 16:44:22 +02:00 |
Claudio Atzori
|
8d85a2fced
|
[BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two
|
2020-10-07 16:28:52 +02:00 |
Miriam Baglioni
|
ae08b3c0dd
|
merge branch with master
|
2020-10-05 11:35:55 +02:00 |
Miriam Baglioni
|
11b7eaae09
|
changed the name of the folder where to store the context entity from context to communities_infrastructures
|
2020-10-05 11:24:54 +02:00 |
Miriam Baglioni
|
32bffb0134
|
changed the name from communities_infrastructures to communities_infrastuctures.json
|
2020-10-05 11:24:17 +02:00 |
Miriam Baglioni
|
25cbcf6114
|
changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar
|
2020-10-02 12:17:46 +02:00 |
Claudio Atzori
|
49ae3450a9
|
code formatting
|
2020-10-02 09:43:24 +02:00 |
Claudio Atzori
|
c2a6e2a9bf
|
fixed mapping for datasource journal info (ISSNs)
|
2020-10-02 09:37:08 +02:00 |
Miriam Baglioni
|
01117a46e1
|
whole workflow activated
|
2020-10-01 17:19:21 +02:00 |
Miriam Baglioni
|
cfb5766c6b
|
removed double json.gz from names of files in the tar
|
2020-10-01 17:18:34 +02:00 |
Miriam Baglioni
|
fcaedac980
|
merge branch with master
|
2020-10-01 16:46:59 +02:00 |
Miriam Baglioni
|
c6e6ed1bd8
|
merge branch with master
|
2020-10-01 16:24:41 +02:00 |
Claudio Atzori
|
2e9e13444d
|
author pids made unique by value
|
2020-10-01 12:50:40 +02:00 |
Claudio Atzori
|
e265c3e125
|
cleaning functions factored out in a dedicated class
|
2020-10-01 10:50:15 +02:00 |
Claudio Atzori
|
4287164aba
|
include relevantdate field in the result view
|
2020-10-01 10:28:55 +02:00 |
Miriam Baglioni
|
7b6a7333e6
|
merge branch with master
|
2020-09-25 16:42:07 +02:00 |
Miriam Baglioni
|
983a12ed15
|
temporary modification to allow the upload of files in the sandbox without the need to recreate the mapping from scratch
|
2020-09-25 16:41:51 +02:00 |
Miriam Baglioni
|
8b36d19182
|
added property depositionId and changed property newVersion, which became a string instead of a boolean to handle the three possible distinct values
|
2020-09-25 16:41:15 +02:00 |
Miriam Baglioni
|
ed5239f9ec
|
added new code to handle the new possibility to upload files to an already open deposition
|
2020-09-25 16:34:32 +02:00 |
Miriam Baglioni
|
3a8c524fce
|
refactor
|
2020-09-25 16:34:02 +02:00 |
Miriam Baglioni
|
54800fb9b0
|
enabled only the step to upload in zenodo
|
2020-09-25 14:40:22 +02:00 |
Miriam Baglioni
|
de6c4d46d8
|
fixed conflicts
|
2020-09-24 15:35:01 +02:00 |
Claudio Atzori
|
044d3a0214
|
fixed query used to load datasources in the Graph
|
2020-09-24 13:48:58 +02:00 |
Claudio Atzori
|
27df1cea6d
|
code formatting
|
2020-09-24 12:16:00 +02:00 |
Claudio Atzori
|
fb22f4d70b
|
included values for projects fundedamount and totalcost fields in the mapping tests. Swapped expected and actual values in junit test assertions
|
2020-09-24 12:10:59 +02:00 |
Claudio Atzori
|
42f55395c8
|
fixed order of the ISSNs returned by the SQL query
|
2020-09-24 12:09:58 +02:00 |
Claudio Atzori
|
9a7e72d528
|
using concat_ws to join textual columns from PSQL. When using || to perform the concatenation, a NULL column makes the result of the operation NULL
|
2020-09-24 10:42:47 +02:00 |
Claudio Atzori
|
9e3e93c6b6
|
setting the correct issn type in the datasource.journal element
|
2020-09-24 10:39:16 +02:00 |
Miriam Baglioni
|
39eb8ab25b
|
changed the dump to move from h2020programme to h2020classification
|
2020-09-23 17:33:00 +02:00 |
Miriam Baglioni
|
c2b5c780ff
|
-
|
2020-09-14 14:34:03 +02:00 |
Miriam Baglioni
|
e2ceefe9be
|
-
|
2020-09-14 14:33:28 +02:00 |
Miriam Baglioni
|
1f893e63dc
|
-
|
2020-09-14 14:33:10 +02:00 |
Claudio Atzori
|
8a523474b7
|
code formatting
|
2020-09-07 11:40:16 +02:00 |
Miriam Baglioni
|
b72a7dad46
|
resource for pid graph dump
|
2020-08-24 17:09:01 +02:00 |
Miriam Baglioni
|
8694bb9b31
|
refactoring due to compilation
|
2020-08-24 17:07:34 +02:00 |
Miriam Baglioni
|
8a069a4fea
|
-
|
2020-08-24 17:01:30 +02:00 |
Miriam Baglioni
|
34fa96f3b1
|
-
|
2020-08-24 17:00:20 +02:00 |
Miriam Baglioni
|
5fb2949cb8
|
added utils methods
|
2020-08-24 17:00:09 +02:00 |
Miriam Baglioni
|
2a540b6c01
|
added constants for the pid graph dump
|
2020-08-24 16:55:35 +02:00 |
Miriam Baglioni
|
da103c399a
|
resources for the pid graph dump test
|
2020-08-24 16:52:07 +02:00 |
Miriam Baglioni
|
630a6a1fe7
|
first tests for the pid graph dump
|
2020-08-24 16:51:26 +02:00 |
Miriam Baglioni
|
40c8d2de7b
|
test resources for the dump of the pids graph
|
2020-08-24 16:50:39 +02:00 |
Miriam Baglioni
|
bef79d3bdf
|
first attempt to the dump of pids graph
|
2020-08-24 16:49:38 +02:00 |
Miriam Baglioni
|
85203c16e3
|
merge branch with master
|
2020-08-19 11:49:03 +02:00 |
Miriam Baglioni
|
2c783793ba
|
removed the affiliation from the author to mirror the changes in the model
|
2020-08-19 11:48:12 +02:00 |
Miriam Baglioni
|
f6bf888016
|
removed affiliation from author to mirror the changes in the model
|
2020-08-19 11:41:41 +02:00 |
Miriam Baglioni
|
66d0e0d3f2
|
-
|
2020-08-19 11:31:50 +02:00 |
Miriam Baglioni
|
1c593a9cfe
|
-
|
2020-08-19 11:29:51 +02:00 |
Miriam Baglioni
|
e42b2f5ae2
|
-
|
2020-08-19 11:29:09 +02:00 |
Miriam Baglioni
|
f81ee22418
|
changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult)
|
2020-08-19 11:28:26 +02:00 |
Miriam Baglioni
|
387be43fd4
|
changed to discriminate if dumping all the results type together or each one in its own archive
|
2020-08-19 11:25:27 +02:00 |
Miriam Baglioni
|
c5858afb88
|
added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dumped together, false otherwise.
|
2020-08-19 11:24:14 +02:00 |
Miriam Baglioni
|
d407852ac2
|
changed to reflect the changes in the model
|
2020-08-19 11:15:05 +02:00 |
Miriam Baglioni
|
47c21a8961
|
refactoring due to compilation
|
2020-08-19 11:11:57 +02:00 |
Miriam Baglioni
|
5570678c65
|
changed parameter name from hdfsNameNode to nameNode
|
2020-08-19 10:59:26 +02:00 |
Miriam Baglioni
|
dc5096a327
|
refactoring due to compilation
|
2020-08-19 10:57:36 +02:00 |
Miriam Baglioni
|
96600ed04a
|
modified test resource for mirroring the deletion of affiliation from author parameters
|
2020-08-14 20:41:49 +02:00 |
Miriam Baglioni
|
09f5b92763
|
added specific reference to class
|
2020-08-14 20:00:09 +02:00 |
Miriam Baglioni
|
37e7c43652
|
changed parameter name from hdfsNameNode to nameNode
|
2020-08-14 18:18:25 +02:00 |
Miriam Baglioni
|
d2a8a4961a
|
refactoring
|
2020-08-13 18:50:33 +02:00 |
Miriam Baglioni
|
a5043de5da
|
added method to get the mapped instance
|
2020-08-13 18:45:50 +02:00 |
Miriam Baglioni
|
fcd10f452c
|
changed because of D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:55:32 +02:00 |
Miriam Baglioni
|
fd48ae3b85
|
changed because of D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:19:15 +02:00 |