Sandro La Bruzzo
cc0f2b11fb
Implemented mapping from pubmed baseline to OAF
2021-06-16 14:56:24 +02:00
Michele Artini
ada063ce70
fixed a problem with empty mdstore list (2)
2021-06-14 12:04:47 +02:00
Michele Artini
83132ee99a
fixed a problem with empty mdstore list
2021-06-14 11:57:00 +02:00
Sandro La Bruzzo
aeb8132627
Merged branch stable_ids
2021-06-14 10:07:29 +02:00
Claudio Atzori
2039bb9f5f
orcid / orcid_pending cleaning backported from master branch
2021-06-14 09:40:50 +02:00
Claudio Atzori
dd19c4ac5a
Merge pull request 'import_new_mdstores' ( #112 ) from import_new_mdstores into stable_ids
Reviewed-on: D-Net/dnet-hadoop#112
2021-06-14 09:23:55 +02:00
Claudio Atzori
a900bfb874
delegating the date parsing to https://github.com/sisyphsu/dateparser
2021-06-11 16:53:01 +02:00
Sandro La Bruzzo
5b724d9972
added relations to datacite mapping
2021-06-04 10:14:22 +02:00
Sandro La Bruzzo
e57294ac99
implemented changes on PUBMed dataflow
2021-06-03 10:52:09 +02:00
Michele Artini
ede2749822
orcid pid type
2021-06-01 12:42:43 +02:00
Michele Artini
f0fbfdcfae
Merge branch 'stable_ids' into import_new_mdstores
2021-06-01 12:03:00 +02:00
Michele Artini
e950750262
add nodes to import hdfs mdstores
2021-06-01 10:48:50 +02:00
Michele Artini
03a510859a
removed coalesce(1)
2021-05-31 14:10:51 +02:00
Michele Artini
e9f2b6037c
patch of mdstore records
2021-05-31 11:36:26 +02:00
Michele Artini
ad56a44fda
save as gzipped sequence file
2021-05-28 14:45:39 +02:00
Claudio Atzori
6e3a4e9237
updated test expectations
2021-05-28 09:37:50 +02:00
Michele Artini
4fa5671d16
first implementation of Hdfs Mdstores Importer
2021-05-27 16:22:07 +02:00
Claudio Atzori
5e4b91d9ef
more pervasive use of constants from ModelConstants, especially for ORCID
2021-05-26 18:20:23 +02:00
Claudio Atzori
9d725efdc1
reverted implementation of the mdstore client
2021-05-20 18:26:09 +02:00
Claudio Atzori
ae5c28e54f
code formatting
2021-05-20 16:13:06 +02:00
Claudio Atzori
232dce83db
fixes #6701 : xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:41:15 +02:00
Claudio Atzori
23b8883ab1
applied intellij code cleanup
2021-05-14 10:58:12 +02:00
Claudio Atzori
d4c3476152
mapping datasource.journal only when an issn is available, null otherwise
2021-05-11 11:08:54 +02:00
Claudio Atzori
d1cbee8413
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
2021-05-10 16:43:39 +02:00
Claudio Atzori
d4a30fabe3
clean up tests
2021-05-05 17:28:15 +02:00
Claudio Atzori
dccaf173cf
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 16:36:15 +02:00
Claudio Atzori
2e1eb96f9a
code formatting
2021-05-05 11:23:57 +02:00
Claudio Atzori
fb930b84d3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-05-04 18:06:30 +02:00
Claudio Atzori
923d19ea8e
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
2021-05-04 18:06:21 +02:00
Sandro La Bruzzo
714b71bd21
updated pubmed
2021-05-04 14:54:12 +02:00
Alessia Bardi
9a20057615
fixed query for organisations' pids
2021-04-29 15:23:39 +02:00
Sandro La Bruzzo
2129e9caa7
updated pangaea transformation to parse directly the xml
2021-04-28 10:21:03 +02:00
Claudio Atzori
5afa7d3e0c
core utilities in dhp-common moved in external module dhp-schemas
2021-04-27 15:44:01 +02:00
Sandro La Bruzzo
74484d2823
bug fixing
2021-04-27 12:13:44 +02:00
Sandro La Bruzzo
c74b03d59c
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-27 11:31:07 +02:00
Sandro La Bruzzo
7f8848ecdd
added first implementation of Pangaea Mapping
2021-04-27 11:30:37 +02:00
Claudio Atzori
27ab8a704d
adjusted poms to align with the external dhp-schema module
2021-04-27 10:12:27 +02:00
Claudio Atzori
c2bb03c8b5
depending on external dhp-schemas module
2021-04-23 17:57:35 +02:00
Claudio Atzori
c25238480c
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:34:57 +02:00
Claudio Atzori
d0d477cca3
code formatting
2021-04-20 12:50:34 +02:00
miconis
0393cdce42
addition of alternative names in export queries
2021-04-20 12:45:21 +02:00
miconis
cadd0a5de8
modification of the queries for openorgs: they now consider also pending orgs
2021-04-20 12:06:56 +02:00
Claudio Atzori
d1ca025b0b
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
2021-04-13 14:32:41 +02:00
miconis
11b22b2d23
bug fix in the query, it now exports only relations with non-hidden organizations
2021-04-08 11:51:47 +02:00
miconis
0857100fb8
implementation of the tests for the openorgs integration in the openaire provision
2021-04-07 18:42:16 +02:00
miconis
bf685d849f
addition of pids in the query for the export of openorgs for the provision, addition of ec_fields in the openorgs model
2021-04-07 14:27:43 +02:00
miconis
eaaefb8b4c
implementation of the procedure to reuse content of different dbs when creating the raw graph
2021-04-06 14:35:51 +02:00
miconis
c39c82dfe9
modification of the jobs for the integration of openorgs in the provision, dedup records are no longer created by merging but simply by taking the results of the openorgs portal
2021-04-06 14:31:00 +02:00
Claudio Atzori
7941d7be29
WIP: using common definitions from ModelConstants
2021-03-31 18:33:57 +02:00
Claudio Atzori
72ce741ea6
WIP: using common definitions from ModelConstants
2021-03-31 17:07:13 +02:00
Claudio Atzori
9237d55d7f
[OpenOrgsWf] cleanup
2021-03-29 17:40:34 +02:00
Claudio Atzori
7f4e9479ec
[OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false)
2021-03-29 16:59:16 +02:00
miconis
2709d08fc2
Merge branch 'stable_ids' into openorgswf
2021-03-29 16:39:07 +02:00
miconis
f446580e9f
code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup
2021-03-29 16:10:46 +02:00
miconis
2355cc4e9b
minor changes and bug fix
2021-03-29 10:07:12 +02:00
Claudio Atzori
827e7e37db
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
2021-03-25 11:07:59 +01:00
miconis
28c1cdd132
merged stable_ids into openorgswf
2021-03-25 10:44:49 +01:00
miconis
348b0ef921
bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db
2021-03-24 15:51:27 +01:00
Claudio Atzori
751125fdf9
[Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target
2021-03-23 17:34:32 +01:00
Claudio Atzori
b4febed138
updated mapping tests as consequence of the special treatment reserved to Handle PIDs
2021-03-23 09:37:48 +01:00
Claudio Atzori
431cbe9955
handle missing instance.pid during bulk cleaning
2021-03-23 09:28:58 +01:00
Sandro La Bruzzo
c73072079d
fix conflicts
2021-03-22 16:36:31 +01:00
Claudio Atzori
5a043e95ea
code formatting
2021-03-19 11:37:27 +01:00
Claudio Atzori
a4e82a65aa
integrated filter applied when merging BETA & PROD graphs to rule out records from Datacite
2021-03-19 11:34:44 +01:00
Claudio Atzori
8257f9a2bc
result.pid: adjusted the mapping applied to the contents from the aggregator
2021-03-17 12:45:38 +01:00
Claudio Atzori
640b885706
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
2021-03-16 14:19:32 +01:00
Claudio Atzori
01630f638d
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
2021-03-09 17:11:50 +01:00
Claudio Atzori
59532b0919
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
2021-03-09 11:14:45 +01:00
Claudio Atzori
d525785497
[ #6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
2021-03-09 11:12:55 +01:00
Claudio Atzori
f468c7f0d7
merged from master
2021-03-09 09:12:41 +01:00
Claudio Atzori
8d2bb24512
merged from master
2021-03-08 15:44:34 +01:00
Claudio Atzori
fa7930d2e2
merging contributions from PR#97
2021-03-05 15:45:28 +01:00
miconis
1a85020572
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
2021-02-26 10:19:28 +01:00
Claudio Atzori
b830e33392
mdstore collector plugin
2021-02-25 12:30:30 +01:00
Claudio Atzori
fc3fa5e343
implemented mdstore collector plugin
2021-02-24 15:07:24 +01:00
miconis
4b2124a18e
implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities
2021-02-10 11:51:50 +01:00
Alessia Bardi
c4d1feca74
mapper test with validated link to project
2021-02-10 11:22:54 +01:00
Claudio Atzori
72c57b28fa
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
2021-02-04 14:08:18 +01:00
Alessia Bardi
c67329d3ad
updated test for EU Open Data portal datasets
2021-02-03 17:06:48 +01:00
Alessia Bardi
fd705404a1
tests for EU Open Data portal dataset mapping
2021-02-03 10:28:17 +01:00
Sandro La Bruzzo
686e7b507c
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
2021-01-28 10:02:13 +01:00
Sandro La Bruzzo
98b9498b57
Removed old messaging system not quite used from collection and Transformation workflow
code refactor
2021-01-28 09:51:17 +01:00
Sandro La Bruzzo
150a617bd1
Merge pull request 'aggregation_on_hadoop' ( #90 ) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
Wonderful code... You're the Best Sandro
2021-01-26 16:00:47 +01:00
Claudio Atzori
885e0dd926
[Cleaning] filter authors not providing word characters in the fullname
2021-01-26 09:48:53 +01:00
Claudio Atzori
2890511613
[Cleaning] normalise missing Result.country
2021-01-26 09:41:44 +01:00
Claudio Atzori
4eb9ed35b1
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-01-25 18:12:24 +01:00
Claudio Atzori
cd379eb5e3
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
2021-01-25 18:11:49 +01:00
Alessia Bardi
505477f36f
format code
2021-01-25 18:02:49 +01:00
Alessia Bardi
ded6ed8d7d
no ',' author if there are no authors in ODF records
2021-01-25 17:57:51 +01:00
Claudio Atzori
3465c8ccee
[Cleaning] trying to avoid NPEs
2021-01-25 16:54:53 +01:00
Sandro La Bruzzo
a54848a59c
Moved Vocabulary stuff to common module
2021-01-25 15:43:04 +01:00
Claudio Atzori
07a0ccfc96
[Cleaning] trying to avoid NPEs
2021-01-25 13:36:01 +01:00
Claudio Atzori
34d653de41
[Cleaning] updated cleaning rule for DOIs
2021-01-22 14:16:33 +01:00
Claudio Atzori
26e9d55c13
code formatting
2021-01-05 09:59:26 +01:00
Claudio Atzori
7185158942
ignore missing properties
2020-12-29 11:06:28 +01:00
Claudio Atzori
28460c2cd1
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
2020-12-23 16:59:52 +01:00
Claudio Atzori
723b01f9e9
trivial: the less magic numbers and values around, the better
2020-12-23 12:22:48 +01:00
Claudio Atzori
6cb0dc3f43
extended ORCID cleaning procedure
2020-12-21 11:40:17 +01:00
Claudio Atzori
47270d9af5
lenient mock can be lenient
2020-12-18 15:38:59 +01:00
Alessia Bardi
f9a8fd8bbd
updated test record for textgrid
2020-12-17 11:59:45 +01:00
Michele Artini
991e675dc6
validation in claim rels
2020-12-14 15:41:25 +01:00
Claudio Atzori
12e2f930c8
resolved conflicts
2020-12-10 10:57:39 +01:00
Alessia Bardi
112da6d76a
in theory, just auto-formatting after mvn compile
2020-12-09 20:00:27 +01:00
Alessia Bardi
bece04b330
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-09 19:54:43 +01:00
Alessia Bardi
426b76ee8e
more asserts for TextGrid record
2020-12-09 19:46:11 +01:00
Claudio Atzori
4705144918
Merge pull request 'rel_project_validation' ( #69 ) from rel_project_validation into master
LGTM
2020-12-09 19:01:20 +01:00
Claudio Atzori
ada21ad920
Merge pull request 'dump of the results related to at least one project' ( #61 ) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
2020-12-09 17:22:56 +01:00
Michele Artini
1bc9adc10d
default trust for validated rels
2020-12-09 16:18:37 +01:00
Michele Artini
5f21a356fd
reindent
2020-12-09 11:24:30 +01:00
Michele Artini
370a5e650b
validation attributes in resultProject relations
2020-12-09 11:18:26 +01:00
Claudio Atzori
a104a632df
cleanup
2020-12-04 16:32:47 +01:00
Miriam Baglioni
5fb65ffc4a
merge branch with master
2020-12-03 11:24:35 +01:00
Miriam Baglioni
ea88dc3401
fixed issue in property name
2020-12-03 11:24:23 +01:00
Claudio Atzori
cfb55effd9
code formatting
2020-12-02 11:23:49 +01:00
Claudio Atzori
57f448b7a4
graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance
2020-12-02 10:44:05 +01:00
Alessia Bardi
a417624670
tests for raw graph mapping
2020-12-02 10:15:26 +01:00
Claudio Atzori
893ac4a77b
GenerateEntitiesApplication can be configured to hash the id value or not
2020-12-02 09:30:06 +01:00
Claudio Atzori
2c407e775e
GenerateEntitiesApplication can be configured to hash the id value or not
2020-11-30 12:00:38 +01:00
Claudio Atzori
e731a7658d
cleaning texts to remove tab characters too
2020-11-27 09:00:04 +01:00
Claudio Atzori
c1b9a4045a
grouping of records will be performed by the dedup workflow
2020-11-26 10:59:10 +01:00
Miriam Baglioni
124591a7f3
refactoring
2020-11-25 18:23:28 +01:00
Miriam Baglioni
1a89f8211c
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:12:40 +01:00
Miriam Baglioni
5fbe54ef54
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:10:28 +01:00
Miriam Baglioni
ed01e5a5e1
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:09:34 +01:00
Miriam Baglioni
d4ddde2ef2
changed because of D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:01:01 +01:00
Miriam Baglioni
f5e5e92a10
changed because of D-Net/dnet-hadoop#61 (comment)
2020-11-25 17:58:53 +01:00
Miriam Baglioni
1df94b85b4
changed because of D-Net/dnet-hadoop#61 (comment)
2020-11-25 17:57:43 +01:00
Claudio Atzori
dfd6205b95
Consistency graph workflow merges all the entities by ID
2020-11-25 14:55:32 +01:00
Miriam Baglioni
90d4369fd2
added test to verify the compression in writing community info on hdfs
2020-11-25 14:34:58 +01:00
Miriam Baglioni
6750e33d69
merge branch with master
2020-11-25 14:09:01 +01:00
Miriam Baglioni
b2c455f883
added java doc
2020-11-25 14:08:09 +01:00
Miriam Baglioni
1f130cdf92
changed the relation (produces -> isProducedBy) due to the change in the code
2020-11-25 14:04:26 +01:00
Miriam Baglioni
e758d5d9b4
refactoring
2020-11-25 13:46:39 +01:00
Miriam Baglioni
87a9f616ae
refactoring and addition of the funder nsp first part as name for the dump instead of the whole nsp
2020-11-25 13:45:41 +01:00
Miriam Baglioni
e7e418e444
added decision node to verify whether to upload to Zenodo
2020-11-25 13:44:10 +01:00
Miriam Baglioni
305e3d0c9c
added resource file for relation with relClass = isProducedBy
2020-11-25 13:43:41 +01:00
Miriam Baglioni
21ce175d17
added FilterFunction specification in filter operation
2020-11-25 13:42:31 +01:00
Miriam Baglioni
bde6d337dd
test classes for dump of results related to funders
2020-11-25 13:42:01 +01:00
Miriam Baglioni
b37b9352d7
added constant value for semantic relationship between projects and results
2020-11-25 13:41:08 +01:00
Claudio Atzori
36173c13a5
reverted filters in the cleaning process
2020-11-25 10:24:42 +01:00
Claudio Atzori
eeebd5a920
Cleaning workflow: remove newlines from titles, descriptions, subjects
2020-11-24 18:40:25 +01:00
Claudio Atzori
e1a1bb3ee4
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
2020-11-24 18:34:03 +01:00
Miriam Baglioni
72bb0fe360
changed directory name
2020-11-24 16:47:07 +01:00
Miriam Baglioni
39f4a20873
changed the path and the name for saving the communities_infrastructures dump file
2020-11-24 14:47:32 +01:00
Miriam Baglioni
7e14452a87
final version of the wf to get the dump of results associated to at least one funder per funder
2020-11-24 14:46:34 +01:00
Miriam Baglioni
c167a18057
added new parameter for the dumpType
2020-11-24 14:45:50 +01:00
Miriam Baglioni
54a309bb6b
refactoring
2020-11-24 14:45:30 +01:00
Miriam Baglioni
35ecea8842
changed to consider the modification for the specification of the type of dump
2020-11-24 14:45:15 +01:00
Miriam Baglioni
b9b6bdb2e6
fixing issue on previous implementation
2020-11-24 14:44:53 +01:00
Miriam Baglioni
7e940f1991
changed to consider the modification for the specification of the type of dump
2020-11-24 14:43:34 +01:00
Miriam Baglioni
62928ef7a5
changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file
2020-11-24 14:42:41 +01:00
Claudio Atzori
33bae02451
reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently
2020-11-24 14:42:33 +01:00
Miriam Baglioni
3319440c53
changed the direction of the relation between projects and result considered to select the results linked to projects
2020-11-24 14:41:09 +01:00
Miriam Baglioni
00c377dac2
added specification of MapFunction types in map
2020-11-24 14:40:22 +01:00
Miriam Baglioni
44db258dc4
added enumerated for the dump type
2020-11-24 14:38:06 +01:00
Miriam Baglioni
1832708c42
replaced boolean variable with a string one which specifies the type of dump we are performing: complete, community or funder
2020-11-24 14:37:36 +01:00
Miriam Baglioni
259c67ce36
fixed issue in path name
2020-11-20 12:32:23 +01:00
Miriam Baglioni
0a9db67eec
-
2020-11-20 12:21:33 +01:00
Miriam Baglioni
d362f2637d
merge branch with master
2020-11-19 19:17:20 +01:00
Miriam Baglioni
cf3f47563f
new parameter files
2020-11-19 19:16:05 +01:00
Miriam Baglioni
24c56fa7a3
new logic and workflow for dump of results with link to projects. In this implementation the result matches the model of the communityresult.
2020-11-19 19:15:39 +01:00
Claudio Atzori
fcbb05eb21
cleanup
2020-11-19 15:14:33 +01:00
Claudio Atzori
3f34757c63
merged from master
2020-11-19 14:34:54 +01:00
Miriam Baglioni
fafb688887
-
2020-11-18 18:56:48 +01:00
Miriam Baglioni
906db690d2
-
2020-11-18 17:43:08 +01:00
Claudio Atzori
ede7fae6c8
Merge pull request 'XML record indexing test' ( #58 ) from provision_indexing into master
2020-11-18 17:04:34 +01:00
Miriam Baglioni
5402062ff5
changed parameter file with the one associated to the job
2020-11-18 16:58:20 +01:00
Miriam Baglioni
a172a37ad1
fixed typo
2020-11-18 16:55:07 +01:00
Miriam Baglioni
46ba3793f6
code, workflow and parameters for the dump of the results associated to funders
2020-11-18 16:47:31 +01:00
Miriam Baglioni
57cac36898
changed the workflow name
2020-11-18 13:38:03 +01:00
Claudio Atzori
8177ce7939
test for XmlIndexingJob based on a local miniSolrCluster
2020-11-18 10:58:05 +01:00
Alessia Bardi
10e673660f
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-18 10:01:23 +01:00
Alessia Bardi
be7b310cef
rel semantics ignore case
2020-11-18 10:01:20 +01:00
Michele Artini
33da2e3d6c
xpaths for dateOfCollection and dateOfTransformation
2020-11-18 09:26:20 +01:00
Alessia Bardi
8f87020a50
#56 : map relevantDates from aggregated ODF records
2020-11-17 18:42:09 +01:00
Alessia Bardi
7e0a76a8ac
test for TextGrid
2020-11-17 18:39:25 +01:00
Claudio Atzori
cfc01f136e
PID filtering based on a blacklist
2020-11-17 12:27:06 +01:00
Claudio Atzori
6ab1ce53c9
fixed condition in result pid cleaning; cleanup
2020-11-16 10:09:17 +01:00
Claudio Atzori
4de8c8b237
fixed workflow variable name
2020-11-16 10:03:11 +01:00
Claudio Atzori
331d621800
added test resource
2020-11-14 12:16:15 +01:00
Claudio Atzori
5d4e34e26a
fixed typo in variable name
2020-11-14 10:32:26 +01:00
Claudio Atzori
768bc5304c
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-13 15:40:34 +01:00
Claudio Atzori
93f7b7974f
Merge pull request 'trust truncated to 3 decimals' ( #24 ) from trunc_trust into master
LGTM
2020-11-13 15:40:02 +01:00
Claudio Atzori
528231a287
grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow
2020-11-13 15:37:48 +01:00
Claudio Atzori
2bed29eb09
WIP: added oozie workflow for grouping graph entities by id
2020-11-13 10:05:12 +01:00
Claudio Atzori
13e36a4da0
WIP: added oozie workflow for grouping graph entities by id
2020-11-13 10:05:02 +01:00
Claudio Atzori
9b0fb9e958
merged from master
2020-11-12 09:27:12 +01:00
Michele Artini
40160d171f
organizations pids
2020-11-09 12:58:36 +01:00
Sandro La Bruzzo
027ef2326c
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-11-06 17:12:42 +01:00
Sandro La Bruzzo
cd27df91a1
fixed bug on missing relation in ANDS
2020-11-06 17:12:31 +01:00
Claudio Atzori
d10447e747
re-packaged graph dump workflow sources
2020-11-05 17:38:18 +01:00
Claudio Atzori
2d76497488
cleanup
2020-11-05 17:10:24 +01:00
Miriam Baglioni
f8e9bda24c
merge branch with master
2020-11-05 16:31:18 +01:00
Miriam Baglioni
be5ed8f554
added check to avoid sending empty metadata.
2020-11-05 16:10:17 +01:00
Claudio Atzori
2148a51fae
minor changes
2020-11-05 11:24:12 +01:00
Claudio Atzori
4625b7486e
code formatting
2020-11-04 18:12:43 +01:00
Miriam Baglioni
e9ac471ae9
removed dependency from classes for the pid graph dump
2020-11-04 18:04:42 +01:00
Miriam Baglioni
b90a945c49
removed property files for pid graph dump
2020-11-04 17:28:33 +01:00
Miriam Baglioni
bac307155a
removed properties specific for pid graph dump
2020-11-04 17:28:04 +01:00
Miriam Baglioni
9c9d50f486
removed code specific for pid graph dump
2020-11-04 17:26:22 +01:00
Miriam Baglioni
5669890934
removed commented lines
2020-11-04 17:15:21 +01:00
Miriam Baglioni
6a89f59be9
removed commented lines
2020-11-04 17:13:59 +01:00
Miriam Baglioni
56150d7e5e
removed all code related to the dump of pids graph
2020-11-04 17:13:12 +01:00
Miriam Baglioni
16c54a96f8
removed pid dump
2020-11-04 17:11:32 +01:00
Miriam Baglioni
0cac5436ff
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
2020-11-04 13:21:11 +01:00
Alessia Bardi
51808b5afd
Updated descriptions
2020-11-04 12:29:48 +01:00
Alessia Bardi
e6becf8659
Updated descriptions
2020-11-04 12:17:57 +01:00
Alessia Bardi
0abe0eee33
Updated descriptions
2020-11-04 12:15:30 +01:00
Alessia Bardi
f6ab238f5d
Updated descriptions
2020-11-04 11:50:47 +01:00
Miriam Baglioni
c010a8442f
fixed issue on test code
2020-11-03 17:26:51 +01:00
Miriam Baglioni
8ec7a61188
merge branch with master
2020-11-03 16:59:08 +01:00
Miriam Baglioni
c209284ca7
new schemas for the entities in the dump with added descriptions
2020-11-03 16:58:08 +01:00
Miriam Baglioni
08806deddf
added the splitSize non mandatory parameter. Default size 10G
2020-11-03 16:57:34 +01:00
Miriam Baglioni
7d2eda43ca
added new non mandatory property publish to determine whether to publish the upload or leave it pending. Default value false
2020-11-03 16:57:01 +01:00
Miriam Baglioni
cbbb1bdc54
moved business logic to new class in common for handling the zip of the archives
2020-11-03 16:55:50 +01:00
Miriam Baglioni
d4382b54df
moved the tar archive with max size to common module
2020-11-03 16:54:50 +01:00
Claudio Atzori
86d6fbe95b
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
2020-11-03 12:19:46 +01:00
Claudio Atzori
8471888ad3
Merge branch 'graph_cleaning' into stable_ids
2020-11-03 11:52:47 +01:00
Claudio Atzori
5310e56dba
remove empty PIDs
2020-11-03 11:52:10 +01:00
Claudio Atzori
3fcd669e99
result merge operation leverages a custom ResultTypeComparator in the aggregator graph construction
2020-11-03 10:53:23 +01:00
Claudio Atzori
09e44dabff
Merge branch 'master' into stable_ids
2020-11-02 12:16:01 +01:00
Sandro La Bruzzo
754c86f33e
fixed test to work on jenkins
2020-11-02 09:35:01 +01:00
Miriam Baglioni
dabb33e018
changed the discriminant by which the file is split
2020-10-30 17:52:22 +01:00
Miriam Baglioni
0fba08eae4
max allowed size per file 10 Gb
2020-10-30 16:05:55 +01:00
Claudio Atzori
4ca75d6951
Merge pull request 'Dedup ID creation policy' ( #48 ) from deduptesting into stable_ids
2020-10-30 15:15:32 +01:00
Miriam Baglioni
b828587252
prevent the code from cycling indefinitely
2020-10-30 15:01:25 +01:00
Miriam Baglioni
f747e303ac
classes for dumping of the graph as ttl file
2020-10-30 14:13:45 +01:00
Miriam Baglioni
16baf5b69e
formatting
2020-10-30 14:13:14 +01:00
Miriam Baglioni
a9eef9c852
added check for possible Optional value in relation dataInfo
2020-10-30 14:12:28 +01:00
Miriam Baglioni
5f4de9a962
formatting
2020-10-30 14:11:40 +01:00
Miriam Baglioni
14bf2e7238
added option to split dumps bigger than 40Gb into different files
2020-10-30 14:09:04 +01:00
Claudio Atzori
58f28296ea
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
2020-10-30 10:56:42 +01:00
Miriam Baglioni
78fdb11c3f
merge branch with master
2020-10-29 12:55:22 +01:00
Sandro La Bruzzo
1d9fdb7367
fixed spark memory issue in SparkSplitOafTODLIEntities
2020-10-28 12:30:32 +01:00
Miriam Baglioni
d2374e3b9e
added code to handle cases where the funding tree does not exist
2020-10-27 16:15:21 +01:00
Miriam Baglioni
5d3012eeb4
changed code to dump only the programme list and not the classification list
2020-10-27 16:14:18 +01:00
Miriam Baglioni
3241ec1777
added connection timeout and socket timeout 600 sec
2020-10-27 16:12:11 +01:00
Alessia Bardi
1425d810a8
testing mapping
2020-10-19 17:46:14 +02:00
Claudio Atzori
266bf1a221
common IdentifierFactory in use on the mapping from the aggregator data; merge the entities sharing the same id; code formatting
2020-10-16 17:02:10 +02:00
Claudio Atzori
34f1d0904b
common IdentifierFactory in use on the mapping from the aggregator data
2020-10-16 16:00:19 +02:00
Sandro La Bruzzo
fed711da80
Merge remote-tracking branch 'origin/master' into merge_record_to_common
2020-10-13 15:32:45 +02:00
Alessia Bardi
8775a64bc1
Merge pull request 'Merging different compatibility levels (pinocchio operator)' ( #47 ) from merge_graph into master
2020-10-09 14:44:52 +02:00
Claudio Atzori
e751c1402f
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-10-09 13:53:21 +02:00
Claudio Atzori
b961dc7d1e
added originalid to the fields in the result graph view
2020-10-09 13:53:15 +02:00
Sandro La Bruzzo
eec418cd26
moved AuthorMerger into dhp-common
2020-10-08 10:33:55 +02:00
Sandro La Bruzzo
fe0a7870e6
Added test to check if merge authors works
2020-10-08 10:33:12 +02:00
Sandro La Bruzzo
cd9c377d18
adapted scholexplorer Dump generation to the new Dataset definition
2020-10-08 10:10:13 +02:00
Claudio Atzori
a3f37a9414
javadoc
2020-10-07 16:44:22 +02:00
Claudio Atzori
8d85a2fced
[BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two
2020-10-07 16:28:52 +02:00
Miriam Baglioni
ae08b3c0dd
merge branch with master
2020-10-05 11:35:55 +02:00
Miriam Baglioni
11b7eaae09
changed the name of the folder where to store the context entity from context to communities_infrastructures
2020-10-05 11:24:54 +02:00
Miriam Baglioni
32bffb0134
changed the name from communities_infrastructures to communities_infrastuctures.json
2020-10-05 11:24:17 +02:00
Miriam Baglioni
25cbcf6114
changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar
2020-10-02 12:17:46 +02:00
Claudio Atzori
49ae3450a9
code formatting
2020-10-02 09:43:24 +02:00
Claudio Atzori
c2a6e2a9bf
fixed mapping for datasource journal info (ISSNs)
2020-10-02 09:37:08 +02:00
Miriam Baglioni
01117a46e1
whole workflow activated
2020-10-01 17:19:21 +02:00
Miriam Baglioni
cfb5766c6b
removed double json.gz from names of files in the tar
2020-10-01 17:18:34 +02:00
Miriam Baglioni
fcaedac980
merge branch with master
2020-10-01 16:46:59 +02:00
Miriam Baglioni
c6e6ed1bd8
merge branch with master
2020-10-01 16:24:41 +02:00
Claudio Atzori
2e9e13444d
author pids made unique by value
2020-10-01 12:50:40 +02:00
Claudio Atzori
e265c3e125
cleaning functions factored out in a dedicated class
2020-10-01 10:50:15 +02:00
Claudio Atzori
4287164aba
include relevantdate field in the result view
2020-10-01 10:28:55 +02:00
Miriam Baglioni
7b6a7333e6
merge branch with master
2020-09-25 16:42:07 +02:00
Miriam Baglioni
983a12ed15
temporary modification to allow the upload of files in the sandbox without the need to recreate the mapping from scratch
2020-09-25 16:41:51 +02:00
Miriam Baglioni
8b36d19182
added property depositionId and changed property newVersion from boolean to string to handle the three possible distinct values
2020-09-25 16:41:15 +02:00
Miriam Baglioni
ed5239f9ec
added new code to handle the new possibility to upload files to an already open deposition
2020-09-25 16:34:32 +02:00
Miriam Baglioni
3a8c524fce
refactor
2020-09-25 16:34:02 +02:00
Miriam Baglioni
54800fb9b0
enabled only the step to upload in zenodo
2020-09-25 14:40:22 +02:00
Miriam Baglioni
de6c4d46d8
fixed conflicts
2020-09-24 15:35:01 +02:00
Claudio Atzori
044d3a0214
fixed query used to load datasources in the Graph
2020-09-24 13:48:58 +02:00
Claudio Atzori
27df1cea6d
code formatting
2020-09-24 12:16:00 +02:00
Claudio Atzori
fb22f4d70b
included values for projects fundedamount and totalcost fields in the mapping tests. Swapped expected and actual values in junit test assertions
2020-09-24 12:10:59 +02:00
Claudio Atzori
42f55395c8
fixed order of the ISSNs returned by the SQL query
2020-09-24 12:09:58 +02:00
Claudio Atzori
9a7e72d528
using concat_ws to join textual columns from PSQL. When using || to perform the concatenation, a Null column makes the result of the operation Null
2020-09-24 10:42:47 +02:00
Claudio Atzori
9e3e93c6b6
setting the correct issn type in the datasource.journal element
2020-09-24 10:39:16 +02:00
Miriam Baglioni
39eb8ab25b
changed the dump to move from h2020programme to h2020classification
2020-09-23 17:33:00 +02:00
Miriam Baglioni
c2b5c780ff
-
2020-09-14 14:34:03 +02:00
Miriam Baglioni
e2ceefe9be
-
2020-09-14 14:33:28 +02:00
Miriam Baglioni
1f893e63dc
-
2020-09-14 14:33:10 +02:00
Claudio Atzori
8a523474b7
code formatting
2020-09-07 11:40:16 +02:00
Miriam Baglioni
b72a7dad46
resource for pid graph dump
2020-08-24 17:09:01 +02:00
Miriam Baglioni
8694bb9b31
refactoring due to compilation
2020-08-24 17:07:34 +02:00
Miriam Baglioni
8a069a4fea
-
2020-08-24 17:01:30 +02:00
Miriam Baglioni
34fa96f3b1
-
2020-08-24 17:00:20 +02:00
Miriam Baglioni
5fb2949cb8
added utils methods
2020-08-24 17:00:09 +02:00
Miriam Baglioni
2a540b6c01
added constants for the pid graph dump
2020-08-24 16:55:35 +02:00
Miriam Baglioni
da103c399a
resources for the pid graph dump test
2020-08-24 16:52:07 +02:00
Miriam Baglioni
630a6a1fe7
first tests for the pid graph dump
2020-08-24 16:51:26 +02:00
Miriam Baglioni
40c8d2de7b
test resources for the dump of the pids graph
2020-08-24 16:50:39 +02:00
Miriam Baglioni
bef79d3bdf
first attempt to the dump of pids graph
2020-08-24 16:49:38 +02:00
Miriam Baglioni
85203c16e3
merge branch with master
2020-08-19 11:49:03 +02:00
Miriam Baglioni
2c783793ba
removed the affiliation from the author to mirror the changes in the model
2020-08-19 11:48:12 +02:00
Miriam Baglioni
f6bf888016
removed affiliation from author to mirror the changes in the model
2020-08-19 11:41:41 +02:00
Miriam Baglioni
66d0e0d3f2
-
2020-08-19 11:31:50 +02:00
Miriam Baglioni
1c593a9cfe
-
2020-08-19 11:29:51 +02:00
Miriam Baglioni
e42b2f5ae2
-
2020-08-19 11:29:09 +02:00
Miriam Baglioni
f81ee22418
changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult)
2020-08-19 11:28:26 +02:00
Miriam Baglioni
387be43fd4
changed to discriminate whether to dump all the result types together or each one in its own archive
2020-08-19 11:25:27 +02:00
Miriam Baglioni
c5858afb88
added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dumped together, false otherwise.
2020-08-19 11:24:14 +02:00
Miriam Baglioni
d407852ac2
changed to reflect the changes in the model
2020-08-19 11:15:05 +02:00
Miriam Baglioni
47c21a8961
refactoring due to compilation
2020-08-19 11:11:57 +02:00
Miriam Baglioni
5570678c65
changed parameter name from hdfsNameNode to nameNode
2020-08-19 10:59:26 +02:00
Miriam Baglioni
dc5096a327
refactoring due to compilation
2020-08-19 10:57:36 +02:00
Miriam Baglioni
96600ed04a
modified test resource for mirroring the deletion of affiliation from author parameters
2020-08-14 20:41:49 +02:00
Miriam Baglioni
09f5b92763
added specific reference to class
2020-08-14 20:00:09 +02:00
Miriam Baglioni
37e7c43652
changed parameter name from hdfsNameNode to nameNode
2020-08-14 18:18:25 +02:00
Miriam Baglioni
d2a8a4961a
refactoring
2020-08-13 18:50:33 +02:00
Miriam Baglioni
a5043de5da
added method to get the mapped instance
2020-08-13 18:45:50 +02:00
Miriam Baglioni
fcd10f452c
changed because of D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:55:32 +02:00
Miriam Baglioni
fd48ae3b85
changed because of D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:19:15 +02:00
Miriam Baglioni
04a3e1ab38
disabled tests
2020-08-13 12:18:13 +02:00
Miriam Baglioni
2ede397933
Apply change because of D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:16:39 +02:00
Miriam Baglioni
bfd1fcde6d
removed not useful method and changed because of D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:14:37 +02:00
Miriam Baglioni
7fd8397123
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:13:15 +02:00
Miriam Baglioni
753d448cc9
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:12:58 +02:00
Miriam Baglioni
c0e071fa26
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:12:40 +02:00
Miriam Baglioni
526db915bc
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:12:16 +02:00
Miriam Baglioni
b0fab0d138
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:11:57 +02:00
Miriam Baglioni
1b6320b251
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:11:41 +02:00
Miriam Baglioni
743d31be22
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:11:22 +02:00
Miriam Baglioni
65b48df652
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:11:06 +02:00
Miriam Baglioni
90b54d3efb
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:08:24 +02:00
Miriam Baglioni
69bbb9592a
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:07:39 +02:00
Miriam Baglioni
945323299a
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:07:24 +02:00
Miriam Baglioni
e04c993247
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:07:07 +02:00
Miriam Baglioni
ed0812d0ce
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:06:49 +02:00
Miriam Baglioni
d55cfe0ea5
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:06:20 +02:00
Miriam Baglioni
80866bec7d
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:06:05 +02:00
Miriam Baglioni
1400978c0a
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:05:44 +02:00
Miriam Baglioni
7b941a2e0a
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:05:17 +02:00
Miriam Baglioni
f7474f50fe
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:04:52 +02:00
Miriam Baglioni
367203f412
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:04:33 +02:00
Miriam Baglioni
3ab4809d31
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:04:10 +02:00
Miriam Baglioni
02a4986e7b
Applying changes from code reviews D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment)
2020-08-13 11:53:01 +02:00
Miriam Baglioni
235d4e4d6e
moved Context as relevant for Communities dump
2020-08-12 18:16:45 +02:00
Miriam Baglioni
adf9f96a67
test for extraction of relation between organizations and context
2020-08-12 10:04:47 +02:00
Miriam Baglioni
7400cd019d
removed not needed variable
2020-08-12 10:03:33 +02:00
Miriam Baglioni
98d28bab5c
fixed missing _ in context nsprefix
2020-08-12 10:00:18 +02:00
Miriam Baglioni
25f4fbceea
draft of test and resources
2020-08-11 17:37:22 +02:00
Miriam Baglioni
30a2b19b65
changed metadata for deposition of covid-19 dump in Zenodo
2020-08-11 17:36:56 +02:00
Miriam Baglioni
49788b532a
changed to mirror changes in the schema
2020-08-11 16:05:03 +02:00
Miriam Baglioni
b08511287b
-
2020-08-11 16:01:36 +02:00
Miriam Baglioni
7e81a17068
changed the XQUERY to mirror the change in the code
2020-08-11 16:00:33 +02:00
Miriam Baglioni
37ad2f28e9
removed added | in prefix for datasource
2020-08-11 15:55:06 +02:00
Miriam Baglioni
f31c2e9461
enabled test
2020-08-11 15:49:25 +02:00
Miriam Baglioni
2d67476417
merge branch with master
2020-08-11 15:46:04 +02:00
Miriam Baglioni
6d3804e24c
-
2020-08-11 15:45:12 +02:00
Miriam Baglioni
0603ec4757
changed test to upload the dump for covid-19 community
2020-08-11 15:43:25 +02:00
Miriam Baglioni
7dfd56df9d
-
2020-08-11 15:42:35 +02:00
Miriam Baglioni
a169d7e7c1
added test file for the MakeTar class
2020-08-11 15:40:41 +02:00
Miriam Baglioni
acb0926b2e
json schemas for the dumped entities and relation
2020-08-11 15:39:48 +02:00
Miriam Baglioni
ff52c51f92
added the communityMapPath parameter and removed the isLookUpUrl parameter
2020-08-11 15:39:22 +02:00
Miriam Baglioni
6f43acda5e
added the maketar and send to zenodo step. Adjusted wf parameters
2020-08-11 15:38:20 +02:00
Miriam Baglioni
ddc19de2e9
removed the isLookUpUrl among the parameters
2020-08-11 15:37:47 +02:00
Miriam Baglioni
592a8ea573
added parameter file for maketar class
2020-08-11 15:37:14 +02:00
Miriam Baglioni
77a0951b32
added the make archive step in the workflow
2020-08-11 15:32:32 +02:00
Miriam Baglioni
cf4d918787
added description, changed parameter name and added method
2020-08-11 15:27:31 +02:00
Miriam Baglioni
dc5fc5366d
Creation of an archive for each related dump part
2020-08-11 15:26:06 +02:00
Miriam Baglioni
0ce49049d6
added description
2020-08-11 15:25:11 +02:00
Miriam Baglioni
9bae991167
added description of the class
2020-08-11 11:20:43 +02:00
Miriam Baglioni
341dc59ead
removed the repartition(1). Added code for the creation of an archive containing all the parts dumped for each community
2020-08-11 11:18:58 +02:00
Miriam Baglioni
1991a49f70
removed reference to isLookUp to get the communityMap
2020-08-10 18:02:56 +02:00
Miriam Baglioni
c378c38546
disabled test. The testing functionalities for the upload to Zenodo are moved to common
2020-08-10 12:41:11 +02:00
Miriam Baglioni
63ad0ed209
changed to use communityMapPath instead of IsLookUp
2020-08-10 12:40:19 +02:00
Miriam Baglioni
cec795f2ea
changed resources to mirror changes in the model
2020-08-10 12:39:35 +02:00
Miriam Baglioni
f50e3e7333
changed the class for which to generate the schema
2020-08-10 12:03:49 +02:00
Miriam Baglioni
b8c26f656c
test using communityMapPath instead of isLookUp
2020-08-10 12:02:55 +02:00
Miriam Baglioni
fe88904df0
changed the wf definition
2020-08-10 12:01:14 +02:00
Miriam Baglioni
87856467e2
removed isLookUpUrl and added code to read the communitymap from HDFS
2020-08-10 11:38:41 +02:00
Miriam Baglioni
1cf7043e26
removed isLookUpUrl from the parameters
2020-08-10 11:38:03 +02:00
Sandro La Bruzzo
0ade33ad15
updated mergeFrom function for DLI Unknown
2020-08-10 10:18:35 +02:00
Miriam Baglioni
46986aae2d
added the new parameter for newdeposion/newversion and concept_record_id
2020-08-07 18:00:06 +02:00
Miriam Baglioni
3aedfdf0d6
added option to do a new deposition or new version of an old deposition
2020-08-07 17:49:14 +02:00
Miriam Baglioni
1b3ad1bce6
filter out authors pid (only orcid). Added check to get unique provenance for context id. filter out countries with code UNKNOWN
2020-08-07 17:48:18 +02:00
Miriam Baglioni
5ceb8c5f0a
moved constants from graph.Constants
2020-08-07 17:46:47 +02:00
Miriam Baglioni
6c65c93c0e
refactoring
2020-08-07 17:45:35 +02:00
Miriam Baglioni
68adf86fe4
refactoring
2020-08-07 17:43:20 +02:00
Miriam Baglioni
26d2ad6ebb
refactoring
2020-08-07 17:41:56 +02:00
Miriam Baglioni
9675af7965
refactoring
2020-08-07 17:41:07 +02:00
Miriam Baglioni
346a91f4d9
Added constants
2020-08-07 17:35:39 +02:00
Miriam Baglioni
d52b0e1797
no use of IsLookUp. The query is done once and its result stored on HDFS. The path to the result is given instead of the isLookUpUrl
2020-08-07 17:34:40 +02:00
Miriam Baglioni
ae1b7fbfdb
changed method signature from set of mapkey entries to String representing path on file system where to find the map
2020-08-07 17:32:27 +02:00
Miriam Baglioni
931fa2ff00
removed dependencies
2020-08-07 16:46:37 +02:00
Miriam Baglioni
545ea9f77e
moved in common. Zenodo response model and APIClient to deposit in Zenodo
2020-08-07 16:44:51 +02:00
Miriam Baglioni
da9b012c15
fixed description
2020-08-06 11:55:44 +02:00
Miriam Baglioni
6dbadcf181
the new schema for the dumped result
2020-08-06 11:05:56 +02:00
Sandro La Bruzzo
4fb1821fab
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-08-06 10:28:31 +02:00
Sandro La Bruzzo
9d9e9edbd2
improved extractEntity Relation workflows using dataset
2020-08-06 10:28:24 +02:00
Miriam Baglioni
adf0ca5aa7
test to send is from hdfs
2020-08-05 14:24:43 +02:00
Miriam Baglioni
14eda4f46e
added method to try to put inputstream to zenodo
2020-08-05 14:18:25 +02:00
Miriam Baglioni
e737a47270
added classes to try to send input stream to zenodo for the upload
2020-08-05 14:17:40 +02:00
Miriam Baglioni
873e9cd50c
changed hadoop setting to connect to s3
2020-08-04 15:37:25 +02:00
Alessia Bardi
a29565ff57
code formatting
2020-08-04 12:55:27 +02:00
Alessia Bardi
01db29e208
fixes redmine issue #5846 : datacite and its different namespace declarations
2020-08-04 12:53:48 +02:00
Alessia Bardi
b4e4e5f858
do not duplicate result PIDs
2020-08-04 12:52:14 +02:00
Alessia Bardi
09a323d18d
testing a dataset from Nakala
2020-08-04 12:50:52 +02:00
Alessia Bardi
c35bf486cc
added handle among the possible PIDs
2020-08-04 12:50:12 +02:00
Miriam Baglioni
5b651abf82
merge branch with master
2020-08-04 10:14:07 +02:00
Miriam Baglioni
901ae37f7b
added step to workflow
2020-08-03 18:12:54 +02:00
Miriam Baglioni
fa38cdb10b
added resource
2020-08-03 18:11:12 +02:00
Miriam Baglioni
e9fcc0b2f1
commented out unit test - changes to mirror the updated logic still to be decided
2020-08-03 18:10:53 +02:00
Miriam Baglioni
e43aeb139a
added new property file and changed some parameter to old files
2020-08-03 18:07:28 +02:00
Miriam Baglioni
aa9f3d9698
changed logic to save directly to s3
2020-08-03 18:06:18 +02:00
Miriam Baglioni
d465f0eec9
added fulltext to result
2020-08-03 18:03:27 +02:00
Miriam Baglioni
ec4b392d12
added new dependencies for writing on s3
2020-08-03 17:57:04 +02:00
Miriam Baglioni
c892c7dfa7
changed to query for community map just once and save the result for remaining executions
2020-08-03 17:56:31 +02:00
Alessia Bardi
8cc067fe76
specific test for claims
2020-08-03 11:17:50 +02:00
Michele Artini
652b13abb6
Merge branch 'master' into nsprefix_blacklist
2020-07-31 07:58:37 +02:00
Claudio Atzori
cd631bb5bc
defaults fixed in the cleaning workflow forces result.publisher to NULL when result.publisher.value is empty
2020-07-30 17:03:53 +02:00
Miriam Baglioni
872d7783fc
-
2020-07-30 16:50:36 +02:00
Miriam Baglioni
57c87b7653
re-implemented to fix issue on not serializable Set<String> variable
2020-07-30 16:43:43 +02:00
Miriam Baglioni
ef8e5957b5
added specific directory where to save results
2020-07-30 16:42:46 +02:00
Miriam Baglioni
75f3361c85
-
2020-07-30 16:41:31 +02:00
Miriam Baglioni
3f695b25fa
refactoring
2020-07-30 16:40:15 +02:00
Miriam Baglioni
e623f12bef
refactoring
2020-07-30 16:32:59 +02:00
Miriam Baglioni
ff7d05abb4
added support class to store the couple organizationId representativeId got from sql query on hive
2020-07-30 16:32:04 +02:00
Miriam Baglioni
cf6d80b2ab
added command to close the writer
2020-07-30 16:31:22 +02:00
Miriam Baglioni
f985bca37b
added USER_CLAIM constant value
2020-07-30 16:25:26 +02:00
Claudio Atzori
4bbfcf1ac6
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-30 16:25:06 +02:00
Claudio Atzori
4ff8007518
added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step
2020-07-30 16:24:39 +02:00
Miriam Baglioni
6f1c40a933
-
2020-07-30 16:24:28 +02:00
Miriam Baglioni
2b66a93f9e
added property file that was missing
2020-07-30 16:24:17 +02:00
Michele Artini
bdece15ca0
blacklist of nsprefix
2020-07-30 16:13:38 +02:00
Sandro La Bruzzo
c97c8f0c44
implemented new oozie job to extract entities in a separate dataset
2020-07-30 12:13:58 +02:00
Sandro La Bruzzo
3010a362bc
updated the provision workflow in the aggregation phase. Removed serialization in JSON RDD and used Spark Dataset
2020-07-30 09:25:56 +02:00
Sandro La Bruzzo
487226f669
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-30 09:25:39 +02:00
Sandro La Bruzzo
16ae3c9ccf
updated the provision workflow in the aggregation phase. Removed serialization in JSON RDD and used Spark Dataset
2020-07-30 09:25:32 +02:00
Miriam Baglioni
ee8420c6b3
added resource for datasource test
2020-07-29 18:28:43 +02:00
Miriam Baglioni
76bcab98ce
added code to filter out null originalId from the dump
2020-07-29 18:28:21 +02:00
Miriam Baglioni
ef1d8aef17
added one test to verify the dump for the datasources
2020-07-29 18:27:46 +02:00
Miriam Baglioni
86bab79512
-
2020-07-29 18:20:22 +02:00
Miriam Baglioni
31791dcf3d
fixed wrong property file path name
2020-07-29 18:20:08 +02:00
Miriam Baglioni
9e722aa1ef
-
2020-07-29 18:00:08 +02:00
Miriam Baglioni
d22f106f27
added constant to identify datasource associated to funders
2020-07-29 17:56:55 +02:00
Miriam Baglioni
40e194fe2f
added check to not dump datasources related to funders
2020-07-29 17:56:18 +02:00
Miriam Baglioni
b48934f6df
changed the workflow name
2020-07-29 17:43:43 +02:00
Miriam Baglioni
1433db825d
refactoring
2020-07-29 17:43:24 +02:00
Miriam Baglioni
074e9ab75e
refactoring
2020-07-29 17:42:50 +02:00
Miriam Baglioni
8ad8dac7d4
merge branch with fork master
2020-07-29 17:38:28 +02:00
Miriam Baglioni
9fa82dc93b
fixed issue
2020-07-29 17:36:16 +02:00
Miriam Baglioni
8907648d6a
-
2020-07-29 17:35:47 +02:00
Miriam Baglioni
536e7f6352
added and changed resources for testing of the whole graph dump and of community related products dumps
2020-07-29 17:33:34 +02:00
Miriam Baglioni
4d7f590493
testings for the whole graph dump
2020-07-29 17:32:37 +02:00
Miriam Baglioni
a2f73ec2c7
changed due to changes in the model
2020-07-29 17:32:02 +02:00
Miriam Baglioni
481585e9d3
-
2020-07-29 17:31:41 +02:00
Miriam Baglioni
40a8dafbdc
-
2020-07-29 17:30:44 +02:00
Miriam Baglioni
de2ebb467e
changed due to changes in the model
2020-07-29 17:08:02 +02:00
Miriam Baglioni
d0ff2a56fb
-
2020-07-29 17:06:53 +02:00
Miriam Baglioni
b96dedb56b
changed due to changes in the model
2020-07-29 17:05:31 +02:00
Miriam Baglioni
6d0f08277b
classes to implement the dump of the whole graph.
2020-07-29 17:03:19 +02:00
Miriam Baglioni
8d4327b292
input parameters and workflow definition for the dump of the whole graph
2020-07-29 17:00:34 +02:00
Miriam Baglioni
b5f995ab12
refactoring
2020-07-29 16:59:48 +02:00
Miriam Baglioni
f7a87cc447
added new constants value
2020-07-29 16:58:40 +02:00
Miriam Baglioni
b71d12cf26
refactoring
2020-07-29 16:52:44 +02:00
Miriam Baglioni
a8d65b68cb
changed to delete the part to check if it was a test or a real execution
2020-07-29 16:47:57 +02:00
Miriam Baglioni
3ec2392904
Added new class to move the place the split is effectively run
2020-07-29 16:46:50 +02:00
Miriam Baglioni
178c2729a7
changed the path to reach the java class to be executed
2020-07-29 12:29:51 +02:00
Miriam Baglioni
437ac12139
removed unused parameter
2020-07-29 12:28:16 +02:00
Michele Artini
35e6e9c064
tests
2020-07-28 12:02:15 +02:00
Miriam Baglioni
6c2223d1fc
added code to get the openaire id for contexts
2020-07-24 17:30:15 +02:00
Miriam Baglioni
afd54c1684
removed not needed upload and refactoring
2020-07-24 17:28:56 +02:00
Miriam Baglioni
7b0569d989
changed to map also the result associated to the whole graph
2020-07-24 17:28:11 +02:00
Miriam Baglioni
082225ad61
-
2020-07-24 17:27:26 +02:00
Miriam Baglioni
968c59d97a
added the logic to dump also the products for the whole graph. They will miss collected from and context information that will be materialized as new relations
2020-07-24 17:25:19 +02:00
Miriam Baglioni
332258d199
split the classes related to the communities dump and to the whole graph dump
2020-07-24 17:21:48 +02:00
Claudio Atzori
56bbfdc65d
introduced parameter 'numParitions', driving the hive DB table data partitioning. Currently specified only for table 'project'
2020-07-23 08:54:10 +02:00
Sandro La Bruzzo
9ab594ccf6
fixed test
2020-07-21 10:36:21 +02:00
Claudio Atzori
ebf60020ac
map results as OPRs in case of missing //CobjCategory/@type and the vocabulary dnet:result_typologies doesn't resolve the super type
2020-07-20 19:01:10 +02:00
Miriam Baglioni
355d7e426e
added dump for project - not finished
2020-07-20 18:54:43 +02:00
Miriam Baglioni
a2f01e5259
added getter and setter
2020-07-20 18:54:17 +02:00
Miriam Baglioni
40bbe94f7c
merge with master fork
2020-07-20 18:10:03 +02:00
Miriam Baglioni
23160b4d29
realignment of the workflow classes with the changes in the structure of the module
2020-07-20 18:04:30 +02:00
Miriam Baglioni
3aab7680f6
changed the test results
2020-07-20 18:00:43 +02:00
Miriam Baglioni
5076e4f320
changed test to comply with the modifications
2020-07-20 17:55:18 +02:00
Miriam Baglioni
08dbd99455
changed to dump the whole results graph by using classes already implemented for communities. Added class to dump also organization
2020-07-20 17:54:28 +02:00
Miriam Baglioni
e47ea9349c
extended some types by adding provenance as the couple (provenance, trust) and moved some classes to be used by the complete graph dump also
2020-07-20 17:46:27 +02:00
Claudio Atzori
32f5e466e3
imports cleanup
2020-07-20 17:42:58 +02:00
Claudio Atzori
54ac583923
code formatting
2020-07-20 17:37:08 +02:00
Claudio Atzori
124e7ce19c
in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available
2020-07-20 17:33:37 +02:00
Claudio Atzori
050dda223d
Merge pull request 'removed duplicated fields' ( #25 ) from unique_field_in_lists into master
...
Looks good as a temporary workaround. I agree the model could seamlessly make the distinct operation by using HashSets instead of Linked (or Array) Lists.
The task to update the model in such a way is added on #9#issuecomment-1583
Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori
e0c4cf6f7b
added parameter to drive the graph merge strategy: priority (BETA|PROD)
2020-07-20 10:48:01 +02:00
Claudio Atzori
94ccdb4852
Merge branch 'master' into merge_graph
2020-07-20 10:14:55 +02:00
Michele Artini
331a3cbdd0
fixed originalId
2020-07-20 09:50:29 +02:00
Sandro La Bruzzo
9116d75b3e
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-17 18:01:30 +02:00
Miriam Baglioni
47c7122773
changed priority from beta to production
2020-07-17 12:56:35 +02:00
Michele Artini
442f30930c
removed duplicated fields
2020-07-17 12:25:36 +02:00
Michele Artini
3adedd0a68
trust truncated to 3 decimals
2020-07-17 11:58:11 +02:00
Claudio Atzori
1781609508
code formatting
2020-07-16 19:06:56 +02:00
Claudio Atzori
878f2b931c
Merge branch 'master' into merge_graph
2020-07-16 16:34:24 +02:00
Miriam Baglioni
f9ad6f3255
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
2020-07-10 19:42:53 +02:00
Miriam Baglioni
c27f12d6e8
avoid considering the _SUCCESS file
2020-07-10 19:42:23 +02:00
Claudio Atzori
31071e363f
Merge branch 'provision_indexing'
2020-07-10 19:03:57 +02:00
Claudio Atzori
cc77446dc4
added dbSchema parameter to the raw_db workflow
2020-07-10 19:01:50 +02:00
Michele Artini
e1ae964bc4
stats
2020-07-10 16:12:08 +02:00
Sandro La Bruzzo
c01efed79b
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-10 14:44:57 +02:00
Sandro La Bruzzo
a7d3977481
added generation of EBI Dataset
2020-07-10 14:44:50 +02:00
Claudio Atzori
67e1d222b6
bulk cleaning: when bestaccessrights is null or empty, sets it by evaluating the result instances
2020-07-08 17:53:35 +02:00
Claudio Atzori
610d377d57
first implementation of the BETA & PROD graphs merge procedure
2020-07-08 16:54:26 +02:00
Alessia Bardi
9a898c0e4c
Json schema generator
2020-07-08 12:52:00 +02:00
Alessia Bardi
636f9ce7d6
json schema generator lib
2020-07-08 12:50:57 +02:00