Claudio Atzori
|
a900bfb874
|
delegating the date parsing to https://github.com/sisyphsu/dateparser
|
2021-06-11 16:53:01 +02:00 |
Sandro La Bruzzo
|
e57294ac99
|
implemented changes on PUBMed dataflow
|
2021-06-03 10:52:09 +02:00 |
Michele Artini
|
f0fbfdcfae
|
Merge branch 'stable_ids' into import_new_mdstores
|
2021-06-01 12:03:00 +02:00 |
Michele Artini
|
03a510859a
|
removed coalesce(1)
|
2021-05-31 14:10:51 +02:00 |
Michele Artini
|
e9f2b6037c
|
patch of mdstore records
|
2021-05-31 11:36:26 +02:00 |
Claudio Atzori
|
6e3a4e9237
|
updated test expectations
|
2021-05-28 09:37:50 +02:00 |
Claudio Atzori
|
9d725efdc1
|
reverted implementation of the mdstore client
|
2021-05-20 18:26:09 +02:00 |
Claudio Atzori
|
ae5c28e54f
|
code formatting
|
2021-05-20 16:13:06 +02:00 |
Claudio Atzori
|
232dce83db
|
fixes #6701: xpath for titles to support both datacite and Guidelines v4 mapping
|
2021-05-20 14:41:15 +02:00 |
Claudio Atzori
|
23b8883ab1
|
applied intellij code cleanup
|
2021-05-14 10:58:12 +02:00 |
Claudio Atzori
|
d1cbee8413
|
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
|
2021-05-10 16:43:39 +02:00 |
Claudio Atzori
|
d4a30fabe3
|
clean up tests
|
2021-05-05 17:28:15 +02:00 |
Claudio Atzori
|
dccaf173cf
|
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
|
2021-05-05 16:36:15 +02:00 |
Claudio Atzori
|
2e1eb96f9a
|
code formatting
|
2021-05-05 11:23:57 +02:00 |
Claudio Atzori
|
fb930b84d3
|
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
|
2021-05-04 18:06:30 +02:00 |
Claudio Atzori
|
923d19ea8e
|
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
|
2021-05-04 18:06:21 +02:00 |
Sandro La Bruzzo
|
714b71bd21
|
updated pubmed
|
2021-05-04 14:54:12 +02:00 |
Sandro La Bruzzo
|
2129e9caa7
|
updated pangaea transformation to parse directly the xml
|
2021-04-28 10:21:03 +02:00 |
Claudio Atzori
|
5afa7d3e0c
|
core utilities in dhp-common moved in external module dhp-schemas
|
2021-04-27 15:44:01 +02:00 |
Sandro La Bruzzo
|
7f8848ecdd
|
added first implementation of Pangaea Mapping
|
2021-04-27 11:30:37 +02:00 |
Claudio Atzori
|
d0d477cca3
|
code formatting
|
2021-04-20 12:50:34 +02:00 |
miconis
|
0393cdce42
|
addition of alternative names in export queries
|
2021-04-20 12:45:21 +02:00 |
Claudio Atzori
|
d1ca025b0b
|
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
|
2021-04-13 14:32:41 +02:00 |
Claudio Atzori
|
827e7e37db
|
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
|
2021-03-25 11:07:59 +01:00 |
Claudio Atzori
|
751125fdf9
|
[Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target
|
2021-03-23 17:34:32 +01:00 |
Claudio Atzori
|
b4febed138
|
updated mapping tests as consequence of the special treatment reserved to Handle PIDs
|
2021-03-23 09:37:48 +01:00 |
Claudio Atzori
|
431cbe9955
|
handle missing instance.pid during bulk cleaning
|
2021-03-23 09:28:58 +01:00 |
Sandro La Bruzzo
|
c73072079d
|
fix conflicts
|
2021-03-22 16:36:31 +01:00 |
Claudio Atzori
|
8257f9a2bc
|
result.pid: adjusted the mapping applied to the contents from the aggregator
|
2021-03-17 12:45:38 +01:00 |
Claudio Atzori
|
640b885706
|
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
|
2021-03-16 14:19:32 +01:00 |
Claudio Atzori
|
01630f638d
|
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
|
2021-03-09 17:11:50 +01:00 |
Claudio Atzori
|
59532b0919
|
[#6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
|
2021-03-09 11:14:45 +01:00 |
Claudio Atzori
|
f468c7f0d7
|
merged from master
|
2021-03-09 09:12:41 +01:00 |
Claudio Atzori
|
8d2bb24512
|
merged from master
|
2021-03-08 15:44:34 +01:00 |
Alessia Bardi
|
c4d1feca74
|
mapper test with validated link to project
|
2021-02-10 11:22:54 +01:00 |
Alessia Bardi
|
c67329d3ad
|
updated test for EU Open Data portal datasets
|
2021-02-03 17:06:48 +01:00 |
Alessia Bardi
|
fd705404a1
|
tests for EU Open Data portal dataset mapping
|
2021-02-03 10:28:17 +01:00 |
Sandro La Bruzzo
|
686e7b507c
|
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
|
2021-01-28 10:02:13 +01:00 |
Sandro La Bruzzo
|
98b9498b57
|
Removed old messaging system not quite used from collection and Transformation workflow
code refactor
|
2021-01-28 09:51:17 +01:00 |
Sandro La Bruzzo
|
150a617bd1
|
Merge pull request 'aggregation_on_hadoop' (#90) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
Wonderful code... You're the Best Sandro
|
2021-01-26 16:00:47 +01:00 |
Alessia Bardi
|
505477f36f
|
format code
|
2021-01-25 18:02:49 +01:00 |
Alessia Bardi
|
ded6ed8d7d
|
no ',' author, if there are no author in ODF records
|
2021-01-25 17:57:51 +01:00 |
Sandro La Bruzzo
|
a54848a59c
|
Moved Vocabulary stuff to common module
|
2021-01-25 15:43:04 +01:00 |
Claudio Atzori
|
47270d9af5
|
lenient mock can be lenient
|
2020-12-18 15:38:59 +01:00 |
Alessia Bardi
|
f9a8fd8bbd
|
updated test record for textgrid
|
2020-12-17 11:59:45 +01:00 |
Claudio Atzori
|
12e2f930c8
|
resolved conflicts
|
2020-12-10 10:57:39 +01:00 |
Alessia Bardi
|
112da6d76a
|
in theory, just auto-formatting after mvn compile
|
2020-12-09 20:00:27 +01:00 |
Alessia Bardi
|
bece04b330
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-12-09 19:54:43 +01:00 |
Alessia Bardi
|
426b76ee8e
|
more asserts for TextGrid record
|
2020-12-09 19:46:11 +01:00 |
Claudio Atzori
|
4705144918
|
Merge pull request 'rel_project_validation' (#69) from rel_project_validation into master
LGTM
|
2020-12-09 19:01:20 +01:00 |
Michele Artini
|
370a5e650b
|
validation attributes in resultProject relations
|
2020-12-09 11:18:26 +01:00 |
Miriam Baglioni
|
5fb65ffc4a
|
merge branch with master
|
2020-12-03 11:24:35 +01:00 |
Claudio Atzori
|
57f448b7a4
|
graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance
|
2020-12-02 10:44:05 +01:00 |
Alessia Bardi
|
a417624670
|
tests for raw graph mapping
|
2020-12-02 10:15:26 +01:00 |
Claudio Atzori
|
2c407e775e
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-11-30 12:00:38 +01:00 |
Miriam Baglioni
|
124591a7f3
|
refactoring
|
2020-11-25 18:23:28 +01:00 |
Miriam Baglioni
|
1a89f8211c
|
#61 (comment)
|
2020-11-25 18:12:40 +01:00 |
Miriam Baglioni
|
d4ddde2ef2
|
changed because of #61 (comment)
|
2020-11-25 18:01:01 +01:00 |
Miriam Baglioni
|
90d4369fd2
|
added test to verify the compression in writing community info on hdfs
|
2020-11-25 14:34:58 +01:00 |
Miriam Baglioni
|
1f130cdf92
|
changed the relation (produces -> isProducedBy) due to the change in the code
|
2020-11-25 14:04:26 +01:00 |
Miriam Baglioni
|
305e3d0c9c
|
added resource file for relation with relClass = isProducedBy
|
2020-11-25 13:43:41 +01:00 |
Miriam Baglioni
|
bde6d337dd
|
test classes for dump of results related to funders
|
2020-11-25 13:42:01 +01:00 |
Miriam Baglioni
|
b37b9352d7
|
added constant value for semantic relationship between projects and results
|
2020-11-25 13:41:08 +01:00 |
Claudio Atzori
|
e1a1bb3ee4
|
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
|
2020-11-24 18:34:03 +01:00 |
Miriam Baglioni
|
54a309bb6b
|
refactoring
|
2020-11-24 14:45:30 +01:00 |
Miriam Baglioni
|
35ecea8842
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:45:15 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Claudio Atzori
|
ede7fae6c8
|
Merge pull request 'XML record indexing test' (#58) from provision_indexing into master
|
2020-11-18 17:04:34 +01:00 |
Claudio Atzori
|
8177ce7939
|
test for XmlIndexingJob based on a local miniSolrCluster
|
2020-11-18 10:58:05 +01:00 |
Michele Artini
|
33da2e3d6c
|
xpaths for dateOfCollection and dateOfTransformation
|
2020-11-18 09:26:20 +01:00 |
Alessia Bardi
|
7e0a76a8ac
|
test for TextGrid
|
2020-11-17 18:39:25 +01:00 |
Claudio Atzori
|
331d621800
|
added test resource
|
2020-11-14 12:16:15 +01:00 |
Claudio Atzori
|
768bc5304c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-13 15:40:34 +01:00 |
Claudio Atzori
|
93f7b7974f
|
Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master
LGTM
|
2020-11-13 15:40:02 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Claudio Atzori
|
13e36a4da0
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:02 +01:00 |
Claudio Atzori
|
9b0fb9e958
|
merged from master
|
2020-11-12 09:27:12 +01:00 |
Sandro La Bruzzo
|
027ef2326c
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-06 17:12:42 +01:00 |
Sandro La Bruzzo
|
cd27df91a1
|
fixed bug on missing relation in ANDS
|
2020-11-06 17:12:31 +01:00 |
Claudio Atzori
|
d10447e747
|
re-packaged graph dump workflow sources
|
2020-11-05 17:38:18 +01:00 |
Miriam Baglioni
|
56150d7e5e
|
removed all code related to the dump of pids graph
|
2020-11-04 17:13:12 +01:00 |
Miriam Baglioni
|
c010a8442f
|
fixed issue on test code
|
2020-11-03 17:26:51 +01:00 |
Miriam Baglioni
|
8ec7a61188
|
merge branch with master
|
2020-11-03 16:59:08 +01:00 |
Claudio Atzori
|
86d6fbe95b
|
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
|
2020-11-03 12:19:46 +01:00 |
Claudio Atzori
|
3fcd669e99
|
result merge operation leverage on custom ResultTypeComparator in the aggregator graph construction
|
2020-11-03 10:53:23 +01:00 |
Claudio Atzori
|
09e44dabff
|
Merge branch 'master' into stable_ids
|
2020-11-02 12:16:01 +01:00 |
Sandro La Bruzzo
|
754c86f33e
|
fixed test to work on jenkins
|
2020-11-02 09:35:01 +01:00 |
Claudio Atzori
|
4ca75d6951
|
Merge pull request 'Dedup ID creation policy' (#48) from deduptesting into stable_ids
|
2020-10-30 15:15:32 +01:00 |
Miriam Baglioni
|
3241ec1777
|
added connection timeout and socket timeout 600 sec
|
2020-10-27 16:12:11 +01:00 |
Alessia Bardi
|
1425d810a8
|
testing mapping
|
2020-10-19 17:46:14 +02:00 |
Claudio Atzori
|
266bf1a221
|
common IdentifierFactory in use on the mapping from the aggregator data; merge the entities sharing the same id; code formatting
|
2020-10-16 17:02:10 +02:00 |
Claudio Atzori
|
34f1d0904b
|
common IdentifierFactory in use on the mapping from the aggregator data
|
2020-10-16 16:00:19 +02:00 |
Sandro La Bruzzo
|
fed711da80
|
Merge remote-tracking branch 'origin/master' into merge_record_to_common
|
2020-10-13 15:32:45 +02:00 |
Alessia Bardi
|
8775a64bc1
|
Merge pull request 'Merging different compatibility levels (pinocchio operator)' (#47) from merge_graph into master
|
2020-10-09 14:44:52 +02:00 |
Sandro La Bruzzo
|
fe0a7870e6
|
Added test to check if merge authors works
|
2020-10-08 10:33:12 +02:00 |
Sandro La Bruzzo
|
cd9c377d18
|
adapted scholexplorer Dump generation to the new Dataset definition
|
2020-10-08 10:10:13 +02:00 |
Claudio Atzori
|
8d85a2fced
|
[BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two
|
2020-10-07 16:28:52 +02:00 |
Claudio Atzori
|
c2a6e2a9bf
|
fixed mapping for datasource journal info (ISSNs)
|
2020-10-02 09:37:08 +02:00 |
Miriam Baglioni
|
fcaedac980
|
merge branch with master
|
2020-10-01 16:46:59 +02:00 |
Claudio Atzori
|
2e9e13444d
|
author pids made unique by value
|
2020-10-01 12:50:40 +02:00 |
Claudio Atzori
|
e265c3e125
|
cleaning functions factored out in a dedicated class
|
2020-10-01 10:50:15 +02:00 |
Miriam Baglioni
|
de6c4d46d8
|
fixed conflicts
|
2020-09-24 15:35:01 +02:00 |
Claudio Atzori
|
27df1cea6d
|
code formatting
|
2020-09-24 12:16:00 +02:00 |
Claudio Atzori
|
fb22f4d70b
|
included values for projects fundedamount and totalcost fields in the mapping tests. Swapped expected and actual values in junit test assertions
|
2020-09-24 12:10:59 +02:00 |
Claudio Atzori
|
9e3e93c6b6
|
setting the correct issn type in the datasource.journal element
|
2020-09-24 10:39:16 +02:00 |
Miriam Baglioni
|
c2b5c780ff
|
-
|
2020-09-14 14:34:03 +02:00 |
Miriam Baglioni
|
1f893e63dc
|
-
|
2020-09-14 14:33:10 +02:00 |
Claudio Atzori
|
8a523474b7
|
code formatting
|
2020-09-07 11:40:16 +02:00 |
Miriam Baglioni
|
b72a7dad46
|
resource for pid graph dump
|
2020-08-24 17:09:01 +02:00 |
Miriam Baglioni
|
da103c399a
|
resources for the pid graph dump test
|
2020-08-24 16:52:07 +02:00 |
Miriam Baglioni
|
630a6a1fe7
|
first tests for the pid graph dump
|
2020-08-24 16:51:26 +02:00 |
Miriam Baglioni
|
2c783793ba
|
removed the affiliation from the author to mirror the changes in the model
|
2020-08-19 11:48:12 +02:00 |
Miriam Baglioni
|
f6bf888016
|
removed affiliation from author to mirror the changes in the model
|
2020-08-19 11:41:41 +02:00 |
Miriam Baglioni
|
66d0e0d3f2
|
-
|
2020-08-19 11:31:50 +02:00 |
Miriam Baglioni
|
d407852ac2
|
changed to reflect the changes in the model
|
2020-08-19 11:15:05 +02:00 |
Miriam Baglioni
|
47c21a8961
|
refactoring due to compilation
|
2020-08-19 11:11:57 +02:00 |
Miriam Baglioni
|
96600ed04a
|
modified test resource for mirroring the deletion of affiliation from author parameters
|
2020-08-14 20:41:49 +02:00 |
Miriam Baglioni
|
d2a8a4961a
|
refactoring
|
2020-08-13 18:50:33 +02:00 |
Miriam Baglioni
|
fd48ae3b85
|
changed because of #40 (comment)
|
2020-08-13 12:19:15 +02:00 |
Miriam Baglioni
|
04a3e1ab38
|
disabled tests
|
2020-08-13 12:18:13 +02:00 |
Miriam Baglioni
|
2ede397933
|
Apply change because of #40 (comment)
|
2020-08-13 12:16:39 +02:00 |
Miriam Baglioni
|
adf9f96a67
|
test for extraction of relation between organizations and context
|
2020-08-12 10:04:47 +02:00 |
Miriam Baglioni
|
25f4fbceea
|
draft of test and resources
|
2020-08-11 17:37:22 +02:00 |
Miriam Baglioni
|
30a2b19b65
|
changed metadata for deposition of covid-19 dump in Zenodo
|
2020-08-11 17:36:56 +02:00 |
Miriam Baglioni
|
49788b532a
|
changed to mirror changes in the schema
|
2020-08-11 16:05:03 +02:00 |
Miriam Baglioni
|
b08511287b
|
-
|
2020-08-11 16:01:36 +02:00 |
Miriam Baglioni
|
7e81a17068
|
changed the XQUERY to mirror the change in the code
|
2020-08-11 16:00:33 +02:00 |
Miriam Baglioni
|
37ad2f28e9
|
removed added | in prefix for datasource
|
2020-08-11 15:55:06 +02:00 |
Miriam Baglioni
|
f31c2e9461
|
enabled test
|
2020-08-11 15:49:25 +02:00 |
Miriam Baglioni
|
2d67476417
|
merge branch with master
|
2020-08-11 15:46:04 +02:00 |
Miriam Baglioni
|
6d3804e24c
|
-
|
2020-08-11 15:45:12 +02:00 |
Miriam Baglioni
|
0603ec4757
|
changed test to upload the dump for covid-19 community
|
2020-08-11 15:43:25 +02:00 |
Miriam Baglioni
|
7dfd56df9d
|
-
|
2020-08-11 15:42:35 +02:00 |
Miriam Baglioni
|
a169d7e7c1
|
added test file for the MakeTar class
|
2020-08-11 15:40:41 +02:00 |
Miriam Baglioni
|
c378c38546
|
disabled test. The testing functionalities for the upload in Zenodo are moved to common
|
2020-08-10 12:41:11 +02:00 |
Miriam Baglioni
|
63ad0ed209
|
changed to use communityMapPath instead of IsLookUp
|
2020-08-10 12:40:19 +02:00 |
Miriam Baglioni
|
cec795f2ea
|
changed resources to mirror changes in the model
|
2020-08-10 12:39:35 +02:00 |
Miriam Baglioni
|
f50e3e7333
|
changed the class for which to generate the schema
|
2020-08-10 12:03:49 +02:00 |
Miriam Baglioni
|
b8c26f656c
|
test using communityMapPath instead of isLookUp
|
2020-08-10 12:02:55 +02:00 |
Sandro La Bruzzo
|
0ade33ad15
|
updated mergeFrom function for DLI Unknown
|
2020-08-10 10:18:35 +02:00 |
Miriam Baglioni
|
545ea9f77e
|
moved in common. Zenodo response model and APIClient to deposit in Zenodo
|
2020-08-07 16:44:51 +02:00 |
Miriam Baglioni
|
adf0ca5aa7
|
test to send is from hdfs
|
2020-08-05 14:24:43 +02:00 |
Alessia Bardi
|
a29565ff57
|
code formatting
|
2020-08-04 12:55:27 +02:00 |
Alessia Bardi
|
09a323d18d
|
testing a dataset from Nakala
|
2020-08-04 12:50:52 +02:00 |
Alessia Bardi
|
c35bf486cc
|
added handle among the possible PIDs
|
2020-08-04 12:50:12 +02:00 |
Miriam Baglioni
|
5b651abf82
|
merge branch with master
|
2020-08-04 10:14:07 +02:00 |
Miriam Baglioni
|
fa38cdb10b
|
added resource
|
2020-08-03 18:11:12 +02:00 |
Miriam Baglioni
|
e9fcc0b2f1
|
commented test unit - to decide change for mirroring the changed logics
|
2020-08-03 18:10:53 +02:00 |
Miriam Baglioni
|
c892c7dfa7
|
changed to query for community map just once and save the result for remaining executions
|
2020-08-03 17:56:31 +02:00 |
Alessia Bardi
|
8cc067fe76
|
specific test for claims
|
2020-08-03 11:17:50 +02:00 |