Claudio Atzori
|
431cbe9955
|
handle missing instance.pid during bulk cleaning
|
2021-03-23 09:28:58 +01:00 |
Sandro La Bruzzo
|
c73072079d
|
fix conflicts
|
2021-03-22 16:36:31 +01:00 |
Claudio Atzori
|
5a043e95ea
|
code formatting
|
2021-03-19 11:37:27 +01:00 |
Claudio Atzori
|
a4e82a65aa
|
integrated filter applied when merging BETA & PROD graphs to rule out records from Datacite
|
2021-03-19 11:34:44 +01:00 |
Claudio Atzori
|
8257f9a2bc
|
result.pid: adjusted the mapping applied to the contents from the aggregator
|
2021-03-17 12:45:38 +01:00 |
Claudio Atzori
|
640b885706
|
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
|
2021-03-16 14:19:32 +01:00 |
Claudio Atzori
|
01630f638d
|
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
|
2021-03-09 17:11:50 +01:00 |
Claudio Atzori
|
59532b0919
|
[#6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
|
2021-03-09 11:14:45 +01:00 |
Claudio Atzori
|
d525785497
|
[#6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
|
2021-03-09 11:12:55 +01:00 |
Claudio Atzori
|
f468c7f0d7
|
merged from master
|
2021-03-09 09:12:41 +01:00 |
Claudio Atzori
|
8d2bb24512
|
merged from master
|
2021-03-08 15:44:34 +01:00 |
Claudio Atzori
|
fa7930d2e2
|
merging contributions from PR#97
|
2021-03-05 15:45:28 +01:00 |
miconis
|
1a85020572
|
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
|
2021-02-26 10:19:28 +01:00 |
Claudio Atzori
|
b830e33392
|
mdstore collector plugin
|
2021-02-25 12:30:30 +01:00 |
Claudio Atzori
|
fc3fa5e343
|
implemented mdstore collector plugin
|
2021-02-24 15:07:24 +01:00 |
miconis
|
4b2124a18e
|
implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities
|
2021-02-10 11:51:50 +01:00 |
Alessia Bardi
|
c4d1feca74
|
mapper test with validated link to project
|
2021-02-10 11:22:54 +01:00 |
Claudio Atzori
|
72c57b28fa
|
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
|
2021-02-04 14:08:18 +01:00 |
Alessia Bardi
|
c67329d3ad
|
updated test for EU Open Data portal datasets
|
2021-02-03 17:06:48 +01:00 |
Alessia Bardi
|
fd705404a1
|
tests for EU Open Data portal dataset mapping
|
2021-02-03 10:28:17 +01:00 |
Sandro La Bruzzo
|
686e7b507c
|
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
|
2021-01-28 10:02:13 +01:00 |
Sandro La Bruzzo
|
98b9498b57
|
Removed old messaging system not quite used from collection and Transformation workflow
code refactor
|
2021-01-28 09:51:17 +01:00 |
Sandro La Bruzzo
|
150a617bd1
|
Merge pull request 'aggregation_on_hadoop' (#90) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
Wonderful code... You're the Best Sandro
|
2021-01-26 16:00:47 +01:00 |
Claudio Atzori
|
885e0dd926
|
[Cleaning] filter authors not providing word characters in the fullname
|
2021-01-26 09:48:53 +01:00 |
Claudio Atzori
|
2890511613
|
[Cleaning] normalise missing Result.country
|
2021-01-26 09:41:44 +01:00 |
Claudio Atzori
|
4eb9ed35b1
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2021-01-25 18:12:24 +01:00 |
Claudio Atzori
|
cd379eb5e3
|
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
|
2021-01-25 18:11:49 +01:00 |
Alessia Bardi
|
505477f36f
|
format code
|
2021-01-25 18:02:49 +01:00 |
Alessia Bardi
|
ded6ed8d7d
|
no ',' author if there are no authors in ODF records
|
2021-01-25 17:57:51 +01:00 |
Claudio Atzori
|
3465c8ccee
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 16:54:53 +01:00 |
Sandro La Bruzzo
|
a54848a59c
|
Moved Vocabulary stuff to common module
|
2021-01-25 15:43:04 +01:00 |
Claudio Atzori
|
07a0ccfc96
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 13:36:01 +01:00 |
Claudio Atzori
|
34d653de41
|
[Cleaning] updated cleaning rule for DOIs
|
2021-01-22 14:16:33 +01:00 |
Claudio Atzori
|
26e9d55c13
|
code formatting
|
2021-01-05 09:59:26 +01:00 |
Claudio Atzori
|
7185158942
|
ignore missing properties
|
2020-12-29 11:06:28 +01:00 |
Claudio Atzori
|
28460c2cd1
|
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
|
2020-12-23 16:59:52 +01:00 |
Claudio Atzori
|
723b01f9e9
|
trivial: the less magic numbers and values around, the better
|
2020-12-23 12:22:48 +01:00 |
Claudio Atzori
|
6cb0dc3f43
|
extended ORCID cleaning procedure
|
2020-12-21 11:40:17 +01:00 |
Claudio Atzori
|
47270d9af5
|
lenient mock can be lenient
|
2020-12-18 15:38:59 +01:00 |
Alessia Bardi
|
f9a8fd8bbd
|
updated test record for textgrid
|
2020-12-17 11:59:45 +01:00 |
Michele Artini
|
991e675dc6
|
validation in claim rels
|
2020-12-14 15:41:25 +01:00 |
Claudio Atzori
|
12e2f930c8
|
resolved conflicts
|
2020-12-10 10:57:39 +01:00 |
Alessia Bardi
|
112da6d76a
|
in theory, just auto-formatting after mvn compile
|
2020-12-09 20:00:27 +01:00 |
Alessia Bardi
|
bece04b330
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-12-09 19:54:43 +01:00 |
Alessia Bardi
|
426b76ee8e
|
more asserts for TextGrid record
|
2020-12-09 19:46:11 +01:00 |
Claudio Atzori
|
4705144918
|
Merge pull request 'rel_project_validation' (#69) from rel_project_validation into master
LGTM
|
2020-12-09 19:01:20 +01:00 |
Claudio Atzori
|
ada21ad920
|
Merge pull request 'dump of the results related to at least one project' (#61) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
|
2020-12-09 17:22:56 +01:00 |
Michele Artini
|
1bc9adc10d
|
default trust for validated rels
|
2020-12-09 16:18:37 +01:00 |
Michele Artini
|
5f21a356fd
|
reindent
|
2020-12-09 11:24:30 +01:00 |
Michele Artini
|
370a5e650b
|
validation attributes in resultProject relations
|
2020-12-09 11:18:26 +01:00 |
Claudio Atzori
|
a104a632df
|
cleanup
|
2020-12-04 16:32:47 +01:00 |
Miriam Baglioni
|
5fb65ffc4a
|
merge branch with master
|
2020-12-03 11:24:35 +01:00 |
Miriam Baglioni
|
ea88dc3401
|
fixed issue in property name
|
2020-12-03 11:24:23 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Claudio Atzori
|
57f448b7a4
|
graph cleaning workflow separates orcid_pending from orcid, depending on the author pid provenance
|
2020-12-02 10:44:05 +01:00 |
Alessia Bardi
|
a417624670
|
tests for raw graph mapping
|
2020-12-02 10:15:26 +01:00 |
Claudio Atzori
|
893ac4a77b
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-12-02 09:30:06 +01:00 |
Claudio Atzori
|
2c407e775e
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-11-30 12:00:38 +01:00 |
Claudio Atzori
|
e731a7658d
|
cleaning texts to remove tab characters too
|
2020-11-27 09:00:04 +01:00 |
Claudio Atzori
|
c1b9a4045a
|
grouping of records will be performed by the dedup workflow
|
2020-11-26 10:59:10 +01:00 |
Miriam Baglioni
|
124591a7f3
|
refactoring
|
2020-11-25 18:23:28 +01:00 |
Miriam Baglioni
|
1a89f8211c
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:12:40 +01:00 |
Miriam Baglioni
|
5fbe54ef54
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:10:28 +01:00 |
Miriam Baglioni
|
ed01e5a5e1
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:09:34 +01:00 |
Miriam Baglioni
|
d4ddde2ef2
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:01:01 +01:00 |
Miriam Baglioni
|
f5e5e92a10
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 17:58:53 +01:00 |
Miriam Baglioni
|
1df94b85b4
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 17:57:43 +01:00 |
Claudio Atzori
|
dfd6205b95
|
Consistency graph workflow merges all the entities by ID
|
2020-11-25 14:55:32 +01:00 |
Miriam Baglioni
|
90d4369fd2
|
added test to verify the compression in writing community info on hdfs
|
2020-11-25 14:34:58 +01:00 |
Miriam Baglioni
|
6750e33d69
|
merge branch with master
|
2020-11-25 14:09:01 +01:00 |
Miriam Baglioni
|
b2c455f883
|
added java doc
|
2020-11-25 14:08:09 +01:00 |
Miriam Baglioni
|
1f130cdf92
|
changed the relation (produces -> isProducedBy) due to the change in the code
|
2020-11-25 14:04:26 +01:00 |
Miriam Baglioni
|
e758d5d9b4
|
refactoring
|
2020-11-25 13:46:39 +01:00 |
Miriam Baglioni
|
87a9f616ae
|
refactoring and addition of the funder nsp first part as name for the dump instead of the whole nsp
|
2020-11-25 13:45:41 +01:00 |
Miriam Baglioni
|
e7e418e444
|
added decision node to verify whether to upload to Zenodo
|
2020-11-25 13:44:10 +01:00 |
Miriam Baglioni
|
305e3d0c9c
|
added resource file for relation with relClass = isProducedBy
|
2020-11-25 13:43:41 +01:00 |
Miriam Baglioni
|
21ce175d17
|
added FilterFunction specification in filter operation
|
2020-11-25 13:42:31 +01:00 |
Miriam Baglioni
|
bde6d337dd
|
test classes for dump of results related to funders
|
2020-11-25 13:42:01 +01:00 |
Miriam Baglioni
|
b37b9352d7
|
added constant value for semantic relationship between projects and results
|
2020-11-25 13:41:08 +01:00 |
Claudio Atzori
|
36173c13a5
|
reverted filters in the cleaning process
|
2020-11-25 10:24:42 +01:00 |
Claudio Atzori
|
eeebd5a920
|
Cleaning workflow: remove newlines from titles, descriptions, subjects
|
2020-11-24 18:40:25 +01:00 |
Claudio Atzori
|
e1a1bb3ee4
|
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
|
2020-11-24 18:34:03 +01:00 |
Miriam Baglioni
|
72bb0fe360
|
changed directory name
|
2020-11-24 16:47:07 +01:00 |
Miriam Baglioni
|
39f4a20873
|
changed the path and the name for saving the communities_infrastructures dump file
|
2020-11-24 14:47:32 +01:00 |
Miriam Baglioni
|
7e14452a87
|
final version of the wf to get the dump of results associated to at least one funder, per funder
|
2020-11-24 14:46:34 +01:00 |
Miriam Baglioni
|
c167a18057
|
added new parameter for the dumpType
|
2020-11-24 14:45:50 +01:00 |
Miriam Baglioni
|
54a309bb6b
|
refactoring
|
2020-11-24 14:45:30 +01:00 |
Miriam Baglioni
|
35ecea8842
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:45:15 +01:00 |
Miriam Baglioni
|
b9b6bdb2e6
|
fixing issue on previous implementation
|
2020-11-24 14:44:53 +01:00 |
Miriam Baglioni
|
7e940f1991
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:43:34 +01:00 |
Miriam Baglioni
|
62928ef7a5
|
changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file
|
2020-11-24 14:42:41 +01:00 |
Claudio Atzori
|
33bae02451
|
reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently
|
2020-11-24 14:42:33 +01:00 |
Miriam Baglioni
|
3319440c53
|
changed the direction of the relation between projects and result considered to select the results linked to projects
|
2020-11-24 14:41:09 +01:00 |
Miriam Baglioni
|
00c377dac2
|
added specification of MapFunction types in map
|
2020-11-24 14:40:22 +01:00 |
Miriam Baglioni
|
44db258dc4
|
added enumerated for the dump type
|
2020-11-24 14:38:06 +01:00 |
Miriam Baglioni
|
1832708c42
|
replaced boolean variable with a string one which specifies the type of dump we are performing: complete, community or funder
|
2020-11-24 14:37:36 +01:00 |
Miriam Baglioni
|
259c67ce36
|
fixed issue in path name
|
2020-11-20 12:32:23 +01:00 |
Miriam Baglioni
|
0a9db67eec
|
-
|
2020-11-20 12:21:33 +01:00 |
Miriam Baglioni
|
d362f2637d
|
merge branch with master
|
2020-11-19 19:17:20 +01:00 |
Miriam Baglioni
|
cf3f47563f
|
new parameter files
|
2020-11-19 19:16:05 +01:00 |
Miriam Baglioni
|
24c56fa7a3
|
new logic and workflow for dump of results with link to projects. In this implementation the result matches the model of the communityresult.
|
2020-11-19 19:15:39 +01:00 |
Claudio Atzori
|
fcbb05eb21
|
cleanup
|
2020-11-19 15:14:33 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Miriam Baglioni
|
fafb688887
|
-
|
2020-11-18 18:56:48 +01:00 |
Miriam Baglioni
|
906db690d2
|
-
|
2020-11-18 17:43:08 +01:00 |
Claudio Atzori
|
ede7fae6c8
|
Merge pull request 'XML record indexing test' (#58) from provision_indexing into master
|
2020-11-18 17:04:34 +01:00 |
Miriam Baglioni
|
5402062ff5
|
changed parameter file with the one associated to the job
|
2020-11-18 16:58:20 +01:00 |
Miriam Baglioni
|
a172a37ad1
|
fixed typo
|
2020-11-18 16:55:07 +01:00 |
Miriam Baglioni
|
46ba3793f6
|
code, workflow and parameters for the dump of the results associated to funders
|
2020-11-18 16:47:31 +01:00 |
Miriam Baglioni
|
57cac36898
|
changed the workflow name
|
2020-11-18 13:38:03 +01:00 |
Claudio Atzori
|
8177ce7939
|
test for XmlIndexingJob based on a local miniSolrCluster
|
2020-11-18 10:58:05 +01:00 |
Alessia Bardi
|
10e673660f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-18 10:01:23 +01:00 |
Alessia Bardi
|
be7b310cef
|
rel semantics ignore case
|
2020-11-18 10:01:20 +01:00 |
Michele Artini
|
33da2e3d6c
|
xpaths for dateOfCollection and dateOfTransformation
|
2020-11-18 09:26:20 +01:00 |
Alessia Bardi
|
8f87020a50
|
#56: map relevantDates from aggregated ODF records
|
2020-11-17 18:42:09 +01:00 |
Alessia Bardi
|
7e0a76a8ac
|
test for TextGrid
|
2020-11-17 18:39:25 +01:00 |
Claudio Atzori
|
cfc01f136e
|
PID filtering based on a blacklist
|
2020-11-17 12:27:06 +01:00 |
Claudio Atzori
|
6ab1ce53c9
|
fixed condition in result pid cleaning; cleanup
|
2020-11-16 10:09:17 +01:00 |
Claudio Atzori
|
4de8c8b237
|
fixed workflow variable name
|
2020-11-16 10:03:11 +01:00 |
Claudio Atzori
|
331d621800
|
added test resource
|
2020-11-14 12:16:15 +01:00 |
Claudio Atzori
|
5d4e34e26a
|
fixed typo in variable name
|
2020-11-14 10:32:26 +01:00 |
Claudio Atzori
|
768bc5304c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-13 15:40:34 +01:00 |
Claudio Atzori
|
93f7b7974f
|
Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master
LGTM
|
2020-11-13 15:40:02 +01:00 |
Claudio Atzori
|
528231a287
|
grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow
|
2020-11-13 15:37:48 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Claudio Atzori
|
13e36a4da0
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:02 +01:00 |
Claudio Atzori
|
9b0fb9e958
|
merged from master
|
2020-11-12 09:27:12 +01:00 |
Michele Artini
|
40160d171f
|
organizations pids
|
2020-11-09 12:58:36 +01:00 |
Sandro La Bruzzo
|
027ef2326c
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-06 17:12:42 +01:00 |
Sandro La Bruzzo
|
cd27df91a1
|
fixed bug on missing relation in ANDS
|
2020-11-06 17:12:31 +01:00 |
Claudio Atzori
|
d10447e747
|
re-packaged graph dump workflow sources
|
2020-11-05 17:38:18 +01:00 |
Claudio Atzori
|
2d76497488
|
cleanup
|
2020-11-05 17:10:24 +01:00 |
Miriam Baglioni
|
f8e9bda24c
|
merge branch with master
|
2020-11-05 16:31:18 +01:00 |
Miriam Baglioni
|
be5ed8f554
|
added check to avoid sending empty metadata.
|
2020-11-05 16:10:17 +01:00 |
Claudio Atzori
|
2148a51fae
|
minor changes
|
2020-11-05 11:24:12 +01:00 |
Claudio Atzori
|
4625b7486e
|
code formatting
|
2020-11-04 18:12:43 +01:00 |
Miriam Baglioni
|
e9ac471ae9
|
removed dependency from classes for the pid graph dump
|
2020-11-04 18:04:42 +01:00 |
Miriam Baglioni
|
b90a945c49
|
removed property files for pid graph dump
|
2020-11-04 17:28:33 +01:00 |
Miriam Baglioni
|
bac307155a
|
removed properties specific for pid graph dump
|
2020-11-04 17:28:04 +01:00 |
Miriam Baglioni
|
9c9d50f486
|
removed code specific for pid graph dump
|
2020-11-04 17:26:22 +01:00 |
Miriam Baglioni
|
5669890934
|
removed commented lines
|
2020-11-04 17:15:21 +01:00 |
Miriam Baglioni
|
6a89f59be9
|
removed commented lines
|
2020-11-04 17:13:59 +01:00 |
Miriam Baglioni
|
56150d7e5e
|
removed all code related to the dump of pids graph
|
2020-11-04 17:13:12 +01:00 |
Miriam Baglioni
|
16c54a96f8
|
removed pid dump
|
2020-11-04 17:11:32 +01:00 |
Miriam Baglioni
|
0cac5436ff
|
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
|
2020-11-04 13:21:11 +01:00 |
Alessia Bardi
|
51808b5afd
|
Updated descriptions
|
2020-11-04 12:29:48 +01:00 |
Alessia Bardi
|
e6becf8659
|
Updated descriptions
|
2020-11-04 12:17:57 +01:00 |
Alessia Bardi
|
0abe0eee33
|
Updated descriptions
|
2020-11-04 12:15:30 +01:00 |
Alessia Bardi
|
f6ab238f5d
|
Updated descriptions
|
2020-11-04 11:50:47 +01:00 |
Miriam Baglioni
|
c010a8442f
|
fixed issue on test code
|
2020-11-03 17:26:51 +01:00 |