Claudio Atzori
|
29c6f7e255
|
classes related to the collection workflow moved into common package; implemented MongoDB collection plugins
|
2021-02-12 12:31:02 +01:00 |
Claudio Atzori
|
72c57b28fa
|
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
|
2021-02-04 14:08:18 +01:00 |
Claudio Atzori
|
b6f08ce226
|
re-adding the old junit:junit dep as solr-test-framework needs it
|
2020-12-14 15:07:31 +01:00 |
Claudio Atzori
|
1506f49052
|
Xml record serialization for author PIDs: 1) only one value per PID type is allowed; 2) orcid prevails over orcid_pending
|
2020-12-14 11:14:03 +01:00 |
Claudio Atzori
|
61cd129ded
|
XML serialisation test
|
2020-12-11 12:44:53 +01:00 |
Claudio Atzori
|
ce7a319e01
|
using the correct assertion import
|
2020-12-11 12:44:17 +01:00 |
Claudio Atzori
|
7fe2433137
|
excluded transitive older junit dependencies, they can compromise the unit test executions
|
2020-12-11 12:42:55 +01:00 |
Claudio Atzori
|
026ad40633
|
disabled test
|
2020-12-07 13:50:01 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Alessia Bardi
|
2d15667b4a
|
testing XML generation from json object (case AMS ACTA)
|
2020-12-02 10:16:26 +01:00 |
Claudio Atzori
|
d48f388fb2
|
Merge branch 'provision_indexing'
|
2020-11-19 15:59:55 +01:00 |
Claudio Atzori
|
7c9feaf9e7
|
project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail
|
2020-11-19 15:26:20 +01:00 |
Claudio Atzori
|
0374d34c3e
|
introduced configuration param outputFormat: HDFS | SOLR
|
2020-11-19 10:34:28 +01:00 |
Claudio Atzori
|
5218718e8b
|
updated set of fields from the MDFormatDSResourceType on PROD
|
2020-11-18 15:00:41 +01:00 |
Claudio Atzori
|
d9e07a242b
|
extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location
|
2020-11-18 14:34:55 +01:00 |
Claudio Atzori
|
29dcff0f34
|
spark complains about missing classes, so here they are again
|
2020-11-18 14:32:32 +01:00 |
Claudio Atzori
|
8177ce7939
|
test for XmlIndexingJob based on a local miniSolrCluster
|
2020-11-18 10:58:05 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Claudio Atzori
|
822971f54f
|
no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup;
|
2020-11-12 09:22:59 +01:00 |
Claudio Atzori
|
18d9aad70c
|
improved documentation in dhp-graph-provision
|
2020-11-10 11:48:55 +01:00 |
Claudio Atzori
|
1871d1c6f6
|
solve error java.lang.NoSuchFieldError: INSTANCE when instantiating Solr client
|
2020-08-14 11:18:30 +02:00 |
Claudio Atzori
|
3a11a387a9
|
data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed
|
2020-08-03 14:28:08 +02:00 |
Claudio Atzori
|
cc5d13da85
|
introduced parameter shouldIndex (true|false)
|
2020-07-16 13:46:39 +02:00 |
Claudio Atzori
|
b098cc3cbe
|
avoid repeating identical values for fields: source, description
|
2020-07-16 13:45:53 +02:00 |
Claudio Atzori
|
7d6e269b40
|
reverted CreateRelatedEntitiesJob_phase1 to its previous state
|
2020-07-13 22:54:04 +02:00 |
Claudio Atzori
|
8e97598eb4
|
avoid to NPE in case of null instances
|
2020-07-13 20:46:14 +02:00 |
Claudio Atzori
|
06c1913062
|
added different limits for grouping by source and by target, incremented spark.sql.shuffle.partitions for the join operations
|
2020-07-10 19:03:33 +02:00 |
Claudio Atzori
|
4c3836f62e
|
materialize the related entities before joining them
|
2020-07-10 19:00:44 +02:00 |
Claudio Atzori
|
b21866a2da
|
allow to set different to relations cut points by source and by target; adjusted weight assigned to relationship types
|
2020-07-10 13:59:48 +02:00 |
Claudio Atzori
|
ff4d6214f1
|
experimenting with pruning of relations
|
2020-07-10 10:06:41 +02:00 |
Claudio Atzori
|
b383ed42fa
|
pass optional parameter relationFilter to the PrepareRelationJob implementation
|
2020-07-07 14:21:28 +02:00 |
Claudio Atzori
|
d380b85246
|
unit test for the preparation of the relations
|
2020-07-02 12:42:13 +02:00 |
Claudio Atzori
|
7817338e05
|
added test to verify the relation pre-processing
|
2020-06-26 17:58:33 +02:00 |
Claudio Atzori
|
8d59fdf34e
|
WIP: dataset based PrepareRelationsJob
|
2020-06-26 14:32:58 +02:00 |
Claudio Atzori
|
216975c4ec
|
restored complete provision workflow
|
2020-06-25 12:55:52 +02:00 |
Claudio Atzori
|
93f627ea51
|
code formatting
|
2020-06-25 12:54:21 +02:00 |
Claudio Atzori
|
e62333192c
|
WIP: prepare relation job
|
2020-06-25 12:22:18 +02:00 |
Claudio Atzori
|
6933ec11fb
|
WIP: prepare relation job
|
2020-06-25 11:04:12 +02:00 |
Sandro La Bruzzo
|
a6c0faac70
|
added test to verify secondary sorting
|
2020-06-25 10:48:15 +02:00 |
Claudio Atzori
|
69b0391708
|
WIP: prepare relation job
|
2020-06-25 10:19:56 +02:00 |
Claudio Atzori
|
46e76affeb
|
WIP: prepare relation job
|
2020-06-24 19:01:15 +02:00 |
Claudio Atzori
|
0e723d378b
|
added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job
|
2020-06-24 18:34:42 +02:00 |
Claudio Atzori
|
9cd27183b6
|
[maven-release-plugin] prepare for next development iteration
|
2020-06-22 11:27:44 +02:00 |
Claudio Atzori
|
1e3dab0631
|
[maven-release-plugin] prepare release dhp-1.2.3
|
2020-06-22 11:27:39 +02:00 |
Claudio Atzori
|
c4d9f1837f
|
[maven-release-plugin] prepare for next development iteration
|
2020-06-12 12:21:08 +02:00 |
Claudio Atzori
|
f0746a7605
|
[maven-release-plugin] prepare release dhp-1.2.2
|
2020-06-12 12:21:03 +02:00 |
Claudio Atzori
|
463489f59f
|
code formatting
|
2020-06-12 12:03:25 +02:00 |
Claudio Atzori
|
4bcad1c9c3
|
Merge branch 'graph_cleaning'
|
2020-06-12 11:40:25 +02:00 |
Alessia Bardi
|
e79943965b
|
Fixes #5604: field oamandatepublications in XML
|
2020-06-11 12:49:31 +02:00 |
Claudio Atzori
|
67c7b31ba6
|
Merge branch 'master' into graph_cleaning
|
2020-06-10 15:00:35 +02:00 |