1
0
Fork 0
Commit Graph

147 Commits

Author SHA1 Message Date
Claudio Atzori 96238152cb added serialization for alternateIdentifiers and pids within each record instance 2021-05-28 16:57:30 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Claudio Atzori 609eb711b3 IndexRecordTransformerTest for producing a record that can be manually submitted to solr 2021-05-13 16:13:28 +02:00
Claudio Atzori 1517bf7c92 IndexRecordTransformerTest for producing a record that can be manually submitted to solr 2021-05-13 16:11:22 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Claudio Atzori 1e7e5180fa [Graph model] updated definition of ExternalReference: added alternateLabel, removed description (#6503) 2021-04-02 12:32:12 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Claudio Atzori 72ce741ea6 WIP: using common definitions from ModelConstants 2021-03-31 17:07:13 +02:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Claudio Atzori 8d2bb24512 merged from master 2021-03-08 15:44:34 +01:00
Alessia Bardi 32e81c2d89 non validated rel has null value in validated field 2021-02-16 11:01:42 +01:00
Claudio Atzori 29c6f7e255 classes related to the collection workflow moved into common package; implemented MongoDB collection plugins 2021-02-12 12:31:02 +01:00
Claudio Atzori b34b5a39ca index field authoridtypevalue mixes up different author id-type value pairs, dropped in favour of orcidtypevalue 2021-02-11 09:36:04 +01:00
Alessia Bardi 986dd969d3 use the proper import for Lists 2021-02-10 12:03:54 +01:00
Alessia Bardi 09fc7e2f78 serialization of validated flag on relationships 2021-02-10 11:22:09 +01:00
Claudio Atzori 82e6c50f3f updated solr fields (authoridtypevalue, resultsubject, resultresourcetypename) 2021-02-09 16:27:04 +01:00
Claudio Atzori 62bd3c53ee Merge branch 'master' into provision_indexing 2021-02-09 15:46:26 +01:00
Claudio Atzori 1506f49052 Xml record serialization for author PIDs: 1) only one value per PID type is allowed; 2) orcid prevails over orcid_pending 2020-12-14 11:14:03 +01:00
Claudio Atzori 61cd129ded XML serialisation test 2020-12-11 12:44:53 +01:00
Claudio Atzori ce7a319e01 using the correct assertion import 2020-12-11 12:44:17 +01:00
Claudio Atzori d9532446eb imported more diffs from master branch; code formatting 2020-12-10 16:14:16 +01:00
Claudio Atzori 12e2f930c8 resolved conflicts 2020-12-10 10:57:39 +01:00
Claudio Atzori ff72fcd91a allow orcid_pending to be percolate to the XML graph serialization 2020-12-09 19:04:50 +01:00
Claudio Atzori 211aa04726 allow orcid_pending to be percolate to the XML graph serialization 2020-12-09 18:08:51 +01:00
Claudio Atzori 026ad40633 disabled test 2020-12-07 13:50:01 +01:00
Claudio Atzori cfb55effd9 code formatting 2020-12-02 11:23:49 +01:00
Alessia Bardi 2d15667b4a testing XML generation from json object (case AMS ACTA) 2020-12-02 10:16:26 +01:00
Claudio Atzori d48f388fb2 Merge branch 'provision_indexing' 2020-11-19 15:59:55 +01:00
Claudio Atzori 7c9feaf9e7 project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail 2020-11-19 15:26:20 +01:00
Claudio Atzori 3f34757c63 merged from master 2020-11-19 14:34:54 +01:00
Claudio Atzori 0374d34c3e introduced configuration param outputFormat: HDFS | SOLR 2020-11-19 10:34:28 +01:00
Claudio Atzori 5218718e8b updated set of fields from the MDFormatDSResourceType on PROD 2020-11-18 15:00:41 +01:00
Claudio Atzori d9e07a242b extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location 2020-11-18 14:34:55 +01:00
Claudio Atzori 29dcff0f34 spark complains about missing classes, so here they are again 2020-11-18 14:32:32 +01:00
Claudio Atzori 8177ce7939 test for XmlIndexingJob based on a local miniSolrCluster 2020-11-18 10:58:05 +01:00
Claudio Atzori 2bed29eb09 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:12 +01:00
Claudio Atzori 9b0fb9e958 merged from master 2020-11-12 09:27:12 +01:00
Claudio Atzori 822971f54f no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup; 2020-11-12 09:22:59 +01:00
Claudio Atzori 18d9aad70c improved documentation in dhp-graph-provision 2020-11-10 11:48:55 +01:00
Claudio Atzori 58f28296ea ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values 2020-10-30 10:56:42 +01:00
Claudio Atzori 1871d1c6f6 solve error java.lang.NoSuchFieldError: INSTANCE when instantiating Solr client 2020-08-14 11:18:30 +02:00
Claudio Atzori 3a11a387a9 data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed 2020-08-03 14:28:08 +02:00
Claudio Atzori cc5d13da85 introduced parameter shouldIndex (true|false) 2020-07-16 13:46:39 +02:00
Claudio Atzori b098cc3cbe avoid repeating identical values for fields: source, description 2020-07-16 13:45:53 +02:00
Claudio Atzori 7d6e269b40 reverted CreateRelatedEntitiesJob_phase1 to its previous state 2020-07-13 22:54:04 +02:00
Claudio Atzori 8e97598eb4 avoid to NPE in case of null instances 2020-07-13 20:46:14 +02:00
Claudio Atzori 06c1913062 added different limits for grouping by source and by target, incremented spark.sql.shuffle.partitions for the join operations 2020-07-10 19:03:33 +02:00
Claudio Atzori 4c3836f62e materialize the related entities before joining them 2020-07-10 19:00:44 +02:00
Claudio Atzori b21866a2da allow to set different to relations cut points by source and by target; adjusted weight assigned to relationship types 2020-07-10 13:59:48 +02:00
Claudio Atzori ff4d6214f1 experimenting with pruning of relations 2020-07-10 10:06:41 +02:00