Commit Graph

262 Commits

Author SHA1 Message Date
Miriam Baglioni 2b643059fa [Country Propagation] changed the logic to get the collectedfrom at the result level, fixing the issue that occurs when no instance is created for a result that should have the country associated. Changed the code to use Spark instead of Hive to prepare the data needed for the propagation step. Added new tests for the intermediate steps and new verification for the propagation itself 2022-03-11 13:56:48 +01:00
Claudio Atzori a87c070447 conflicts resolved, merged from beta 2022-02-24 12:51:31 +01:00
Claudio Atzori 86cdb7a38f [provision] serialize measures defined on the result level 2022-02-23 15:54:18 +01:00
Alessia Bardi 9d6203f79b test mapping datasource 2022-02-23 15:00:53 +01:00
Alessia Bardi 600ede1798 serialisation of APCs in the XML records 2022-02-11 11:00:20 +01:00
Claudio Atzori cccb16900c https://support.openaire.eu/issues/7330 normalising DOI urls 2021-12-23 12:33:53 +01:00
Claudio Atzori 98eb292c59 avoid NPEs merging XMLInstance(s) 2021-12-13 13:27:20 +01:00
Claudio Atzori 5e17247bb6 avoid NPEs merging XMLInstance(s) 2021-12-13 11:48:40 +01:00
Claudio Atzori b70ecccea0 avoid NPEs merging XMLInstance(s) 2021-12-12 12:37:38 +01:00
Alessia Bardi e53228401b style 2021-12-09 15:46:22 +01:00
Alessia Bardi 6b5d7688a4 #7275 serialize license information in XML records 2021-12-09 13:46:48 +01:00
Claudio Atzori 9cac283bec implemented Instance serialization features requested in https://support.openaire.eu/issues/7156 2021-12-02 17:20:33 +01:00
Claudio Atzori 1de881b796 resolved conflicts for #165 2021-11-26 16:15:11 +01:00
Sandro La Bruzzo c9870c5122 code formatted 2021-10-19 15:24:59 +02:00
Claudio Atzori e471f12d5e hotfix: recovered implementation removing the hardcoded working_dirs 2021-10-19 12:35:38 +02:00
Claudio Atzori 14fbf92ad6 Merge branch 'beta' into beta_solr_config 2021-10-14 11:08:44 +02:00
Sandro La Bruzzo 5606014b17 code refactor see ticket #7065 2021-10-12 08:11:53 +02:00
Claudio Atzori 2f61054cd1 code formatting 2021-10-11 18:29:42 +02:00
Serafeim Chatzopoulos 201ce71cc1 Add resultsubject, relprojectname and resultacceptanceyear to __all field 2021-10-11 13:16:39 +03:00
Serafeim Chatzopoulos e468a7b96b Add tests to query Solr with different configurations 2021-10-08 16:58:51 +03:00
Serafeim Chatzopoulos de81007302 Add exploreTestConfig, a new Solr configuration folder 2021-10-08 16:54:56 +03:00
Alessia Bardi 8d3b60f446 test for patching records for EOSC Future 2021-10-07 17:30:45 +02:00
Alessia Bardi b924276e18 tests to generate records for the EOSC-Future demo with the EOSC Jupyter Notebook subject 2021-09-24 17:11:56 +02:00
Sandro La Bruzzo d4dadf6d77 reduced max number of PID in Relatedentity 2021-09-02 14:21:24 +02:00
Sandro La Bruzzo 9f8a80deb7 fixed wrong import of unresolved relation in openaire 2021-09-01 14:16:27 +02:00
Alessia Bardi 3762b17f7b added VERSION and PART relationship and re-ordered according to my personal and obviously possibly biased ordering 2021-08-31 20:20:05 +02:00
Alessia Bardi 931f430129 Merge branch 'beta' into datasource_model_eosc_beta 2021-08-23 11:57:21 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Sandro La Bruzzo 6358f92c3a added sleep to solve problem of lost request of creating index 2021-07-30 08:54:37 +02:00
Claudio Atzori c53d106e80 [provision] lowercase relation filter 2021-07-29 13:57:00 +02:00
Sandro La Bruzzo 3721df7aa6 refactoring create actionset of scholexplorer, moved on package dhp-aggregation 2021-07-29 10:45:35 +02:00
Sandro La Bruzzo 3d8f0f629b implemented workflow of creation action set for scholexplorer 2021-07-28 16:15:34 +02:00
Alessia Bardi df8715a1ec format code after mvn compile 2021-07-28 11:58:26 +02:00
Michele Artini 3e2a2d6e71 added new fields in xml 2021-07-28 11:56:55 +02:00
Alessia Bardi c806387d4b tests for enermaps 2021-07-28 11:54:36 +02:00
Claudio Atzori 2fff24df55 code formatting 2021-07-28 11:34:19 +02:00
Sandro La Bruzzo 16c91203bd implemented workflow of creation action set for scholexplorer 2021-07-28 10:30:49 +02:00
Michele Artini 52e2315ba2 removed trick for datasourcetypeui 2021-07-28 10:23:00 +02:00
Claudio Atzori 10d7b4f0b4 filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure 2021-07-20 11:52:05 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Sandro La Bruzzo 57c74c73c6 fixed mistakes in oozie workflow 2021-07-09 12:28:09 +02:00
Sandro La Bruzzo 61ccb54fde removed wrong loop on oozie wf 2021-07-09 12:17:57 +02:00
Sandro La Bruzzo 9f5a0f3ab6 moved wf indexing of Scholexplorer in dhp-graph-provision 2021-07-09 12:06:43 +02:00
Claudio Atzori 96238152cb added serialization for alternateIdentifiers and pids within each record instance 2021-05-28 16:57:30 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Claudio Atzori 609eb711b3 IndexRecordTransformerTest for producing a record that can be manually submitted to solr 2021-05-13 16:13:28 +02:00
Claudio Atzori 1517bf7c92 IndexRecordTransformerTest for producing a record that can be manually submitted to solr 2021-05-13 16:11:22 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Claudio Atzori 27ab8a704d adjusted poms to align with the external dhp-schema module 2021-04-27 10:12:27 +02:00
Claudio Atzori c2bb03c8b5 depending on external dhp-schemas module 2021-04-23 17:57:35 +02:00
Claudio Atzori 1e7e5180fa [Graph model] updated definition of ExternalReference: added alternateLabel, removed description (#6503) 2021-04-02 12:32:12 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Claudio Atzori 72ce741ea6 WIP: using common definitions from ModelConstants 2021-03-31 17:07:13 +02:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Claudio Atzori 8d2bb24512 merged from master 2021-03-08 15:44:34 +01:00
Alessia Bardi 32e81c2d89 non validated rel has null value in validated field 2021-02-16 11:01:42 +01:00
Claudio Atzori 29c6f7e255 classes related to the collection workflow moved into common package; implemented MongoDB collection plugins 2021-02-12 12:31:02 +01:00
Claudio Atzori b34b5a39ca index field authoridtypevalue mixes up different author id-type value pairs, dropped in favour of orcidtypevalue 2021-02-11 09:36:04 +01:00
Alessia Bardi 986dd969d3 use the proper import for Lists 2021-02-10 12:03:54 +01:00
Alessia Bardi 09fc7e2f78 serialization of validated flag on relationships 2021-02-10 11:22:09 +01:00
Claudio Atzori 82e6c50f3f updated solr fields (authoridtypevalue, resultsubject, resultresourcetypename) 2021-02-09 16:27:04 +01:00
Claudio Atzori 62bd3c53ee Merge branch 'master' into provision_indexing 2021-02-09 15:46:26 +01:00
Claudio Atzori 72c57b28fa switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT 2021-02-04 14:08:18 +01:00
Claudio Atzori b6f08ce226 re-adding the old junit:junit dep as solr-test-framework needs it 2020-12-14 15:07:31 +01:00
Claudio Atzori 1506f49052 Xml record serialization for author PIDs: 1) only one value per PID type is allowed; 2) orcid prevails over orcid_pending 2020-12-14 11:14:03 +01:00
Claudio Atzori 61cd129ded XML serialisation test 2020-12-11 12:44:53 +01:00
Claudio Atzori ce7a319e01 using the correct assertion import 2020-12-11 12:44:17 +01:00
Claudio Atzori 7fe2433137 excluded transitive older junit dependencies, they can compromise the unit test executions 2020-12-11 12:42:55 +01:00
Claudio Atzori d9532446eb imported more diffs from master branch; code formatting 2020-12-10 16:14:16 +01:00
Claudio Atzori 12e2f930c8 resolved conflicts 2020-12-10 10:57:39 +01:00
Claudio Atzori ff72fcd91a allow orcid_pending to percolate to the XML graph serialization 2020-12-09 19:04:50 +01:00
Claudio Atzori 211aa04726 allow orcid_pending to percolate to the XML graph serialization 2020-12-09 18:08:51 +01:00
Claudio Atzori 026ad40633 disabled test 2020-12-07 13:50:01 +01:00
Claudio Atzori cfb55effd9 code formatting 2020-12-02 11:23:49 +01:00
Alessia Bardi 2d15667b4a testing XML generation from json object (case AMS ACTA) 2020-12-02 10:16:26 +01:00
Claudio Atzori d48f388fb2 Merge branch 'provision_indexing' 2020-11-19 15:59:55 +01:00
Claudio Atzori 7c9feaf9e7 project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail 2020-11-19 15:26:20 +01:00
Claudio Atzori 3f34757c63 merged from master 2020-11-19 14:34:54 +01:00
Claudio Atzori 0374d34c3e introduced configuration param outputFormat: HDFS | SOLR 2020-11-19 10:34:28 +01:00
Claudio Atzori 5218718e8b updated set of fields from the MDFormatDSResourceType on PROD 2020-11-18 15:00:41 +01:00
Claudio Atzori d9e07a242b extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location 2020-11-18 14:34:55 +01:00
Claudio Atzori 29dcff0f34 spark complains about missing classes, so here they are again 2020-11-18 14:32:32 +01:00
Claudio Atzori 8177ce7939 test for XmlIndexingJob based on a local miniSolrCluster 2020-11-18 10:58:05 +01:00
Claudio Atzori 2bed29eb09 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:12 +01:00
Claudio Atzori 9b0fb9e958 merged from master 2020-11-12 09:27:12 +01:00
Claudio Atzori 822971f54f no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup; 2020-11-12 09:22:59 +01:00
Claudio Atzori 18d9aad70c improved documentation in dhp-graph-provision 2020-11-10 11:48:55 +01:00
Claudio Atzori 58f28296ea ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values 2020-10-30 10:56:42 +01:00
Claudio Atzori 1871d1c6f6 solve error java.lang.NoSuchFieldError: INSTANCE when instantiating Solr client 2020-08-14 11:18:30 +02:00
Claudio Atzori 3a11a387a9 data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed 2020-08-03 14:28:08 +02:00
Claudio Atzori cc5d13da85 introduced parameter shouldIndex (true|false) 2020-07-16 13:46:39 +02:00
Claudio Atzori b098cc3cbe avoid repeating identical values for fields: source, description 2020-07-16 13:45:53 +02:00
Claudio Atzori 7d6e269b40 reverted CreateRelatedEntitiesJob_phase1 to its previous state 2020-07-13 22:54:04 +02:00
Claudio Atzori 8e97598eb4 avoid NPE in case of null instances 2020-07-13 20:46:14 +02:00
Claudio Atzori 06c1913062 added different limits for grouping by source and by target, incremented spark.sql.shuffle.partitions for the join operations 2020-07-10 19:03:33 +02:00
Claudio Atzori 4c3836f62e materialize the related entities before joining them 2020-07-10 19:00:44 +02:00
Claudio Atzori b21866a2da allow setting different relation cut points by source and by target; adjusted weight assigned to relationship types 2020-07-10 13:59:48 +02:00
Claudio Atzori ff4d6214f1 experimenting with pruning of relations 2020-07-10 10:06:41 +02:00
Claudio Atzori b383ed42fa pass optional parameter relationFilter to the PrepareRelationJob implementation 2020-07-07 14:21:28 +02:00