Commit Graph

172 Commits

Author SHA1 Message Date
Claudio Atzori 0c05abe50b [graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executor 2024-04-19 16:57:55 +02:00
Claudio Atzori 9b13c22e5d [graph provision] retrieve all the context information by adding all=true to the requests issued to thr API 2024-01-23 15:36:08 +01:00
Claudio Atzori f87f3a6483 [graph provision] updated param specification for the XML converter job 2024-01-23 08:54:37 +01:00
Claudio Atzori 1c6db320f4 [graph provision] obtain context info from the context API instead from the ISLookUp service 2024-01-22 15:53:17 +01:00
Claudio Atzori 321922772b added serialization for the new fields imported for the Irish tender 2023-12-05 16:37:04 +01:00
Claudio Atzori a72b9e96ac expand the instance level fulltext in the XML records 2023-07-27 14:57:38 +02:00
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Claudio Atzori 63b8bbc015 [graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml 2023-03-24 13:43:20 +01:00
Claudio Atzori 308e10d102 serialising: 1. measures for all the entity types and 2. result level fulltext 2023-03-23 11:23:22 +01:00
Claudio Atzori 41e00bcd07 [graph provision] avoid to parse again the XML records, apparently the escaped XML characters get unescaped invalidating the record 2023-03-13 15:19:49 +01:00
Serafeim Chatzopoulos 0b5bf53b45 Remove unecessary indexed fields from Solr 2023-02-23 12:42:42 +02:00
Claudio Atzori 499826ead1 serialising field eoscifguidelines field in the Solr XML records 2022-08-04 12:40:48 +02:00
Claudio Atzori 072f192853 include the class information in the measure XML serialization 2022-07-01 09:54:56 +02:00
Sandro La Bruzzo 4c50f35c8b update publication Date format 2022-05-16 10:29:36 +02:00
Claudio Atzori 9e12cb3c92 EOSC Services - removed field knowledgegraph; depending on the released schema module 2022-05-03 11:55:45 +02:00
Claudio Atzori 05c1ea92e9 EOSC Services - added Service-specific fields in the XML record serialization 2022-04-29 15:56:55 +02:00
Claudio Atzori f5f532d134 EOSC Services - ongoing update 2022-04-29 12:25:24 +02:00
Claudio Atzori 48d32466e4 instances grouped by URL expose only one refereed 2022-03-23 14:52:03 +01:00
Claudio Atzori a87c070447 conflicts resolved, merged from beta 2022-02-24 12:51:31 +01:00
Claudio Atzori 86cdb7a38f [provision] serialize measures defined on the result level 2022-02-23 15:54:18 +01:00
Alessia Bardi 600ede1798 serialisation of APCs int he XML records 2022-02-11 11:00:20 +01:00
Claudio Atzori cccb16900c https://support.openaire.eu/issues/7330 normalising DOI urls 2021-12-23 12:33:53 +01:00
Claudio Atzori 98eb292c59 avoid NPEs merging XMLInstance(s) 2021-12-13 13:27:20 +01:00
Claudio Atzori 5e17247bb6 avoid NPEs merging XMLInstance(s) 2021-12-13 11:48:40 +01:00
Claudio Atzori b70ecccea0 avoid NPEs merging XMLInstance(s) 2021-12-12 12:37:38 +01:00
Alessia Bardi e53228401b style 2021-12-09 15:46:22 +01:00
Alessia Bardi 6b5d7688a4 #7275 serialize license information in XML records 2021-12-09 13:46:48 +01:00
Claudio Atzori 9cac283bec implemented Instance serialization features requested in https://support.openaire.eu/issues/7156 2021-12-02 17:20:33 +01:00
Claudio Atzori e471f12d5e hotfix: recovered implementation removing the hardcoded working_dirs 2021-10-19 12:35:38 +02:00
Sandro La Bruzzo d4dadf6d77 reduced max number of PID in Relatedentity 2021-09-02 14:21:24 +02:00
Sandro La Bruzzo 9f8a80deb7 fixed wrong import of unresolved relation in openaire 2021-09-01 14:16:27 +02:00
Alessia Bardi 3762b17f7b added VERSIOn and PART relationship and re-ordered according to my personal and obviously possibly biased
ordering
2021-08-31 20:20:05 +02:00
Alessia Bardi 931f430129 Merge branch 'beta' into datasource_model_eosc_beta 2021-08-23 11:57:21 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Sandro La Bruzzo 6358f92c3a added sleep to solve problem of lost request of creating index 2021-07-30 08:54:37 +02:00
Claudio Atzori c53d106e80 [provision] lowercase relation filter 2021-07-29 13:57:00 +02:00
Sandro La Bruzzo 3721df7aa6 refactoring create actionset of scholexplorer, moved on package dhp-aggregation 2021-07-29 10:45:35 +02:00
Sandro La Bruzzo 3d8f0f629b implemented workflow of creation action set for scholexplorer 2021-07-28 16:15:34 +02:00
Michele Artini 3e2a2d6e71 added new fields in xml 2021-07-28 11:56:55 +02:00
Claudio Atzori 2fff24df55 code formatting 2021-07-28 11:34:19 +02:00
Sandro La Bruzzo 16c91203bd implemented workflow of creation action set for scholexplorer 2021-07-28 10:30:49 +02:00
Michele Artini 52e2315ba2 removed trick for datasourcetypeui 2021-07-28 10:23:00 +02:00
Claudio Atzori 10d7b4f0b4 filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML searialization procedure 2021-07-20 11:52:05 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Sandro La Bruzzo 57c74c73c6 fixed mistakes in oozie workflow 2021-07-09 12:28:09 +02:00
Sandro La Bruzzo 61ccb54fde removed wrong loop on oozie wf 2021-07-09 12:17:57 +02:00
Sandro La Bruzzo 9f5a0f3ab6 moved wf indexing of Scholexplorer in dhp-graph-provision 2021-07-09 12:06:43 +02:00
Claudio Atzori 96238152cb added serialization for alternateIdentifiers and pids within each record instance 2021-05-28 16:57:30 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00