Commit Graph

237 Commits

Author SHA1 Message Date
Claudio Atzori ef52128c55 included new stats* workflows in parent pom list of modules, code formatting 2024-03-26 10:42:10 +01:00
Claudio Atzori bfba71a95c further follow up changes from integrating the mergeutils branch 2024-03-26 09:01:18 +01:00
Claudio Atzori 078169b922 cleanup 2024-03-15 09:56:04 +01:00
Claudio Atzori af154d4456 implemented changes from #9497: sort abstracts by string length, included author fullnames in the related results, expanded instance details within each children/result XML element 2024-03-14 16:21:23 +01:00
Claudio Atzori 7863c92466 expanded paper abstract in the result/children XML element (ticket #9497) 2024-03-13 16:25:31 +01:00
Claudio Atzori eb5887cb9a including related organization url in the XML record serialization (ticket #9498) 2024-03-13 14:46:00 +01:00
Claudio Atzori db66555ebb WIP: updated provision workflow to create a JSON based representation of the payload 2024-03-12 09:56:09 +01:00
Claudio Atzori d4871b31e8 WIP: extended provision workflow to create the JSON based payload 2024-03-08 11:43:20 +01:00
Claudio Atzori 6fcf872daa Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into index_records 2024-02-28 10:27:28 +01:00
Claudio Atzori 3f07390a58 WIP 2024-02-28 10:10:10 +01:00
Sandro La Bruzzo 7d806a434c formatted code 2024-02-28 09:31:58 +01:00
Alessia Bardi f2a08d8cc2 test for Italian records from IRS repositories 2024-01-30 19:20:14 +01:00
Claudio Atzori 9b13c22e5d [graph provision] retrieve all the context information by adding all=true to the requests issued to thr API 2024-01-23 15:36:08 +01:00
Claudio Atzori f87f3a6483 [graph provision] updated param specification for the XML converter job 2024-01-23 08:54:37 +01:00
Claudio Atzori 1c6db320f4 [graph provision] obtain context info from the context API instead from the ISLookUp service 2024-01-22 15:53:17 +01:00
Miriam Baglioni 5011c4d11a refactoring after compiletion 2023-12-20 15:57:26 +01:00
Claudio Atzori ff924215b8 [graph provision] added tests for new peerreviewed field 2023-12-12 11:21:30 +01:00
Claudio Atzori 7e8eff40c1 [graph provision] added tests for the new model fields 2023-12-12 08:54:15 +01:00
Claudio Atzori 321922772b added serialization for the new fields imported for the Irish tender 2023-12-05 16:37:04 +01:00
Alessia Bardi cc7204a089 tests for d4science catalog 2023-09-20 15:38:32 +02:00
Claudio Atzori a72b9e96ac expand the instance level fulltext in the XML records 2023-07-27 14:57:38 +02:00
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Miriam Baglioni b25b401065 added test to verify the advconstraints to dth community. inserted some additional logs. 2023-04-05 12:18:39 +02:00
Claudio Atzori 63b8bbc015 [graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml 2023-03-24 13:43:20 +01:00
Claudio Atzori 308e10d102 serialising: 1. measures for all the entity types and 2. result level fulltext 2023-03-23 11:23:22 +01:00
Claudio Atzori 41e00bcd07 [graph provision] avoid to parse again the XML records, apparently the escaped XML characters get unescaped invalidating the record 2023-03-13 15:19:49 +01:00
Claudio Atzori 7aebedb43c code formatting 2023-02-27 11:51:27 +01:00
Serafeim Chatzopoulos 0b5bf53b45 Remove unecessary indexed fields from Solr 2023-02-23 12:42:42 +02:00
Claudio Atzori 1b8488976b code formatting 2022-12-07 10:45:38 +01:00
Claudio Atzori 8248da40d9 Merge branch 'beta' into graph_cleaning 2022-12-02 14:49:00 +01:00
Miriam Baglioni ce020f2c83 [EOSC FUTURE] added resources and test for review 2022-11-30 09:57:30 +01:00
Claudio Atzori 6082d235d3 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into graph_cleaning 2022-11-28 09:54:48 +01:00
Claudio Atzori 24ef301cc1 [graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping 2022-11-28 09:54:18 +01:00
Alessia Bardi 90c8f9cb61 tests for EOSC Future 2022-11-23 12:18:44 +01:00
Alessia Bardi 2832117f23 added eoscifguidelines in test 2022-11-22 18:01:12 +01:00
Alessia Bardi 2687fc9f73 tests for EOSC Future review - ROhub 2022-11-22 17:30:56 +01:00
Claudio Atzori ff6f789b6d code formatting 2022-09-09 15:16:31 +02:00
Alessia Bardi 5c45d52af3 testing for RiuNet 2022-09-07 15:40:57 +03:00
Claudio Atzori 499826ead1 serialising field eoscifguidelines field in the Solr XML records 2022-08-04 12:40:48 +02:00
Claudio Atzori 072f192853 include the class information in the measure XML serialization 2022-07-01 09:54:56 +02:00
Sandro La Bruzzo 4c50f35c8b update publication Date format 2022-05-16 10:29:36 +02:00
Claudio Atzori 9e12cb3c92 EOSC Services - removed field knowledgegraph; depending on the released schema module 2022-05-03 11:55:45 +02:00
Claudio Atzori b6a7ff3a99 EOSC Services - removed fields from mapping, testing preparation 2022-05-02 15:52:33 +02:00
Claudio Atzori 05c1ea92e9 EOSC Services - added Service-specific fields in the XML record serialization 2022-04-29 15:56:55 +02:00
Claudio Atzori f5f532d134 EOSC Services - ongoing update 2022-04-29 12:25:24 +02:00
Claudio Atzori 48d32466e4 instances grouped by URL expose only one refereed 2022-03-23 14:52:03 +01:00
Miriam Baglioni 2b643059fa [Country Propagation] changed the logic to get the collectedfrom at the result level. To fix issue when no instance is created for a result that should have the country associated. Change the code to use spark instead of hive to prepare the data needed for the propagation step. Added new tests for the intermediate steps and new verification for the propagation itself 2022-03-11 13:56:48 +01:00
Claudio Atzori a87c070447 conflicts resolved, merged from beta 2022-02-24 12:51:31 +01:00
Claudio Atzori 86cdb7a38f [provision] serialize measures defined on the result level 2022-02-23 15:54:18 +01:00
Alessia Bardi 9d6203f79b test mapping datasource 2022-02-23 15:00:53 +01:00