Commit Graph

536 Commits

Author SHA1 Message Date
Claudio Atzori eed9fe0902 code formatting 2023-10-06 12:31:17 +02:00
Claudio Atzori dc86018a5f Merge branch 'merge_entities_job' into beta 2023-10-02 11:24:48 +02:00
Alessia Bardi 0935d7757c Use v5 of the UNIBI Gold ISSN list in test 2023-09-20 15:41:35 +02:00
Alessia Bardi cc7204a089 tests for d4science catalog 2023-09-20 15:38:32 +02:00
Giambattista Bloisi 6cc7d8ca7b GroupEntities and DispatchEntites are now merged in GroupEntitiesSparkJob 2023-08-30 10:43:31 +02:00
Claudio Atzori bf35280ea6 code formatting 2023-08-29 11:11:00 +02:00
Giambattista Bloisi fab9920271 DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag 2023-08-09 15:41:43 +02:00
Miriam Baglioni c25ac21e5e Merge pull request 'graph cleaning, suggestions from ticket 8898' (#325) from cleaning_8898 into beta
Reviewed-on: D-Net/dnet-hadoop#325
2023-08-08 11:14:19 +02:00
Claudio Atzori 11ffb9bd68 rule out records with NULL dataInfo 2023-07-31 12:35:33 +02:00
Claudio Atzori 270df939c4 partial implementation of the suggestions from https://support.openaire.eu/issues/8898 2023-07-25 17:29:50 +02:00
Claudio Atzori e45777e7e1 [aggregator graph] added validation for URLs mapped from oaf:fulltext 2023-05-26 11:33:42 +02:00
Claudio Atzori 8a463cc3e8 fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common 2023-05-15 15:44:46 +02:00
Claudio Atzori d8882c4481 extended mapping applied to datacite records to produce affiliations using the ROR ids. Inc ase of APCs it includes the amount and the currently in the relation 2023-05-02 11:56:51 +02:00
Claudio Atzori dead87917f [graph cleaning] cleanup 2023-04-04 13:13:43 +02:00
Claudio Atzori 2a6ba29b64 [graph cleaning] unit tests & cleanup 2023-04-04 12:34:51 +02:00
Claudio Atzori c07857fa37 [graph cleaning] unit tests & cleanup 2023-03-23 15:57:47 +01:00
Claudio Atzori 90e61a8aba [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-23 15:03:26 +01:00
Claudio Atzori 488d9a5eaa [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-23 10:41:13 +01:00
Claudio Atzori 4f5ba0ed52 [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-21 14:41:20 +01:00
Claudio Atzori 9a03f71db1 code formatting 2023-02-13 16:25:47 +01:00
Miriam Baglioni d6895f0387 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-01-09 17:28:38 +01:00
Sandro La Bruzzo 3c9826f186 updated lines function to it's implementation linesWithSeparators.map(l => l.stripLineEnd) in this way we force scala plugin compiler to consider this pipeline scala code and not java.string.lines() pipeline 2022-12-21 11:21:17 +01:00
Miriam Baglioni 8685eaa706 [Clean Country] added test to verify remove of country 2022-12-16 15:31:25 +01:00
Miriam Baglioni dc0ec88a58 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-12-16 13:18:32 +01:00
Miriam Baglioni d791840b82 [Clean Country] added test to verify remove of country: 2022-12-16 13:18:29 +01:00
Claudio Atzori b8bafab8a0 [cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning 2022-12-12 14:43:03 +01:00
Claudio Atzori 730228d73d [cleaning] align wf parameter names in test 2022-12-08 18:40:22 +01:00
Claudio Atzori 8e3edba318 [graph cleaning] testing the collectedfron and hostedby patch procedure 2022-11-29 16:07:09 +01:00
Claudio Atzori 58c05731f9 [graph cleaning] WIP: testing the collectedfron and hostedby patch procedure 2022-11-29 11:21:51 +01:00
Claudio Atzori 24ef301cc1 [graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping 2022-11-28 09:54:18 +01:00
Alessia Bardi 3c08269a4d Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-11-22 17:31:00 +01:00
Alessia Bardi 2687fc9f73 tests for EOSC Future review - ROhub 2022-11-22 17:30:56 +01:00
Alessia Bardi 208ed32315 fixed xpath for semantic relation 2022-10-23 18:18:13 +02:00
Alessia Bardi 31a10f000b Map the field oaf:eoscifguidelines from mdstores. Currently we can find it in ROHub metadata 2022-10-23 18:05:37 +02:00
Claudio Atzori b47aaf4dd1 [cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword 2022-10-13 11:23:43 +02:00
Alessia Bardi 49360770d7 map w3id as instance url 2022-09-28 14:16:39 +02:00
Claudio Atzori 0b3e44e521 Merge branch 'beta' into relation-from-odf 2022-09-27 14:57:01 +02:00
Claudio Atzori 25e9d92aad Merge branch 'beta' into clean_country 2022-09-27 14:27:49 +02:00
Alessia Bardi fd63e9bfac Mapping all relationships supported in ModelConstants and ModelSupport 2022-09-26 11:24:13 +02:00
Alessia Bardi c5eb722170 relationships from relatedIdentifier whose target id type is one of the pid type with an authority 2022-09-23 15:47:05 +02:00
Claudio Atzori c86cc53520 suppressing hyper verbose spark logs during unit test execution 2022-09-23 15:20:40 +02:00
Alessia Bardi ba33ff71fd refactoring for the generation of relationships from related identifier of type 'OPENAIRE' 2022-09-23 15:17:13 +02:00
Alessia Bardi 982bcc1e35 test wrid pid and record identifier 2022-09-23 12:06:06 +02:00
Claudio Atzori e45ec15221 Merge branch 'beta' into clean_country 2022-09-19 11:34:02 +02:00
Claudio Atzori 26e1badded added instance.url syntactical validation, avoid creating multiple duplicated URLs 2022-09-19 11:19:10 +02:00
Claudio Atzori 192215a18e merged from branch discard-non-wellformed 2022-09-19 10:17:10 +02:00
Claudio Atzori 1e42d984e1 [aggregator graph] save invalid records aside for further inspection 2022-09-15 10:49:42 +02:00
Alessia Bardi 9e7ec4198f fixed test 2022-09-14 18:08:56 +02:00
Claudio Atzori a0919ed495 [aggregator graph] save invalid records aside for further inspection 2022-09-14 13:27:39 +02:00
Alessia Bardi b99a011345 return empty Oaf list if record cannot be parsed 2022-09-13 11:51:55 +02:00