Commit Graph

487 Commits

Author SHA1 Message Date
Claudio Atzori 0b3e44e521 Merge branch 'beta' into relation-from-odf 2022-09-27 14:57:01 +02:00
Alessia Bardi fd63e9bfac Mapping all relationships supported in ModelConstants and ModelSupport 2022-09-26 11:24:13 +02:00
Alessia Bardi c5eb722170 relationships from relatedIdentifier whose target id type is one of the pid type with an authority 2022-09-23 15:47:05 +02:00
Alessia Bardi ba33ff71fd refactoring for the generation of relationships from related identifier of type 'OPENAIRE' 2022-09-23 15:17:13 +02:00
Alessia Bardi 982bcc1e35 test wrid pid and record identifier 2022-09-23 12:06:06 +02:00
Claudio Atzori e45ec15221 Merge branch 'beta' into clean_country 2022-09-19 11:34:02 +02:00
Claudio Atzori 26e1badded added instance.url syntactical validation, avoid creating multiple duplicated URLs 2022-09-19 11:19:10 +02:00
Claudio Atzori 192215a18e merged from branch discard-non-wellformed 2022-09-19 10:17:10 +02:00
Claudio Atzori 1e42d984e1 [aggregator graph] save invalid records aside for further inspection 2022-09-15 10:49:42 +02:00
Alessia Bardi 9e7ec4198f fixed test 2022-09-14 18:08:56 +02:00
Claudio Atzori a0919ed495 [aggregator graph] save invalid records aside for further inspection 2022-09-14 13:27:39 +02:00
Alessia Bardi b99a011345 return empty Oaf list if record cannot be parsed 2022-09-13 11:51:55 +02:00
Alessia Bardi 27af5122d2 logs for non well formed XML files 2022-09-12 14:25:23 +02:00
Claudio Atzori ff6f789b6d code formatting 2022-09-09 15:16:31 +02:00
Claudio Atzori b5d6966c01 Merge branch 'beta' into clean_country 2022-09-09 12:20:19 +02:00
Claudio Atzori b5f7bd30be Merge branch 'beta' into clean_subjects 2022-09-09 12:20:04 +02:00
Claudio Atzori 1203378441 Merge branch 'beta' into clean_subjects 2022-09-09 10:38:47 +02:00
Claudio Atzori 14dc909a14 Merge branch 'beta' into clean_country 2022-09-09 10:38:17 +02:00
Alessia Bardi 9ef063d502 #7861#note-8 instance url from handle 2022-09-07 17:29:54 +03:00
Alessia Bardi 5c45d52af3 testing for RiuNet 2022-09-07 15:40:57 +03:00
Alessia Bardi a11eb38065 testing for RO-Hub 2022-09-02 16:07:36 +02:00
Claudio Atzori b7c387c21f cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources 2022-08-12 15:09:16 +02:00
Miriam Baglioni 62d2138806 [Clean Context] changed a bit the logic. Added the check not to have result hosted by a datasource of type institutional repository from NL. Added also the check that the country should have been included in the result via propagation for it to be removed 2022-08-08 14:10:47 +02:00
Claudio Atzori 3418ce50ac cleaning of subjects: perform the cleaning when the given value is equivalent to one of the terms in the vocabulary 2022-08-08 12:48:47 +02:00
Miriam Baglioni 390013a4b2 mergin with branch beta 2022-08-08 12:30:31 +02:00
Claudio Atzori 4eaa063b1f cleaning of subjects 2022-08-05 16:56:09 +02:00
Claudio Atzori 32cee1f619 WIP: cleaning of subjects 2022-08-05 12:32:08 +02:00
Miriam Baglioni a7a18d7630 [Graph Dump] removed code for the dump from the project. Fixed issues in tests when possible 2022-08-04 17:40:40 +02:00
Claudio Atzori e62018e95d [aggregator graph] added more assertions in test 2022-08-03 12:26:05 +02:00
Claudio Atzori f62c4e05cd code formatting 2022-07-29 11:56:01 +02:00
Claudio Atzori 1dd1e4fe3a extended test for mapping project_organization relations 2022-07-28 11:27:08 +02:00
Miriam Baglioni a12d28c644 [Clean Country] added logic not to remove country from result if it exist a hosting datasource with that country. Moreover the country will be removed only if added with propagation 2022-07-22 16:23:12 +02:00
Miriam Baglioni 65cc736e2f [Clean Country] first implementation to remove country NL from results collected from NARCIS when doi starts with mendely prefix 2022-07-20 17:05:56 +02:00
Claudio Atzori 1138b2ac8e code formatting 2022-07-19 14:15:49 +02:00
Claudio Atzori 0c1cfee396 mapping oaf:fulltext elements in the result.fulltext field 2022-07-11 17:34:59 +02:00
Claudio Atzori 0cb1c70788 code formatting 2022-07-01 10:44:08 +02:00
Claudio Atzori 4ec13e2b66 Merge branch 'master' into dump_new_funded_products 2022-07-01 10:30:28 +02:00
Miriam Baglioni 1d1fe3b151 [DUMP DELTA PROJECTS] refactoring 2022-06-27 18:04:59 +02:00
Miriam Baglioni edddfc6c63 [DUMP DELTA PROJECTS] adding test and resource 2022-06-21 18:28:53 +02:00
Miriam Baglioni b98f904d48 [Funder Products Dump] new way to avoid using hive 2022-06-21 17:52:27 +02:00
Claudio Atzori b295a40d9c restored use of name_particles when parsing author names 2022-06-16 12:20:43 +02:00
Claudio Atzori 116902c028 mapping relationship from trasformed records based on oaf:relation 2022-06-13 14:31:48 +02:00
Alessia Bardi 68bd58d6a4 tests for ROHub 2022-06-10 17:29:11 +02:00
Sandro La Bruzzo 22f65680b9 Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2022-05-11 15:30:12 +02:00
Sandro La Bruzzo ca8d26bcb4 added better filter for openCitations 2022-05-11 15:29:57 +02:00
Claudio Atzori 5d3b4a9c25 [graph merge beta] merge datasource originalid, collectedfrom, and pid lists 2022-05-11 14:13:06 +02:00
Claudio Atzori 378020e30a [eosc_services] unit test adaptation 2022-05-09 16:05:06 +02:00
Claudio Atzori da611cfbbd [eosc_services] resolved merge conflicts 2022-05-03 13:37:15 +02:00
Claudio Atzori b6a7ff3a99 EOSC Services - removed fields from mapping, testing preparation 2022-05-02 15:52:33 +02:00
Claudio Atzori 05c1ea92e9 EOSC Services - added Service-specific fields in the XML record serialization 2022-04-29 15:56:55 +02:00
Claudio Atzori f5f532d134 EOSC Services - ongoing update 2022-04-29 12:25:24 +02:00
Claudio Atzori 5ffc24d1ba EOSC Services - ongoing update 2022-04-26 16:18:41 +02:00
Claudio Atzori 29150a5d0c code formatting 2022-04-21 13:31:56 +02:00
Miriam Baglioni 79336d46c5 [Clean Context] first naive implementation of a functionality to clean not wanted contextes from one result. This implementation simply verifies the main title of the results start with a given string 2022-04-04 15:52:31 +02:00
Claudio Atzori f25407bbe2 added mapping for datasource consent fields to integrate them in the graph 2022-03-11 09:32:42 +01:00
Miriam Baglioni 2c5087d55a [HostedByMap] download of doaj from json, modification of test resources, deletion of class no more needed for the CSV download 2022-03-04 15:18:21 +01:00
Miriam Baglioni aae667e6b6 [APC at the result level] added the APC at the level of the result and modified test class 2022-02-04 12:34:25 +01:00
Alessia Bardi 2e215abfa8 test for instances with URLs for OpenAPC 2022-02-02 17:27:44 +01:00
Claudio Atzori 8eb75ca169 adapted GenerateEntitiesApplicationTest behaviour 2022-01-27 16:24:37 +01:00
Claudio Atzori af61e44acc ported changes to the GraphCleaningFunctionsTest from 8de9788308 2022-01-27 16:19:14 +01:00
Claudio Atzori 4983d6536d Merge branch 'beta' into delegated_authorities 2022-01-21 13:02:48 +01:00
Claudio Atzori f0ea2410e5 improved mapping titles from datacite records to consider title types 2022-01-21 10:50:34 +01:00
Claudio Atzori 3b9020c1b7 added unit test for the DispatchEntitiesJob 2022-01-19 18:15:55 +01:00
Claudio Atzori abfa9c6045 code formatting 2022-01-19 17:17:11 +01:00
Claudio Atzori 391aa1373b added unit test 2022-01-19 17:13:21 +01:00
Claudio Atzori bd59b58efb test for the tolerant deserialisation utility method 2022-01-04 11:26:56 +01:00
Miriam Baglioni 7a1b440413 [SDG] logic to create unresolved entities out of SDG input. This changes also some classes related to FOS to reuse the same code. The code under createunresolvedentities create results with the merged update of the the inputs provided (bip at the level of the isntance, fos and sdg for subjects) 2021-12-23 13:24:28 +01:00
Miriam Baglioni 69e9ea9eeb [Graph Dump] Test for extraction of rels from entities extended 2021-12-23 10:15:30 +01:00
Miriam Baglioni 31b26d48ac [Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped 2021-12-23 10:09:47 +01:00
Miriam Baglioni 56409d1281 [Dump] resolved conflicts with beta and merging 2021-12-14 15:03:45 +01:00
Miriam Baglioni 8d755cca80 - 2021-12-13 15:01:40 +01:00
Claudio Atzori 41c70c607d cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies 2021-12-09 16:44:28 +01:00
Claudio Atzori e6e177dda0 vocabulary based cleaning considers also the term label when looking up for a synonym 2021-12-09 13:57:53 +01:00
Sandro La Bruzzo ed0c352799 [test-fixing] fixed wrong test 2021-12-06 15:07:41 +01:00
Sandro La Bruzzo bf880e2508 [scala-refactor] Module dhp-graph-mapper:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 13:57:41 +01:00
Miriam Baglioni 4bb1d43afc - 2021-12-03 12:35:51 +01:00
Claudio Atzori 863a2f9db3 avoid to filter OAF records defined as invisible = true 2021-12-03 09:08:12 +01:00
Miriam Baglioni d9f80488cc [GRAPH DUMP] Add one more test to check the filtering of the relations 2021-12-02 14:15:19 +01:00
Miriam Baglioni 58bc3f223a [GRAPH DUMP] Add filtering for relation we do not want to dump. It is based on the relclass 2021-12-02 14:09:46 +01:00
Miriam Baglioni 8905a39bf3 mergin with branch beta 2021-12-02 13:17:29 +01:00
Claudio Atzori d85af6fc25 [cleaning wf] fixed OAF record navigation, a mapping defined on a container object would have prevented the natvigation to continue on its properties 2021-12-01 15:49:15 +01:00
Claudio Atzori 5c6d328537 code formatting 2021-11-26 15:38:16 +01:00
Sandro La Bruzzo 93fe8ce8b2 entity resolution: fix test 2021-11-22 15:50:43 +01:00
Sandro La Bruzzo 35e20b0647 updated resolution wf:
- generate a new version of the graph
 - changed merge from union to join
2021-11-22 11:48:55 +01:00
Miriam Baglioni fdb75b180e [Cleaning] added couple of tests for DOIBOOST publications 2021-11-21 16:35:22 +01:00
Miriam Baglioni 0136a8c266 [Graph Dump] Change test to mirror that measure is at the level of the isntance 2021-11-18 14:38:33 +01:00
Miriam Baglioni 1b79c0ee79 mergin with branch beta 2021-11-18 11:01:00 +01:00
Claudio Atzori e0395719d7 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-11-17 14:17:27 +01:00
Claudio Atzori 82a4e4efae [cleaning wf] fixed methodology to rule out invalid result titles, based on https://support.openaire.eu/issues/7206 2021-11-17 14:17:22 +01:00
Miriam Baglioni 6d4a1c57ee [Resolve Entities] Change test dataset to mirror the modification in the creation of the map between the pids and the unresolved 2021-11-17 12:41:52 +01:00
Miriam Baglioni 99d86134f5 [Graph Dump] changed the dump since the measures have been moded at the level of the instance 2021-11-16 12:04:21 +01:00
Miriam Baglioni 6595135a1a [Dump Schemas] changed the schema of the dumped result according to the modifications in the bestAccessRight type 2021-11-12 11:45:38 +01:00
Miriam Baglioni b3f9370125 merge with beta - resolved conflict in pom 2021-11-12 11:25:26 +01:00
Miriam Baglioni ffb0ce1d59 merge with beta - resolved conflict in pom 2021-11-12 10:19:59 +01:00
Miriam Baglioni b8bdabfae9 [Graph DUmp] removed OpenAccessRoute from test in best access right 2021-11-11 16:16:48 +01:00
Miriam Baglioni e5498052e8 [Graph DUmp] removed OpenAccessRoute from test in best access right 2021-11-11 16:14:10 +01:00
Miriam Baglioni 935062edec [Bypass Action Set] creation of unresolved entities 2021-11-11 16:11:25 +01:00
Sandro La Bruzzo 9cb195314f implemented and tested resolution of entities 2021-11-11 10:17:40 +01:00
Sandro La Bruzzo ae4e99a471 Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both verse on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Miriam Baglioni fec40bdd95 merging with branch beta - resolved conflicts 2021-10-12 09:16:36 +02:00