Commit Graph

1378 Commits

Author SHA1 Message Date
Giambattista Bloisi 43b454399f - Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Claudio Atzori ef52128c55 included new stats* workflows in parent pom list of modules, code formatting 2024-03-26 10:42:10 +01:00
Claudio Atzori bfba71a95c further follow up changes from integrating the mergeutils branch 2024-03-26 09:01:18 +01:00
Claudio Atzori 538b180fe0 Merge branch 'beta' into oaf_country_beta 2024-03-25 16:13:20 +01:00
Giambattista Bloisi 3f22c101d9 Merge pull request 'Enrich authors with ORCID info using new matching algorithm' (#398) from new_orcid_enhancement into beta
Reviewed-on: #398
2024-03-22 17:29:20 +01:00
Giambattista Bloisi 0ff7faad72 Fix conditions that prevented ORCID Enrichment 2024-03-22 16:24:49 +01:00
Giambattista Bloisi 664a381d31 Unify merge logic of entities in MergeUtils.class 2024-03-18 16:04:49 +01:00
Michele Artini 30167aa882 mapped oaf:country from results 2024-03-15 11:24:16 +01:00
Giambattista Bloisi 9092075760 Enrich authors with ORCID info using new matching algorithm 2024-03-11 13:23:59 +01:00
Sandro La Bruzzo 7d806a434c formatted code 2024-02-28 09:31:58 +01:00
Michele Artini 3268570b2c mapping of project PIDs 2024-02-22 14:47:21 +01:00
Claudio Atzori a63b091bae Merge branch 'beta' into import_orps_fix 2024-02-15 15:01:56 +01:00
Claudio Atzori d85d2df6ad [graph raw] fixed mapping of the original resource type from the Datacite format 2024-02-09 10:20:20 +01:00
Claudio Atzori 38c9001147 fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite) 2024-02-07 17:02:05 +01:00
Claudio Atzori 42f5506306 [orcid enrichment] fixed directory cleanup before distcp 2024-02-05 09:45:36 +02:00
Alessia Bardi f2a08d8cc2 test for Italian records from IRS repositories 2024-01-30 19:20:14 +01:00
Claudio Atzori 2655eea5bc [orcid enrichment] drop paths before copying the non-modifyed contents 2024-01-19 16:28:05 +01:00
Claudio Atzori cb9e739484 Merge branch 'beta' into resource_types 2024-01-11 16:29:41 +01:00
Claudio Atzori 2753044d13 refined mapping for the extraction of the original resource type 2024-01-11 16:28:26 +01:00
Miriam Baglioni e711a05229 fixed conflicts 2024-01-10 11:03:42 +01:00
Claudio Atzori 62104790ae added metaresourcetype to the result hive DB view 2023-12-21 12:27:10 +01:00
Miriam Baglioni 4740c808f7 - 2023-12-20 14:26:54 +01:00
Claudio Atzori cb71a7936b [graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum 2023-12-07 23:09:54 +01:00
Claudio Atzori 259c69e446 [orcid enrichment] fixed workflow definition 2023-12-06 19:41:53 +01:00
Claudio Atzori 2a233a89aa [graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark aciton 2023-12-03 13:32:52 +01:00
Claudio Atzori 622fafbd2e Merge branch 'beta' into orcid_import 2023-12-01 12:28:14 +01:00
Sandro La Bruzzo bf0fd27c36 Removed unused function
Applied PR Comment of Giambattista in the PR
2023-12-01 12:16:42 +01:00
Sandro La Bruzzo cdfb7588dd code formatting 2023-11-30 15:31:42 +01:00
Sandro La Bruzzo 5e22b67b8a Merge remote-tracking branch 'origin/beta' into orcid_import 2023-11-30 15:27:46 +01:00
Sandro La Bruzzo f718caaac9 Added copy of the untouched entities of the graph 2023-11-30 14:51:00 +01:00
Sandro La Bruzzo 7b5e04f37e removed Orcid intersection on DOIBoost 2023-11-30 14:36:50 +01:00
Claudio Atzori 4e1aac2e2f resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350 2023-11-29 14:37:52 +01:00
Sandro La Bruzzo 279100fa52 added test 2023-11-29 11:17:58 +01:00
Sandro La Bruzzo 59111713fa added comment 2023-11-28 09:00:48 +01:00
Sandro La Bruzzo 6f4d0c05ea Implemented Author MErger for ORCID that takes in account the case when name and surname are swapped 2023-11-28 08:43:56 +01:00
Sandro La Bruzzo 34a4b3cbdf Implemented ORCID Enrichment 2023-11-24 12:39:58 +01:00
Claudio Atzori 2c77638bf5 Merge branch 'beta' into cleaning_8898 2023-11-22 14:00:10 +01:00
Claudio Atzori 11a1207f9c [graph cleaning] applying coar based vocabularies in bulk 2023-11-22 12:22:14 +01:00
Claudio Atzori 262d7c581b [graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898 2023-10-31 14:34:10 +01:00
Claudio Atzori b3a61ea955 Merge branch 'beta' into url_validation 2023-10-25 14:22:56 +02:00
Claudio Atzori 7fc621cdec added defaults to the graph resolution workflow config-default.xml 2023-10-20 22:28:12 +02:00
Claudio Atzori 2b9d0416ec [graph raw] URL Validator to accept double slashes 2023-10-19 16:26:37 +02:00
Claudio Atzori 6dfcd0c9a2 [raw graph] mapping original resource types 2023-10-16 12:57:18 +02:00
Claudio Atzori 54fbf09ac6 [raw graph] WIP: mapping original resource types 2023-10-16 08:57:47 +02:00
Claudio Atzori 554551682d [raw graph] adopting the new COAR based vocabularies for the resource typing 2023-10-11 16:09:19 +02:00
Claudio Atzori eed9fe0902 code formatting 2023-10-06 12:31:17 +02:00
Claudio Atzori dc86018a5f Merge branch 'merge_entities_job' into beta 2023-10-02 11:24:48 +02:00
Alessia Bardi 0935d7757c Use v5 of the UNIBI Gold ISSN list in test 2023-09-20 15:41:35 +02:00
Alessia Bardi cc7204a089 tests for d4science catalog 2023-09-20 15:38:32 +02:00
Claudio Atzori 5b06c9d06f [graph raw] datainfo.invisible set as true only for entities 2023-09-04 15:15:24 +02:00