Commit Graph

116 Commits

Author SHA1 Message Date
Claudio Atzori e2937db385 Merge branch 'beta' into misc_fixes_merge_entities 2024-04-24 08:55:28 +02:00
Giambattista Bloisi 1878199dae Miscellaneous fixes:
- in Merge By ID pick by preference those records coming from delegated Authorities
- fix various tests
- close spark session in SparkCreateSimRels
2024-04-24 08:12:45 +02:00
Claudio Atzori 24a83fc24f avoid NPEs in common Oaf merge utilities 2024-04-22 11:39:44 +02:00
Claudio Atzori ac8747582c Merge branch 'beta' into doidoost_dismiss 2024-04-17 12:01:01 +02:00
Giambattista Bloisi 8ac167e420 Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common 2024-04-16 17:18:28 +02:00
Miriam Baglioni 9eeb9f5d32 mergin with branch beta 2024-04-16 15:24:40 +02:00
Giambattista Bloisi da333e9f4d Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' (#419) from dedup_authorsmatch_bytoken into beta
Reviewed-on: #419
2024-04-16 10:24:11 +02:00
Claudio Atzori d070db4a32 added a couple more invalid author names 2024-04-16 09:41:59 +02:00
Giambattista Bloisi 43b454399f - Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Sandro La Bruzzo 843dc95340 resolved conflict 2024-04-11 17:38:16 +02:00
Claudio Atzori ecff0b4825 merge from beta 2024-03-25 16:15:52 +01:00
Claudio Atzori 82fc609c4f Merge branch 'beta' into index_records 2024-03-25 16:12:49 +01:00
Claudio Atzori 9fc70a9451 implemented default merge procedure applied to result.instance 2024-03-25 15:39:14 +01:00
Claudio Atzori aaa73f89d1 refactoring the Oaf records merge utilities into dhp-common 2024-03-22 16:34:03 +01:00
Sandro La Bruzzo 58dbe71d39 update crossref mapping to be runnable separately as a single datasource outside doiboost 2024-03-20 17:04:52 +01:00
Giambattista Bloisi 664a381d31 Unify merge logic of entities in MergeUtils.class 2024-03-18 16:04:49 +01:00
Claudio Atzori af154d4456 implemented changes from #9497: sort abstracts by string length, included author fullnames in the related results, expanded instance details within each children/result XML element 2024-03-14 16:21:23 +01:00
Sandro La Bruzzo c532831718 Moved Crossref Mapping on dhp-aggregations,
refactored code, avoid to use utility for create part of the oaf defined in DOIBoostMappingUtils, used instead utility in OafMappingUtils
2024-03-13 06:56:10 +01:00
Claudio Atzori bb82052c40 [graph cleaning] rule out datasources without an officialname 2024-02-05 14:59:27 +02:00
Claudio Atzori 98cce5bfb2 code formatting 2023-12-12 09:59:05 +01:00
Claudio Atzori 84d54643cf [cleaning] allow enriched orcids to pass the cleaning, rule out non-orcid author pids 2023-12-12 09:57:00 +01:00
Claudio Atzori aba95ed1d1 code formatting 2023-12-08 17:06:19 +01:00
Claudio Atzori 34abd0fc43 Merge branch 'beta' into clean_license_publisher 2023-12-08 16:58:27 +01:00
Claudio Atzori 7c3041b276 avoid NPEs 2023-12-03 16:49:49 +01:00
Claudio Atzori 74b185d07b avoid NPEs 2023-12-03 16:18:20 +01:00
Claudio Atzori 4e1aac2e2f resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350 2023-11-29 14:37:52 +01:00
Claudio Atzori 1ba582de3c [graph cleaning] added cleaning for result.publisher and result.instance.license 2023-11-23 16:27:19 +01:00
Claudio Atzori 11a1207f9c [graph cleaning] applying coar based vocabularies in bulk 2023-11-22 12:22:14 +01:00
Claudio Atzori dde2fec035 [graph cleaning] cleanup 2023-10-31 14:35:33 +01:00
Claudio Atzori 262d7c581b [graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898 2023-10-31 14:34:10 +01:00
Claudio Atzori b0fed1725e avoid NPEs 2023-10-19 12:13:45 +02:00
Claudio Atzori 39d24d5469 Merge branch 'beta' into resource_types 2023-10-16 11:56:38 +02:00
Claudio Atzori 05ee7d8b09 [graph cleaning] avoid NPEs 2023-10-12 09:13:42 +02:00
Claudio Atzori 554551682d [raw graph] adopting the new COAR based vocabularies for the resource typing 2023-10-11 16:09:19 +02:00
Claudio Atzori 8108491722 Merge branch 'beta' into peer_reviewed 2023-10-06 14:21:52 +02:00
Giambattista Bloisi 2f3cf6d0e7 Fix cleaning of Pmid where parsing of numbers stopped at first not leading 0' character 2023-10-06 14:20:15 +02:00
Claudio Atzori c9a5ad6a02 extending the coverage of the peer non-unknown refereed instances 2023-10-02 16:28:42 +02:00
Claudio Atzori bf35280ea6 code formatting 2023-08-29 11:11:00 +02:00
Miriam Baglioni c25ac21e5e Merge pull request 'graph cleaning, suggestions from ticket 8898' (#325) from cleaning_8898 into beta
Reviewed-on: #325
2023-08-08 11:14:19 +02:00
Claudio Atzori b9dddbfe54 rule out records with NULL dataInfo, except for Relations 2023-07-31 17:53:54 +02:00
Claudio Atzori 11ffb9bd68 rule out records with NULL dataInfo 2023-07-31 12:35:33 +02:00
Claudio Atzori d8435a6512 inverted condition 2023-07-25 17:39:57 +02:00
Claudio Atzori 270df939c4 partial implementation of the suggestions from https://support.openaire.eu/issues/8898 2023-07-25 17:29:50 +02:00
Claudio Atzori 0f5a819f44 [graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests 2023-06-23 16:10:49 +02:00
Claudio Atzori 1d33074fd1 WIP: pid cleaning 2023-06-09 16:47:25 +02:00
Claudio Atzori 2a6ba29b64 [graph cleaning] unit tests & cleanup 2023-04-04 12:34:51 +02:00
Claudio Atzori 6d3d18d8b5 [graph cleaning] WIP: refactoring of the cleaning stages 2023-03-16 17:23:36 +01:00
Claudio Atzori 9a03f71db1 code formatting 2023-02-13 16:25:47 +01:00
Claudio Atzori 9cf0a98699 [cleaning] set the common subject classid/name 2022-12-20 10:17:33 +01:00
Claudio Atzori b8bafab8a0 [cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning 2022-12-12 14:43:03 +01:00