Commit Graph

466 Commits

Author SHA1 Message Date
Claudio Atzori 8e45c5baa8 graph cleaning to implement ugly hardcoded rules 2024-05-28 15:28:42 +02:00
Claudio Atzori c3fe59bc78 fixed conflicts merging from beta, code formatting 2024-05-21 14:50:40 +02:00
Claudio Atzori a5d13d5d27 code formatting 2024-05-03 14:14:34 +02:00
Giambattista Bloisi 69c5efbd8b Fix: when applying enrichments with no instance information the resulting merge entity was generated with no instance instead of keeping the original information 2024-05-03 13:57:56 +02:00
Claudio Atzori 4355f64810 reverted to version 1.2.5-SNAPSHOT 2024-05-02 11:23:53 +02:00
Claudio Atzori 66680b8b9a refactoring of common utilities 2024-05-02 11:16:58 +02:00
Claudio Atzori dcf23b3d06 Merge branch 'beta' into beta-release-1.2.5 2024-05-02 10:01:49 +02:00
Claudio Atzori e2937db385 Merge branch 'beta' into misc_fixes_merge_entities 2024-04-24 08:55:28 +02:00
Giambattista Bloisi 1878199dae Miscellaneous fixes:
- in Merge By ID pick by preference those records coming from delegated Authorities
- fix various tests
- close spark session in SparkCreateSimRels
2024-04-24 08:12:45 +02:00
Claudio Atzori c3053ef34d using version 1.2.5-beta for the release 2024-04-23 14:52:32 +02:00
Claudio Atzori b5bcab13ec using version 1.2.5-beta for the release 2024-04-23 14:36:39 +02:00
Claudio Atzori 425c9afc36 using version 1.2.5-beta for the release 2024-04-23 14:30:04 +02:00
Claudio Atzori 24a83fc24f avoid NPEs in common Oaf merge utilities 2024-04-22 11:39:44 +02:00
Claudio Atzori 5857fd38c1 avoid NPEs in common Oaf merge utilities 2024-04-21 08:29:09 +02:00
Claudio Atzori ac8747582c Merge branch 'beta' into doidoost_dismiss 2024-04-17 12:01:01 +02:00
Giambattista Bloisi 8ac167e420 Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common 2024-04-16 17:18:28 +02:00
Miriam Baglioni 9eeb9f5d32 mergin with branch beta 2024-04-16 15:24:40 +02:00
Giambattista Bloisi da333e9f4d Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' (#419) from dedup_authorsmatch_bytoken into beta
Reviewed-on: #419
2024-04-16 10:24:11 +02:00
Claudio Atzori d070db4a32 added a couple more invalid author names 2024-04-16 09:41:59 +02:00
Giambattista Bloisi 43b454399f - Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Sandro La Bruzzo 843dc95340 resolved conflict 2024-04-11 17:38:16 +02:00
Sandro La Bruzzo 2581672c11 updated wf of MAG and crossref to use transaction 2024-04-11 17:27:49 +02:00
Claudio Atzori ecff0b4825 merge from beta 2024-03-25 16:15:52 +01:00
Claudio Atzori 82fc609c4f Merge branch 'beta' into index_records 2024-03-25 16:12:49 +01:00
Claudio Atzori 9fc70a9451 implemented default merge procedure applied to result.instance 2024-03-25 15:39:14 +01:00
Giambattista Bloisi 3f22c101d9 Merge pull request 'Enrich authors with ORCID info using new matching algorithm' (#398) from new_orcid_enhancement into beta
Reviewed-on: #398
2024-03-22 17:29:20 +01:00
Claudio Atzori aaa73f89d1 refactoring the Oaf records merge utilities into dhp-common 2024-03-22 16:34:03 +01:00
Sandro La Bruzzo 58dbe71d39 update crossref mapping to be runnable separately as a single datasource outside doiboost 2024-03-20 17:04:52 +01:00
Giambattista Bloisi 664a381d31 Unify merge logic of entities in MergeUtils.class 2024-03-18 16:04:49 +01:00
Claudio Atzori af154d4456 implemented changes from #9497: sort abstracts by string length, included author fullnames in the related results, expanded instance details within each children/result XML element 2024-03-14 16:21:23 +01:00
Sandro La Bruzzo c532831718 Moved Crossref Mapping on dhp-aggregations,
refactored code, avoid to use utility for create part of the oaf defined in DOIBoostMappingUtils, used instead utility in OafMappingUtils
2024-03-13 06:56:10 +01:00
Giambattista Bloisi 9092075760 Enrich authors with ORCID info using new matching algorithm 2024-03-11 13:23:59 +01:00
Claudio Atzori bb82052c40 [graph cleaning] rule out datasources without an officialname 2024-02-05 14:59:27 +02:00
Claudio Atzori e8630a6d03 [graph cleaning] rule out datasources without an officialname 2024-02-05 14:59:06 +02:00
Claudio Atzori 4d0c59669b merged changes from beta 2024-01-26 16:08:54 +01:00
Claudio Atzori 9e8fc6aa88 [collection] increased logging from the oai-pmh metadata collection process 2024-01-26 09:17:20 +01:00
Claudio Atzori 3e96777cc4 [collection] increased logging from the oai-pmh metadata collection process 2024-01-23 15:21:03 +01:00
Claudio Atzori 6fd25cf549 code formatting 2024-01-23 08:47:12 +01:00
Claudio Atzori f76852f385 Merge branch 'beta' into update_pivots_table 2024-01-22 16:37:22 +01:00
Claudio Atzori 1c6db320f4 [graph provision] obtain context info from the context API instead from the ISLookUp service 2024-01-22 15:53:17 +01:00
Giambattista Bloisi 21a14fcd80 Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions
Implements pivots table update oozie workflow
2024-01-15 10:18:14 +01:00
Claudio Atzori 1726f49790 code formatting 2023-12-15 10:37:02 +01:00
Claudio Atzori 98cce5bfb2 code formatting 2023-12-12 09:59:05 +01:00
Claudio Atzori 84d54643cf [cleaning] allow enriched orcids to pass the cleaning, rule out non-orcid author pids 2023-12-12 09:57:00 +01:00
Claudio Atzori aba95ed1d1 code formatting 2023-12-08 17:06:19 +01:00
Claudio Atzori 34abd0fc43 Merge branch 'beta' into clean_license_publisher 2023-12-08 16:58:27 +01:00
Claudio Atzori 7c3041b276 avoid NPEs 2023-12-03 16:49:49 +01:00
Claudio Atzori 74b185d07b avoid NPEs 2023-12-03 16:18:20 +01:00
Claudio Atzori e6086efc53 avoid NPEs in Vocabulary.getTermBySynonym 2023-12-03 13:33:20 +01:00
Claudio Atzori d33f578e54 code formatting 2023-12-01 15:14:17 +01:00