1
0
Fork 0
Commit Graph

473 Commits

Author SHA1 Message Date
Claudio Atzori 153b56eeff make entity level pids unique by pidType:pidValue 2024-07-04 09:41:39 +02:00
Claudio Atzori 7b398a6d0b updated import of organization types from OpenOrgs 2024-07-03 11:11:35 +02:00
Claudio Atzori c06dfdfd86 ignore dates containing 'null's 2024-07-02 15:43:11 +02:00
Claudio Atzori 71927ca818 avoid NPEs 2024-06-11 12:40:50 +02:00
Giambattista Bloisi 46018dc804 Fix OperationUnsupportedException while merging two Result's contexts due to modification of an immutable collection 2024-06-11 10:39:48 +02:00
Giambattista Bloisi 3feab5d92d Fix MergeUtils.mergeGroup: it could get rid of some records and did not consider all PID authorities whilke sorting records.
ResultTypeComparator is now renamed in MergeEntitiesComparator and can be used as a general comparator for merging groups of records
2024-06-03 15:13:40 +02:00
Claudio Atzori a428e7be7e graph cleaning to implement ugly hardcoded rules, avoid NPEs 2024-05-29 09:26:12 +02:00
Claudio Atzori 8e45c5baa8 graph cleaning to implement ugly hardcoded rules 2024-05-28 15:28:42 +02:00
Claudio Atzori c3fe59bc78 fixed conflicts merging from beta, code formatting 2024-05-21 14:50:40 +02:00
Claudio Atzori a5d13d5d27 code formatting 2024-05-03 14:14:34 +02:00
Giambattista Bloisi 69c5efbd8b Fix: when applying enrichments with no instance information the resulting merge entity was generated with no instance instead of keeping the original information 2024-05-03 13:57:56 +02:00
Claudio Atzori 4355f64810 reverted to version 1.2.5-SNAPSHOT 2024-05-02 11:23:53 +02:00
Claudio Atzori 66680b8b9a refactoring of common utilities 2024-05-02 11:16:58 +02:00
Claudio Atzori dcf23b3d06 Merge branch 'beta' into beta-release-1.2.5 2024-05-02 10:01:49 +02:00
Claudio Atzori e2937db385 Merge branch 'beta' into misc_fixes_merge_entities 2024-04-24 08:55:28 +02:00
Giambattista Bloisi 1878199dae Miscellaneous fixes:
- in Merge By ID pick by preference those records coming from delegated Authorities
- fix various tests
- close spark session in SparkCreateSimRels
2024-04-24 08:12:45 +02:00
Claudio Atzori c3053ef34d using version 1.2.5-beta for the release 2024-04-23 14:52:32 +02:00
Claudio Atzori b5bcab13ec using version 1.2.5-beta for the release 2024-04-23 14:36:39 +02:00
Claudio Atzori 425c9afc36 using version 1.2.5-beta for the release 2024-04-23 14:30:04 +02:00
Claudio Atzori 24a83fc24f avoid NPEs in common Oaf merge utilities 2024-04-22 11:39:44 +02:00
Claudio Atzori 5857fd38c1 avoid NPEs in common Oaf merge utilities 2024-04-21 08:29:09 +02:00
Claudio Atzori ac8747582c Merge branch 'beta' into doidoost_dismiss 2024-04-17 12:01:01 +02:00
Giambattista Bloisi 8ac167e420 Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common 2024-04-16 17:18:28 +02:00
Miriam Baglioni 9eeb9f5d32 mergin with branch beta 2024-04-16 15:24:40 +02:00
Giambattista Bloisi da333e9f4d Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' (#419) from dedup_authorsmatch_bytoken into beta
Reviewed-on: D-Net/dnet-hadoop#419
2024-04-16 10:24:11 +02:00
Claudio Atzori d070db4a32 added a couple more invalid author names 2024-04-16 09:41:59 +02:00
Giambattista Bloisi 43b454399f - Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Sandro La Bruzzo 843dc95340 resolved conflict 2024-04-11 17:38:16 +02:00
Sandro La Bruzzo 2581672c11 updated wf of MAG and crossref to use transaction 2024-04-11 17:27:49 +02:00
Claudio Atzori ecff0b4825 merge from beta 2024-03-25 16:15:52 +01:00
Claudio Atzori 82fc609c4f Merge branch 'beta' into index_records 2024-03-25 16:12:49 +01:00
Claudio Atzori 9fc70a9451 implemented default merge procedure applied to result.instance 2024-03-25 15:39:14 +01:00
Giambattista Bloisi 3f22c101d9 Merge pull request 'Enrich authors with ORCID info using new matching algorithm' (#398) from new_orcid_enhancement into beta
Reviewed-on: D-Net/dnet-hadoop#398
2024-03-22 17:29:20 +01:00
Claudio Atzori aaa73f89d1 refactoring the Oaf records merge utilities into dhp-common 2024-03-22 16:34:03 +01:00
Sandro La Bruzzo 58dbe71d39 update crossref mapping to be runnable separately as a single datasource outside doiboost 2024-03-20 17:04:52 +01:00
Giambattista Bloisi 664a381d31 Unify merge logic of entities in MergeUtils.class 2024-03-18 16:04:49 +01:00
Claudio Atzori af154d4456 implemented changes from #9497: sort abstracts by string length, included author fullnames in the related results, expanded instance details within each children/result XML element 2024-03-14 16:21:23 +01:00
Sandro La Bruzzo c532831718 Moved Crossref Mapping on dhp-aggregations,
refactored code, avoid to use utility for create part of the oaf defined in DOIBoostMappingUtils, used instead utility in OafMappingUtils
2024-03-13 06:56:10 +01:00
Giambattista Bloisi 9092075760 Enrich authors with ORCID info using new matching algorithm 2024-03-11 13:23:59 +01:00
Claudio Atzori bb82052c40 [graph cleaning] rule out datasources without an officialname 2024-02-05 14:59:27 +02:00
Claudio Atzori e8630a6d03 [graph cleaning] rule out datasources without an officialname 2024-02-05 14:59:06 +02:00
Claudio Atzori 4d0c59669b merged changes from beta 2024-01-26 16:08:54 +01:00
Claudio Atzori 9e8fc6aa88 [collection] increased logging from the oai-pmh metadata collection process 2024-01-26 09:17:20 +01:00
Claudio Atzori 3e96777cc4 [collection] increased logging from the oai-pmh metadata collection process 2024-01-23 15:21:03 +01:00
Claudio Atzori 6fd25cf549 code formatting 2024-01-23 08:47:12 +01:00
Claudio Atzori f76852f385 Merge branch 'beta' into update_pivots_table 2024-01-22 16:37:22 +01:00
Claudio Atzori 1c6db320f4 [graph provision] obtain context info from the context API instead from the ISLookUp service 2024-01-22 15:53:17 +01:00
Giambattista Bloisi 21a14fcd80 Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions
Implements pivots table update oozie workflow
2024-01-15 10:18:14 +01:00
Claudio Atzori 1726f49790 code formatting 2023-12-15 10:37:02 +01:00
Claudio Atzori 98cce5bfb2 code formatting 2023-12-12 09:59:05 +01:00