Claudio Atzori
ac8747582c
Merge branch 'beta' into doidoost_dismiss
2024-04-17 12:01:01 +02:00
Giambattista Bloisi
8ac167e420
Refinements to PR #404 : refactoring the Oaf records merge utilities into dhp-common
2024-04-16 17:18:28 +02:00
Miriam Baglioni
9eeb9f5d32
merging with branch beta
2024-04-16 15:24:40 +02:00
Giambattista Bloisi
da333e9f4d
Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' ( #419 ) from dedup_authorsmatch_bytoken into beta
...
Reviewed-on: D-Net/dnet-hadoop#419
2024-04-16 10:24:11 +02:00
Claudio Atzori
d070db4a32
added a couple more invalid author names
2024-04-16 09:41:59 +02:00
Giambattista Bloisi
43b454399f
- Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
...
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names into tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Sandro La Bruzzo
843dc95340
resolved conflict
2024-04-11 17:38:16 +02:00
Claudio Atzori
ecff0b4825
merge from beta
2024-03-25 16:15:52 +01:00
Claudio Atzori
82fc609c4f
Merge branch 'beta' into index_records
2024-03-25 16:12:49 +01:00
Claudio Atzori
9fc70a9451
implemented default merge procedure applied to result.instance
2024-03-25 15:39:14 +01:00
Claudio Atzori
aaa73f89d1
refactoring the Oaf records merge utilities into dhp-common
2024-03-22 16:34:03 +01:00
Sandro La Bruzzo
58dbe71d39
update crossref mapping to be runnable separately as a single datasource outside doiboost
2024-03-20 17:04:52 +01:00
Giambattista Bloisi
664a381d31
Unify merge logic of entities in MergeUtils.class
2024-03-18 16:04:49 +01:00
Claudio Atzori
af154d4456
implemented changes from #9497 : sort abstracts by string length, included author fullnames in the related results, expanded instance details within each children/result XML element
2024-03-14 16:21:23 +01:00
Sandro La Bruzzo
c532831718
Moved Crossref mapping to dhp-aggregations,
...
refactored code: avoid using the utilities defined in DOIBoostMappingUtils to create parts of the Oaf, use the utilities in OafMappingUtils instead
2024-03-13 06:56:10 +01:00
Claudio Atzori
bb82052c40
[graph cleaning] rule out datasources without an officialname
2024-02-05 14:59:27 +02:00
Claudio Atzori
98cce5bfb2
code formatting
2023-12-12 09:59:05 +01:00
Claudio Atzori
84d54643cf
[cleaning] allow enriched orcids to pass the cleaning, rule out non-orcid author pids
2023-12-12 09:57:00 +01:00
Claudio Atzori
aba95ed1d1
code formatting
2023-12-08 17:06:19 +01:00
Claudio Atzori
34abd0fc43
Merge branch 'beta' into clean_license_publisher
2023-12-08 16:58:27 +01:00
Claudio Atzori
7c3041b276
avoid NPEs
2023-12-03 16:49:49 +01:00
Claudio Atzori
74b185d07b
avoid NPEs
2023-12-03 16:18:20 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Claudio Atzori
1ba582de3c
[graph cleaning] added cleaning for result.publisher and result.instance.license
2023-11-23 16:27:19 +01:00
Claudio Atzori
11a1207f9c
[graph cleaning] applying coar based vocabularies in bulk
2023-11-22 12:22:14 +01:00
Claudio Atzori
dde2fec035
[graph cleaning] cleanup
2023-10-31 14:35:33 +01:00
Claudio Atzori
262d7c581b
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
2023-10-31 14:34:10 +01:00
Claudio Atzori
b0fed1725e
avoid NPEs
2023-10-19 12:13:45 +02:00
Claudio Atzori
39d24d5469
Merge branch 'beta' into resource_types
2023-10-16 11:56:38 +02:00
Claudio Atzori
05ee7d8b09
[graph cleaning] avoid NPEs
2023-10-12 09:13:42 +02:00
Claudio Atzori
554551682d
[raw graph] adopting the new COAR based vocabularies for the resource typing
2023-10-11 16:09:19 +02:00
Claudio Atzori
8108491722
Merge branch 'beta' into peer_reviewed
2023-10-06 14:21:52 +02:00
Giambattista Bloisi
2f3cf6d0e7
Fix cleaning of Pmid where parsing of numbers stopped at the first non-leading '0' character
2023-10-06 14:20:15 +02:00
Claudio Atzori
c9a5ad6a02
extending the coverage of the peer non-unknown refereed instances
2023-10-02 16:28:42 +02:00
Claudio Atzori
bf35280ea6
code formatting
2023-08-29 11:11:00 +02:00
Miriam Baglioni
c25ac21e5e
Merge pull request 'graph cleaning, suggestions from ticket 8898' ( #325 ) from cleaning_8898 into beta
...
Reviewed-on: D-Net/dnet-hadoop#325
2023-08-08 11:14:19 +02:00
Claudio Atzori
b9dddbfe54
rule out records with NULL dataInfo, except for Relations
2023-07-31 17:53:54 +02:00
Claudio Atzori
11ffb9bd68
rule out records with NULL dataInfo
2023-07-31 12:35:33 +02:00
Claudio Atzori
d8435a6512
inverted condition
2023-07-25 17:39:57 +02:00
Claudio Atzori
270df939c4
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
2023-07-25 17:29:50 +02:00
Claudio Atzori
0f5a819f44
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
2023-06-23 16:10:49 +02:00
Claudio Atzori
1d33074fd1
WIP: pid cleaning
2023-06-09 16:47:25 +02:00
Claudio Atzori
2a6ba29b64
[graph cleaning] unit tests & cleanup
2023-04-04 12:34:51 +02:00
Claudio Atzori
6d3d18d8b5
[graph cleaning] WIP: refactoring of the cleaning stages
2023-03-16 17:23:36 +01:00
Claudio Atzori
9a03f71db1
code formatting
2023-02-13 16:25:47 +01:00
Claudio Atzori
9cf0a98699
[cleaning] set the common subject classid/name
2022-12-20 10:17:33 +01:00
Claudio Atzori
b8bafab8a0
[cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning
2022-12-12 14:43:03 +01:00
Claudio Atzori
b47aaf4dd1
[cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword
2022-10-13 11:23:43 +02:00
Claudio Atzori
b7c387c21f
cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources
2022-08-12 15:09:16 +02:00
Claudio Atzori
3418ce50ac
cleaning of subjects: perform the cleaning when the given value is equivalent to one of the terms in the vocabulary
2022-08-08 12:48:47 +02:00