Sandro La Bruzzo
b84ad0c06e
merged beta
2024-04-19 14:39:59 +02:00
Sandro La Bruzzo
342cb6189b
fixed a problem caused by the changed signature of RowEncoder
...
removed property dhp.schema.artifact
2024-04-19 12:13:26 +02:00
Claudio Atzori
ac8747582c
Merge branch 'beta' into doidoost_dismiss
2024-04-17 12:01:01 +02:00
Giambattista Bloisi
8ac167e420
Refinements to PR #404 : refactoring the Oaf records merge utilities into dhp-common
2024-04-16 17:18:28 +02:00
Miriam Baglioni
9eeb9f5d32
merging with branch beta
2024-04-16 15:24:40 +02:00
Giambattista Bloisi
da333e9f4d
Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' ( #419 ) from dedup_authorsmatch_bytoken into beta
...
Reviewed-on: #419
2024-04-16 10:24:11 +02:00
Claudio Atzori
d070db4a32
added a couple more invalid author names
2024-04-16 09:41:59 +02:00
Giambattista Bloisi
43b454399f
- Bug fix in the matchOrderedTokenAndAbbreviations algorithm, where tokens with the same initial character were always considered equal
...
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names into tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
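The matching strategy described in commit 43b454399f can be illustrated with a minimal sketch. It is not the project's implementation: the function names `tokenize`, `token_matches`, and `names_match` are hypothetical, and the rule that a one-letter token counts as an abbreviation is an assumption made for illustration. The sketch only shows the general idea of splitting names into tokens, ordering them, and comparing full tokens exactly so that two different names sharing an initial are not treated as equal.

```python
import re
import unicodedata


def tokenize(name):
    """Strip accents, lower-case, split on non-letters, and sort the tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    tokens = re.split(r"[^a-z]+", without_accents.lower())
    return sorted(t for t in tokens if t)


def token_matches(a, b):
    """Compare two tokens.

    Assumption: a one-letter token is an abbreviation and matches any token
    with the same initial. Two full tokens must be identical, so sharing
    only the first character is not enough.
    """
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def names_match(left, right):
    """Hypothetical matcher: compare the two sorted token lists pairwise."""
    l_tokens, r_tokens = tokenize(left), tokenize(right)
    if not l_tokens or len(l_tokens) != len(r_tokens):
        return False
    return all(token_matches(a, b) for a, b in zip(l_tokens, r_tokens))


if __name__ == "__main__":
    print(names_match("Sandro La Bruzzo", "La Bruzzo, S."))
    print(names_match("Claudio Atzori", "Carlo Atzori"))
```

The pairwise comparison of sorted lists is deliberately simplistic: an abbreviation can sort into a different position than the full token it stands for, so a production matcher would need a more careful alignment step.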
Sandro La Bruzzo
843dc95340
resolved conflict
2024-04-11 17:38:16 +02:00
Sandro La Bruzzo
2581672c11
updated workflows of MAG and Crossref to use transaction
2024-04-11 17:27:49 +02:00
Claudio Atzori
ecff0b4825
merge from beta
2024-03-25 16:15:52 +01:00
Claudio Atzori
82fc609c4f
Merge branch 'beta' into index_records
2024-03-25 16:12:49 +01:00
Claudio Atzori
9fc70a9451
implemented default merge procedure applied to result.instance
2024-03-25 15:39:14 +01:00
Giambattista Bloisi
3f22c101d9
Merge pull request 'Enrich authors with ORCID info using new matching algorithm' ( #398 ) from new_orcid_enhancement into beta
...
Reviewed-on: #398
2024-03-22 17:29:20 +01:00
Claudio Atzori
aaa73f89d1
refactoring the Oaf records merge utilities into dhp-common
2024-03-22 16:34:03 +01:00
Sandro La Bruzzo
58dbe71d39
update crossref mapping to be runnable separately as a single datasource outside doiboost
2024-03-20 17:04:52 +01:00
Giambattista Bloisi
664a381d31
Unify merge logic of entities in MergeUtils.class
2024-03-18 16:04:49 +01:00
Claudio Atzori
af154d4456
implemented changes from #9497 : sort abstracts by string length, included author fullnames in the related results, expanded instance details within each children/result XML element
2024-03-14 16:21:23 +01:00
Sandro La Bruzzo
c532831718
Moved Crossref mapping to dhp-aggregations,
...
refactored code to avoid the utilities defined in DOIBoostMappingUtils for creating parts of the Oaf, using the utilities in OafMappingUtils instead
2024-03-13 06:56:10 +01:00
Giambattista Bloisi
9092075760
Enrich authors with ORCID info using new matching algorithm
2024-03-11 13:23:59 +01:00
Claudio Atzori
bb82052c40
[graph cleaning] rule out datasources without an officialname
2024-02-05 14:59:27 +02:00
Claudio Atzori
9e8fc6aa88
[collection] increased logging from the oai-pmh metadata collection process
2024-01-26 09:17:20 +01:00
Claudio Atzori
3e96777cc4
[collection] increased logging from the oai-pmh metadata collection process
2024-01-23 15:21:03 +01:00
Claudio Atzori
6fd25cf549
code formatting
2024-01-23 08:47:12 +01:00
Claudio Atzori
f76852f385
Merge branch 'beta' into update_pivots_table
2024-01-22 16:37:22 +01:00
Claudio Atzori
1c6db320f4
[graph provision] obtain context info from the context API instead of from the ISLookUp service
2024-01-22 15:53:17 +01:00
Giambattista Bloisi
21a14fcd80
Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions
...
Implements pivots table update oozie workflow
2024-01-15 10:18:14 +01:00
Claudio Atzori
98cce5bfb2
code formatting
2023-12-12 09:59:05 +01:00
Claudio Atzori
84d54643cf
[cleaning] allow enriched orcids to pass the cleaning, rule out non-orcid author pids
2023-12-12 09:57:00 +01:00
Claudio Atzori
aba95ed1d1
code formatting
2023-12-08 17:06:19 +01:00
Claudio Atzori
34abd0fc43
Merge branch 'beta' into clean_license_publisher
2023-12-08 16:58:27 +01:00
Giambattista Bloisi
613ec5ffce
Add profiles for different spark versions: spark-24, spark-34, spark-35
2023-12-05 19:11:06 +01:00
Giambattista Bloisi
326c9dc08c
Changes in Maven poms to build and test the project using Spark 3.4.x and Scala 2.12
2023-12-05 19:11:06 +01:00
Claudio Atzori
7c3041b276
avoid NPEs
2023-12-03 16:49:49 +01:00
Claudio Atzori
74b185d07b
avoid NPEs
2023-12-03 16:18:20 +01:00
Claudio Atzori
e6086efc53
avoid NPEs in Vocabulary.getTermBySynonym
2023-12-03 13:33:20 +01:00
Claudio Atzori
d33f578e54
code formatting
2023-12-01 15:14:17 +01:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
bf0fd27c36
Removed unused function
...
Applied Giambattista's review comments from the PR
2023-12-01 12:16:42 +01:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
aa239ec673
Changed the implementation of the similarity check to verify an exact match of the name instead of only the first character
2023-11-29 11:17:41 +01:00
Sandro La Bruzzo
59111713fa
added comment
2023-11-28 09:00:48 +01:00
Sandro La Bruzzo
6f4d0c05ea
Implemented Author Merger for ORCID that takes into account the case where name and surname are swapped
2023-11-28 08:43:56 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
Claudio Atzori
1ba582de3c
[graph cleaning] added cleaning for result.publisher and result.instance.license
2023-11-23 16:27:19 +01:00
Claudio Atzori
11a1207f9c
[graph cleaning] applying coar based vocabularies in bulk
2023-11-22 12:22:14 +01:00
Claudio Atzori
dde2fec035
[graph cleaning] cleanup
2023-10-31 14:35:33 +01:00
Claudio Atzori
262d7c581b
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
2023-10-31 14:34:10 +01:00