Giambattista Bloisi
43b454399f
- Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
...
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Claudio Atzori
ef52128c55
included new stats* workflows in parent pom list of modules, code formatting
2024-03-26 10:42:10 +01:00
Claudio Atzori
bfba71a95c
further follow up changes from integrating the mergeutils branch
2024-03-26 09:01:18 +01:00
Claudio Atzori
538b180fe0
Merge branch 'beta' into oaf_country_beta
2024-03-25 16:13:20 +01:00
Giambattista Bloisi
3f22c101d9
Merge pull request 'Enrich authors with ORCID info using new matching algorithm' ( #398 ) from new_orcid_enhancement into beta
...
Reviewed-on: #398
2024-03-22 17:29:20 +01:00
Giambattista Bloisi
664a381d31
Unify merge logic of entities in MergeUtils.class
2024-03-18 16:04:49 +01:00
Michele Artini
30167aa882
mapped oaf:country from results
2024-03-15 11:24:16 +01:00
Giambattista Bloisi
9092075760
Enrich authors with ORCID info using new matching algorithm
2024-03-11 13:23:59 +01:00
Sandro La Bruzzo
7d806a434c
formatted code
2024-02-28 09:31:58 +01:00
Claudio Atzori
a63b091bae
Merge branch 'beta' into import_orps_fix
2024-02-15 15:01:56 +01:00
Claudio Atzori
d85d2df6ad
[graph raw] fixed mapping of the original resource type from the Datacite format
2024-02-09 10:20:20 +01:00
Claudio Atzori
38c9001147
fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite)
2024-02-07 17:02:05 +01:00
Alessia Bardi
f2a08d8cc2
test for Italian records from IRS repositories
2024-01-30 19:20:14 +01:00
Claudio Atzori
cb71a7936b
[graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum
2023-12-07 23:09:54 +01:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Sandro La Bruzzo
7b5e04f37e
removed Orcid intersection on DOIBoost
2023-11-30 14:36:50 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
279100fa52
added test
2023-11-29 11:17:58 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
Claudio Atzori
2c77638bf5
Merge branch 'beta' into cleaning_8898
2023-11-22 14:00:10 +01:00
Claudio Atzori
11a1207f9c
[graph cleaning] applying coar based vocabularies in bulk
2023-11-22 12:22:14 +01:00
Claudio Atzori
262d7c581b
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
2023-10-31 14:34:10 +01:00
Claudio Atzori
2b9d0416ec
[graph raw] URL Validator to accept double slashes
2023-10-19 16:26:37 +02:00
Claudio Atzori
6dfcd0c9a2
[raw graph] mapping original resource types
2023-10-16 12:57:18 +02:00
Claudio Atzori
54fbf09ac6
[raw graph] WIP: mapping original resource types
2023-10-16 08:57:47 +02:00
Claudio Atzori
554551682d
[raw graph] adopting the new COAR based vocabularies for the resource typing
2023-10-11 16:09:19 +02:00
Claudio Atzori
eed9fe0902
code formatting
2023-10-06 12:31:17 +02:00
Claudio Atzori
dc86018a5f
Merge branch 'merge_entities_job' into beta
2023-10-02 11:24:48 +02:00
Alessia Bardi
0935d7757c
Use v5 of the UNIBI Gold ISSN list in test
2023-09-20 15:41:35 +02:00
Alessia Bardi
cc7204a089
tests for d4science catalog
2023-09-20 15:38:32 +02:00
Giambattista Bloisi
6cc7d8ca7b
GroupEntities and DispatchEntites are now merged in GroupEntitiesSparkJob
2023-08-30 10:43:31 +02:00
Claudio Atzori
bf35280ea6
code formatting
2023-08-29 11:11:00 +02:00
Giambattista Bloisi
fab9920271
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
2023-08-09 15:41:43 +02:00
Miriam Baglioni
c25ac21e5e
Merge pull request 'graph cleaning, suggestions from ticket 8898' ( #325 ) from cleaning_8898 into beta
...
Reviewed-on: #325
2023-08-08 11:14:19 +02:00
Claudio Atzori
11ffb9bd68
rule out records with NULL dataInfo
2023-07-31 12:35:33 +02:00
Claudio Atzori
270df939c4
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
2023-07-25 17:29:50 +02:00
Claudio Atzori
e45777e7e1
[aggregator graph] added validation for URLs mapped from oaf:fulltext
2023-05-26 11:33:42 +02:00
Claudio Atzori
8a463cc3e8
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
2023-05-15 15:44:46 +02:00
Claudio Atzori
d8882c4481
extended mapping applied to datacite records to produce affiliations using the ROR ids. Inc ase of APCs it includes the amount and the currently in the relation
2023-05-02 11:56:51 +02:00
Claudio Atzori
dead87917f
[graph cleaning] cleanup
2023-04-04 13:13:43 +02:00
Claudio Atzori
2a6ba29b64
[graph cleaning] unit tests & cleanup
2023-04-04 12:34:51 +02:00
Claudio Atzori
c07857fa37
[graph cleaning] unit tests & cleanup
2023-03-23 15:57:47 +01:00
Claudio Atzori
90e61a8aba
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 15:03:26 +01:00
Claudio Atzori
488d9a5eaa
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 10:41:13 +01:00
Claudio Atzori
4f5ba0ed52
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-21 14:41:20 +01:00
Claudio Atzori
9a03f71db1
code formatting
2023-02-13 16:25:47 +01:00
Miriam Baglioni
d6895f0387
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-09 17:28:38 +01:00
Sandro La Bruzzo
3c9826f186
updated lines function to it's implementation linesWithSeparators.map(l => l.stripLineEnd) in this way we force scala plugin compiler to consider this pipeline scala code and not java.string.lines() pipeline
2022-12-21 11:21:17 +01:00