Sandro La Bruzzo
0646d0d064
Updated main sparkApplication to avoid requiring the master variable
2024-05-02 15:15:03 +02:00
Sandro La Bruzzo
133ead1e3e
updated new version of scholexplorer Generation
2024-04-29 09:00:30 +02:00
Sandro La Bruzzo
9cd3bc0f10
Added a new generation of the dump for scholexplorer, tested with the latest version of Spark and heavily refactored
2024-04-26 16:02:07 +02:00
Claudio Atzori
0656ab2838
code formatting
2024-04-20 08:10:58 +02:00
Claudio Atzori
ab7f0855af
fixed query reading projects from the aggregator DB
2024-04-20 08:10:32 +02:00
Giambattista Bloisi
8ac167e420
Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common
2024-04-16 17:18:28 +02:00
Giambattista Bloisi
43b454399f
- Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in PR #398: split author names into tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
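A minimal sketch of the matching strategy described in the commit above, for illustration only: it is not the dnet-hadoop implementation. The function names (`tokenize`, `tokens_match`, `names_match`) are hypothetical, and the rule that an abbreviation is a single-character token is an assumption. It shows why comparing only initials is wrong: two full tokens that share a first letter must not be treated as equal.

```python
import re


def tokenize(name):
    """Lowercase the name, split on non-word characters, and sort the tokens."""
    return sorted(t for t in re.split(r"[^\w]+", name.lower()) if t)


def tokens_match(a, b):
    """Match two tokens either in full or as an abbreviation.

    Assumption: an abbreviation is a single-character token equal to the
    initial of the other token. Two full tokens sharing an initial
    (e.g. "john" and "jane") do NOT match.
    """
    if a == b:
        return True
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    return len(short) == 1 and long_.startswith(short)


def names_match(name1, name2):
    """Compare two author names token by token after ordering the tokens."""
    t1, t2 = tokenize(name1), tokenize(name2)
    if not t1 or len(t1) != len(t2):
        return False
    return all(tokens_match(a, b) for a, b in zip(t1, t2))
```

Under these assumptions, `names_match("John Smith", "Smith, J.")` holds while `names_match("John Smith", "Jane Smith")` does not. Pairing tokens by sorted position is a simplification; a production matcher would likely need a more tolerant alignment.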
Claudio Atzori
ef52128c55
included new stats* workflows in parent pom list of modules, code formatting
2024-03-26 10:42:10 +01:00
Claudio Atzori
bfba71a95c
further follow up changes from integrating the mergeutils branch
2024-03-26 09:01:18 +01:00
Claudio Atzori
538b180fe0
Merge branch 'beta' into oaf_country_beta
2024-03-25 16:13:20 +01:00
Giambattista Bloisi
3f22c101d9
Merge pull request 'Enrich authors with ORCID info using new matching algorithm' (#398) from new_orcid_enhancement into beta
Reviewed-on: D-Net/dnet-hadoop#398
2024-03-22 17:29:20 +01:00
Giambattista Bloisi
0ff7faad72
Fix conditions that prevented ORCID Enrichment
2024-03-22 16:24:49 +01:00
Giambattista Bloisi
664a381d31
Unify merge logic of entities in MergeUtils.class
2024-03-18 16:04:49 +01:00
Michele Artini
30167aa882
mapped oaf:country from results
2024-03-15 11:24:16 +01:00
Giambattista Bloisi
9092075760
Enrich authors with ORCID info using new matching algorithm
2024-03-11 13:23:59 +01:00
Sandro La Bruzzo
7d806a434c
formatted code
2024-02-28 09:31:58 +01:00
Michele Artini
3268570b2c
mapping of project PIDs
2024-02-22 14:47:21 +01:00
Claudio Atzori
a63b091bae
Merge branch 'beta' into import_orps_fix
2024-02-15 15:01:56 +01:00
Claudio Atzori
d85d2df6ad
[graph raw] fixed mapping of the original resource type from the Datacite format
2024-02-09 10:20:20 +01:00
Claudio Atzori
38c9001147
fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite)
2024-02-07 17:02:05 +01:00
Claudio Atzori
42f5506306
[orcid enrichment] fixed directory cleanup before distcp
2024-02-05 09:45:36 +02:00
Alessia Bardi
f2a08d8cc2
test for Italian records from IRS repositories
2024-01-30 19:20:14 +01:00
Claudio Atzori
2655eea5bc
[orcid enrichment] drop paths before copying the unmodified contents
2024-01-19 16:28:05 +01:00
Claudio Atzori
cb9e739484
Merge branch 'beta' into resource_types
2024-01-11 16:29:41 +01:00
Claudio Atzori
2753044d13
refined mapping for the extraction of the original resource type
2024-01-11 16:28:26 +01:00
Miriam Baglioni
e711a05229
fixed conflicts
2024-01-10 11:03:42 +01:00
Claudio Atzori
62104790ae
added metaresourcetype to the result hive DB view
2023-12-21 12:27:10 +01:00
Miriam Baglioni
4740c808f7
-
2023-12-20 14:26:54 +01:00
Claudio Atzori
cb71a7936b
[graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum
2023-12-07 23:09:54 +01:00
Claudio Atzori
259c69e446
[orcid enrichment] fixed workflow definition
2023-12-06 19:41:53 +01:00
Claudio Atzori
2a233a89aa
[graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark action
2023-12-03 13:32:52 +01:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
bf0fd27c36
Removed unused function
Applied Giambattista's review comments on the PR
2023-12-01 12:16:42 +01:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Sandro La Bruzzo
f718caaac9
Added copy of the untouched entities of the graph
2023-11-30 14:51:00 +01:00
Sandro La Bruzzo
7b5e04f37e
removed Orcid intersection on DOIBoost
2023-11-30 14:36:50 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
279100fa52
added test
2023-11-29 11:17:58 +01:00
Sandro La Bruzzo
59111713fa
added comment
2023-11-28 09:00:48 +01:00
Sandro La Bruzzo
6f4d0c05ea
Implemented Author Merger for ORCID that takes into account the case where name and surname are swapped
2023-11-28 08:43:56 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
Claudio Atzori
2c77638bf5
Merge branch 'beta' into cleaning_8898
2023-11-22 14:00:10 +01:00
Claudio Atzori
11a1207f9c
[graph cleaning] applying coar based vocabularies in bulk
2023-11-22 12:22:14 +01:00
Claudio Atzori
262d7c581b
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
2023-10-31 14:34:10 +01:00
Claudio Atzori
b3a61ea955
Merge branch 'beta' into url_validation
2023-10-25 14:22:56 +02:00
Claudio Atzori
7fc621cdec
added defaults to the graph resolution workflow config-default.xml
2023-10-20 22:28:12 +02:00
Claudio Atzori
2b9d0416ec
[graph raw] URL Validator to accept double slashes
2023-10-19 16:26:37 +02:00
Claudio Atzori
6dfcd0c9a2
[raw graph] mapping original resource types
2023-10-16 12:57:18 +02:00
Claudio Atzori
54fbf09ac6
[raw graph] WIP: mapping original resource types
2023-10-16 08:57:47 +02:00