Claudio Atzori
6be783caec
[graph cleaning] use sparkExecutorMemory to define also the memoryOverhead
2024-05-29 14:36:49 +02:00
Claudio Atzori
8e45c5baa8
graph cleaning to implement ugly hardcoded rules
2024-05-28 15:28:42 +02:00
Claudio Atzori
db5e18c784
hostedby patching to work with the updated Crossref contents
2024-05-28 15:28:13 +02:00
Claudio Atzori
c3fe59bc78
fixed conflicts merging from beta, code formatting
2024-05-21 14:50:40 +02:00
Claudio Atzori
0486227185
[cleaning] deactivating the cleaning of FOS subjects found in the metadata provided by repositories
2024-05-03 14:31:12 +02:00
Claudio Atzori
4355f64810
reverted to version 1.2.5-SNAPSHOT
2024-05-02 11:23:53 +02:00
Claudio Atzori
66680b8b9a
refactoring of common utilities
2024-05-02 11:16:58 +02:00
Claudio Atzori
dcf23b3d06
Merge branch 'beta' into beta-release-1.2.5
2024-05-02 10:01:49 +02:00
Giambattista Bloisi
1878199dae
Miscellaneous fixes:
- in Merge By ID pick by preference those records coming from delegated Authorities
- fix various tests
- close spark session in SparkCreateSimRels
2024-04-24 08:12:45 +02:00
Claudio Atzori
c3053ef34d
using version 1.2.5-beta for the release
2024-04-23 14:52:32 +02:00
Claudio Atzori
b5bcab13ec
using version 1.2.5-beta for the release
2024-04-23 14:36:39 +02:00
Claudio Atzori
425c9afc36
using version 1.2.5-beta for the release
2024-04-23 14:30:04 +02:00
Claudio Atzori
0656ab2838
code formatting
2024-04-20 08:10:58 +02:00
Claudio Atzori
ab7f0855af
fixed query reading projects from the aggregator DB
2024-04-20 08:10:32 +02:00
Giambattista Bloisi
8ac167e420
Refinements to PR #404: refactoring the Oaf records merge utilities into dhp-common
2024-04-16 17:18:28 +02:00
Giambattista Bloisi
43b454399f
- Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with the same initial character were always considered equal
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in PR #398: split author names into tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
Claudio Atzori
ef52128c55
included new stats* workflows in parent pom list of modules, code formatting
2024-03-26 10:42:10 +01:00
Claudio Atzori
bfba71a95c
further follow up changes from integrating the mergeutils branch
2024-03-26 09:01:18 +01:00
Claudio Atzori
538b180fe0
Merge branch 'beta' into oaf_country_beta
2024-03-25 16:13:20 +01:00
Giambattista Bloisi
3f22c101d9
Merge pull request 'Enrich authors with ORCID info using new matching algorithm' (#398) from new_orcid_enhancement into beta
Reviewed-on: D-Net/dnet-hadoop#398
2024-03-22 17:29:20 +01:00
Giambattista Bloisi
0ff7faad72
Fix conditions that prevented ORCID Enrichment
2024-03-22 16:24:49 +01:00
Giambattista Bloisi
664a381d31
Unify merge logic of entities in MergeUtils.class
2024-03-18 16:04:49 +01:00
Michele Artini
30167aa882
mapped oaf:country from results
2024-03-15 11:24:16 +01:00
Giambattista Bloisi
9092075760
Enrich authors with ORCID info using new matching algorithm
2024-03-11 13:23:59 +01:00
Sandro La Bruzzo
7d806a434c
formatted code
2024-02-28 09:31:58 +01:00
Michele Artini
3268570b2c
mapping of project PIDs
2024-02-22 14:47:21 +01:00
Michele Artini
4374d7449e
mapping of project PIDs
2024-02-22 14:44:35 +01:00
Claudio Atzori
b3ddbaed58
fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite)
2024-02-15 15:02:48 +01:00
Claudio Atzori
a63b091bae
Merge branch 'beta' into import_orps_fix
2024-02-15 15:01:56 +01:00
Claudio Atzori
d85d2df6ad
[graph raw] fixed mapping of the original resource type from the Datacite format
2024-02-09 10:20:20 +01:00
Claudio Atzori
1416f16b35
[graph raw] fixed mapping of the original resource type from the Datacite format
2024-02-09 10:19:53 +01:00
Claudio Atzori
38c9001147
fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite)
2024-02-07 17:02:05 +01:00
Claudio Atzori
42f5506306
[orcid enrichment] fixed directory cleanup before distcp
2024-02-05 09:45:36 +02:00
Claudio Atzori
f28c63d5ef
[orcid enrichment] fixed directory cleanup before distcp
2024-02-05 09:44:56 +02:00
Alessia Bardi
f2a08d8cc2
test for Italian records from IRS repositories
2024-01-30 19:20:14 +01:00
Claudio Atzori
2655eea5bc
[orcid enrichment] drop paths before copying the non-modified contents
2024-01-19 16:28:05 +01:00
Claudio Atzori
cb9e739484
Merge branch 'beta' into resource_types
2024-01-11 16:29:41 +01:00
Claudio Atzori
2753044d13
refined mapping for the extraction of the original resource type
2024-01-11 16:28:26 +01:00
Miriam Baglioni
e711a05229
fixed conflicts
2024-01-10 11:03:42 +01:00
Claudio Atzori
62104790ae
added metaresourcetype to the result hive DB view
2023-12-21 12:27:10 +01:00
Miriam Baglioni
4740c808f7
-
2023-12-20 14:26:54 +01:00
Claudio Atzori
cb71a7936b
[graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum
2023-12-07 23:09:54 +01:00
Claudio Atzori
259c69e446
[orcid enrichment] fixed workflow definition
2023-12-06 19:41:53 +01:00
Claudio Atzori
2a233a89aa
[graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark action
2023-12-03 13:32:52 +01:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
bf0fd27c36
Removed unused function
Applied PR Comment of Giambattista in the PR
2023-12-01 12:16:42 +01:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Sandro La Bruzzo
f718caaac9
Added copy of the untouched entities of the graph
2023-11-30 14:51:00 +01:00
Sandro La Bruzzo
7b5e04f37e
removed Orcid intersection on DOIBoost
2023-11-30 14:36:50 +01:00