Claudio Atzori
42f5506306
[orcid enrichment] fixed directory cleanup before distcp
2024-02-05 09:45:36 +02:00
Alessia Bardi
f2a08d8cc2
test for Italian records from IRS repositories
2024-01-30 19:20:14 +01:00
Claudio Atzori
2655eea5bc
[orcid enrichment] drop paths before copying the non-modifyed contents
2024-01-19 16:28:05 +01:00
Claudio Atzori
cb9e739484
Merge branch 'beta' into resource_types
2024-01-11 16:29:41 +01:00
Claudio Atzori
2753044d13
refined mapping for the extraction of the original resource type
2024-01-11 16:28:26 +01:00
Miriam Baglioni
e711a05229
fixed conflicts
2024-01-10 11:03:42 +01:00
Claudio Atzori
62104790ae
added metaresourcetype to the result hive DB view
2023-12-21 12:27:10 +01:00
Miriam Baglioni
4740c808f7
-
2023-12-20 14:26:54 +01:00
Claudio Atzori
cb71a7936b
[graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum
2023-12-07 23:09:54 +01:00
Claudio Atzori
259c69e446
[orcid enrichment] fixed workflow definition
2023-12-06 19:41:53 +01:00
Claudio Atzori
2a233a89aa
[graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark aciton
2023-12-03 13:32:52 +01:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
bf0fd27c36
Removed unused function
...
Applied PR Comment of Giambattista in the PR
2023-12-01 12:16:42 +01:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Sandro La Bruzzo
f718caaac9
Added copy of the untouched entities of the graph
2023-11-30 14:51:00 +01:00
Sandro La Bruzzo
7b5e04f37e
removed Orcid intersection on DOIBoost
2023-11-30 14:36:50 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
279100fa52
added test
2023-11-29 11:17:58 +01:00
Sandro La Bruzzo
59111713fa
added comment
2023-11-28 09:00:48 +01:00
Sandro La Bruzzo
6f4d0c05ea
Implemented Author MErger for ORCID that takes in account the case when name and surname are swapped
2023-11-28 08:43:56 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
Claudio Atzori
2c77638bf5
Merge branch 'beta' into cleaning_8898
2023-11-22 14:00:10 +01:00
Claudio Atzori
11a1207f9c
[graph cleaning] applying coar based vocabularies in bulk
2023-11-22 12:22:14 +01:00
Claudio Atzori
262d7c581b
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
2023-10-31 14:34:10 +01:00
Claudio Atzori
b3a61ea955
Merge branch 'beta' into url_validation
2023-10-25 14:22:56 +02:00
Claudio Atzori
7fc621cdec
added defaults to the graph resolution workflow config-default.xml
2023-10-20 22:28:12 +02:00
Claudio Atzori
2b9d0416ec
[graph raw] URL Validator to accept double slashes
2023-10-19 16:26:37 +02:00
Claudio Atzori
6dfcd0c9a2
[raw graph] mapping original resource types
2023-10-16 12:57:18 +02:00
Claudio Atzori
54fbf09ac6
[raw graph] WIP: mapping original resource types
2023-10-16 08:57:47 +02:00
Claudio Atzori
554551682d
[raw graph] adopting the new COAR based vocabularies for the resource typing
2023-10-11 16:09:19 +02:00
Claudio Atzori
eed9fe0902
code formatting
2023-10-06 12:31:17 +02:00
Claudio Atzori
dc86018a5f
Merge branch 'merge_entities_job' into beta
2023-10-02 11:24:48 +02:00
Alessia Bardi
0935d7757c
Use v5 of the UNIBI Gold ISSN list in test
2023-09-20 15:41:35 +02:00
Alessia Bardi
cc7204a089
tests for d4science catalog
2023-09-20 15:38:32 +02:00
Claudio Atzori
5b06c9d06f
[graph raw] datainfo.invisible set as true only for entities
2023-09-04 15:15:24 +02:00
Giambattista Bloisi
6cc7d8ca7b
GroupEntities and DispatchEntites are now merged in GroupEntitiesSparkJob
2023-08-30 10:43:31 +02:00
Claudio Atzori
bf35280ea6
code formatting
2023-08-29 11:11:00 +02:00
Giambattista Bloisi
95cd2b9b1e
Make filterInvisible a mandatory parameter of DispathEntitiesSparkJob
...
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
2023-08-10 11:53:48 +02:00
Giambattista Bloisi
fab9920271
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
2023-08-09 15:41:43 +02:00
Miriam Baglioni
c25ac21e5e
Merge pull request 'graph cleaning, suggestions from ticket 8898' ( #325 ) from cleaning_8898 into beta
...
Reviewed-on: D-Net/dnet-hadoop#325
2023-08-08 11:14:19 +02:00
Claudio Atzori
11ffb9bd68
rule out records with NULL dataInfo
2023-07-31 12:35:33 +02:00
Claudio Atzori
270df939c4
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
2023-07-25 17:29:50 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
...
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Claudio Atzori
b76a47b103
[aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database
2023-06-13 11:38:10 +02:00
Claudio Atzori
ad04f14b81
Merge branch 'beta' into distinct_pids_from_openorgs_beta
2023-06-12 09:58:21 +02:00
Claudio Atzori
e1409ffe80
update sql query to return distinct pids
2023-06-12 09:47:45 +02:00
Claudio Atzori
e45777e7e1
[aggregator graph] added validation for URLs mapped from oaf:fulltext
2023-05-26 11:33:42 +02:00
Claudio Atzori
8acad52a0c
Merge branch 'beta' into apc_affiliation
2023-05-15 15:47:33 +02:00