Commit Graph

1366 Commits (master)

Author SHA1 Message Date
Michele Artini 4374d7449e mapping of project PIDs 2 months ago
Claudio Atzori b3ddbaed58 fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite) 2 months ago
Claudio Atzori 1416f16b35 [graph raw] fixed mapping of the original resource type from the Datacite format 3 months ago
Claudio Atzori f28c63d5ef [orcid enrichment] fixed directory cleanup before distcp 3 months ago
Claudio Atzori 2655eea5bc [orcid enrichment] drop paths before copying the non-modified contents 3 months ago
Claudio Atzori cb9e739484 Merge branch 'beta' into resource_types 4 months ago
Claudio Atzori 2753044d13 refined mapping for the extraction of the original resource type 4 months ago
Miriam Baglioni e711a05229 fixed conflicts 4 months ago
Claudio Atzori 62104790ae added metaresourcetype to the result hive DB view 4 months ago
Miriam Baglioni 4740c808f7 - 4 months ago
Claudio Atzori cb71a7936b [graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum 5 months ago
Claudio Atzori 259c69e446 [orcid enrichment] fixed workflow definition 5 months ago
Claudio Atzori 2a233a89aa [graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark action 5 months ago
Claudio Atzori 622fafbd2e Merge branch 'beta' into orcid_import 5 months ago
Sandro La Bruzzo bf0fd27c36 Removed unused function
Applied PR Comment of Giambattista in the PR
5 months ago
Sandro La Bruzzo cdfb7588dd code formatting 5 months ago
Sandro La Bruzzo 5e22b67b8a Merge remote-tracking branch 'origin/beta' into orcid_import 5 months ago
Sandro La Bruzzo f718caaac9 Added copy of the untouched entities of the graph 5 months ago
Sandro La Bruzzo 7b5e04f37e removed Orcid intersection on DOIBoost 5 months ago
Claudio Atzori 4e1aac2e2f resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350 5 months ago
Sandro La Bruzzo 279100fa52 added test 5 months ago
Sandro La Bruzzo 59111713fa added comment 5 months ago
Sandro La Bruzzo 6f4d0c05ea Implemented Author Merger for ORCID that takes into account the case when name and surname are swapped 5 months ago
Sandro La Bruzzo 34a4b3cbdf Implemented ORCID Enrichment 5 months ago
Claudio Atzori 2c77638bf5 Merge branch 'beta' into cleaning_8898 5 months ago
Claudio Atzori 11a1207f9c [graph cleaning] applying coar based vocabularies in bulk 5 months ago
Claudio Atzori 262d7c581b [graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898 6 months ago
Claudio Atzori b3a61ea955 Merge branch 'beta' into url_validation 6 months ago
Claudio Atzori 7fc621cdec added defaults to the graph resolution workflow config-default.xml 6 months ago
Claudio Atzori 2b9d0416ec [graph raw] URL Validator to accept double slashes 6 months ago
Claudio Atzori 6dfcd0c9a2 [raw graph] mapping original resource types 6 months ago
Claudio Atzori 54fbf09ac6 [raw graph] WIP: mapping original resource types 6 months ago
Claudio Atzori 554551682d [raw graph] adopting the new COAR based vocabularies for the resource typing 7 months ago
Claudio Atzori eed9fe0902 code formatting 7 months ago
Claudio Atzori dc86018a5f Merge branch 'merge_entities_job' into beta 7 months ago
Alessia Bardi 0935d7757c Use v5 of the UNIBI Gold ISSN list in test 7 months ago
Alessia Bardi cc7204a089 tests for d4science catalog 7 months ago
Claudio Atzori 5b06c9d06f [graph raw] datainfo.invisible set as true only for entities 8 months ago
Giambattista Bloisi 6cc7d8ca7b GroupEntities and DispatchEntities are now merged in GroupEntitiesSparkJob 8 months ago
Claudio Atzori bf35280ea6 code formatting 8 months ago
Giambattista Bloisi 95cd2b9b1e Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
9 months ago
Giambattista Bloisi fab9920271 DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag 9 months ago
Miriam Baglioni c25ac21e5e Merge pull request 'graph cleaning, suggestions from ticket 8898' (#325) from cleaning_8898 into beta
Reviewed-on: #325
9 months ago
Claudio Atzori 11ffb9bd68 rule out records with NULL dataInfo 9 months ago
Claudio Atzori 270df939c4 partial implementation of the suggestions from https://support.openaire.eu/issues/8898 9 months ago
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
9 months ago
Giambattista Bloisi bb5b845e3c Use scala.binary.version property to resolve scala maven dependencies
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
9 months ago
Claudio Atzori b76a47b103 [aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database 11 months ago
Claudio Atzori ad04f14b81 Merge branch 'beta' into distinct_pids_from_openorgs_beta 11 months ago
Claudio Atzori e1409ffe80 update sql query to return distinct pids 11 months ago