Collection of OOZIE workflows for OpenAIRE Graph construction, processing, and provisioning.
Giambattista Bloisi e64c2854a3 Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
dhp-build [maven-release-plugin] prepare for next development iteration 2022-04-07 13:32:22 +02:00
dhp-common Use scala.binary.version property to resolve scala maven dependencies 2023-07-24 11:13:48 +02:00
dhp-doc-resources/img updated image 2020-03-05 15:11:42 +01:00
dhp-pace-core Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
dhp-workflows Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface 2023-07-24 15:36:24 +02:00
src/site added mvn site for dnet-hadoop project 2021-11-16 15:16:28 +01:00
.gitignore added more ignores 2022-02-24 12:16:17 +01:00
.scalafmt.conf [stats-wf]fixed the result_result table related to PR#191 2022-02-04 14:51:25 +01:00
LICENSE added LICENSE file - AGPL-3.0 2020-04-29 16:11:17 +02:00
README.md Update 'README.md' 2021-07-30 11:54:38 +02:00
pom.xml Use scala.binary.version property to resolve scala maven dependencies 2023-07-24 11:13:48 +02:00

README.md

dnet-hadoop

Dnet-hadoop is the project that defines all the OOZIE workflows for OpenAIRE Graph construction, processing, and provisioning.