WIP: Graph footprint optimisation #287

Draft
claudio.atzori wants to merge 39 commits from ticket_8369 into beta

This PR reduces the disk and memory footprint of the graph processing pipeline through the following changes:

  • a lighter definition of the model
  • customised serialisation that skips empty fields
  • built-in JSON support instead of serialisation/deserialisation through ObjectMapper
  • Parquet as the file format used to store the graph
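To illustrate the second point, here is a minimal sketch of a serialiser that leaves out empty fields. It is not the code in this PR: the class `SparseJsonSketch`, its methods, and the field names in the example are hypothetical, and it uses only the JDK so that it stays self-contained.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; not part of dnet-hadoop.
final class SparseJsonSketch {

    // A value counts as "empty" when it is null, a blank string,
    // or a collection/map without elements.
    static boolean isEmpty(Object value) {
        if (value == null) return true;
        if (value instanceof CharSequence) return value.toString().trim().isEmpty();
        if (value instanceof Collection) return ((Collection<?>) value).isEmpty();
        if (value instanceof Map) return ((Map<?, ?>) value).isEmpty();
        return false;
    }

    // Serialises a record to JSON, skipping every empty field.
    static String toJson(Map<?, ?> record) {
        StringJoiner out = new StringJoiner(",", "{", "}");
        for (Map.Entry<?, ?> e : record.entrySet()) {
            if (isEmpty(e.getValue())) continue; // empty fields are never written
            out.add(quote(String.valueOf(e.getKey())) + ":" + render(e.getValue()));
        }
        return out.toString();
    }

    private static String render(Object value) {
        if (value == null) return "null";
        if (value instanceof Number || value instanceof Boolean) return value.toString();
        if (value instanceof Map) return toJson((Map<?, ?>) value);
        if (value instanceof Collection) {
            StringJoiner arr = new StringJoiner(",", "[", "]");
            for (Object item : (Collection<?>) value) arr.add(render(item));
            return arr.toString();
        }
        return quote(value.toString());
    }

    // Minimal escaping (backslash and double quote only); enough for a sketch.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", "r1");                           // kept
        record.put("title", "");                          // blank string: skipped
        record.put("subjects", Collections.emptyList());  // empty list: skipped
        record.put("year", null);                         // null: skipped
        record.put("authors", Arrays.asList("A"));        // kept
        System.out.println(toJson(record)); // prints {"id":"r1","authors":["A"]}
    }
}
```

With Jackson, a similar effect is available through `@JsonInclude(JsonInclude.Include.NON_EMPTY)`; whether this PR relies on that annotation or on a custom serialiser is not stated in the description.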
claudio.atzori added 21 commits 2023-04-06 11:54:19 +02:00
claudio.atzori added 3 commits 2023-04-21 08:47:24 +02:00
claudio.atzori added 1 commit 2023-04-21 08:47:53 +02:00
claudio.atzori added 10 commits 2023-04-26 16:02:11 +02:00
claudio.atzori changed title from Graph footprint optimisation to WIP: Graph footprint optimisation 2023-05-02 09:57:59 +02:00
sandro.labruzzo added 3 commits 2023-05-09 12:23:54 +02:00
sandro.labruzzo added 1 commit 2023-05-09 13:55:02 +02:00
sandro.labruzzo added 1 commit 2023-05-10 09:05:29 +02:00
alessia.bardi added 1 commit 2023-06-07 10:29:53 +02:00
claudio.atzori added this to the OpenAIRE - DNet project 2023-10-26 09:59:22 +02:00
This pull request has changes conflicting with the target branch.
  • dhp-common/src/main/java/eu/dnetlib/dhp/oa/merge/AuthorMerger.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/oa/merge/DispatchEntitiesSparkJob.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/oa/merge/GroupEntitiesSparkJob.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/CleaningFunctions.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/GraphCleaningFunctions.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/IdentifierFactory.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/MergeUtils.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/ModelHardLimits.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/OafMapperUtils.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/OrganizationPidComparator.java
Command line instructions:

Step 1:

From your project repository, check out a new branch and test the changes.
git checkout -b ticket_8369 beta
git pull origin ticket_8369

Step 2:

Merge the changes and update on Gitea.
git checkout beta
git merge --no-ff ticket_8369
git push origin beta
Reference: D-Net/dnet-hadoop#287