WIP: Graph footprint optimisation #287

Draft
claudio.atzori wants to merge 39 commits from ticket_8369 into beta
Owner

This PR introduces several changes that reduce the disk/memory footprint of the graph processing pipeline.

  • a lighter definition of the model
  • customised serialisation that skips empty fields
  • built-in JSON handling instead of ObjectMapper-based serde
  • Parquet as the file format used to store the graph
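
To illustrate the second point, here is a minimal sketch of a serialiser that leaves out null and empty fields. It is not the code in this PR: the class `SparseJson`, its methods, and the field names (`id`, `title`, `author`, `pid`) are hypothetical and chosen only for illustration. It also escapes only backslashes and double quotes, so it is not a complete JSON writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of dnet-hadoop: writes a Map as JSON and
// drops every entry whose value is null or empty.
final class SparseJson {

    // A value is "empty" when it is null, a zero-length string,
    // or a collection/map without elements.
    static boolean isEmpty(Object v) {
        if (v == null) return true;
        if (v instanceof CharSequence) return ((CharSequence) v).length() == 0;
        if (v instanceof Collection) return ((Collection<?>) v).isEmpty();
        if (v instanceof Map) return ((Map<?, ?>) v).isEmpty();
        return false;
    }

    // Simplified quoting: only backslash and double quote are escaped.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String toJson(Object v) {
        if (v == null) return "null";
        if (v instanceof Map) {
            // Empty entries are filtered out before they are written.
            return ((Map<?, ?>) v).entrySet().stream()
                .filter(e -> !isEmpty(e.getValue()))
                .map(e -> quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        if (v instanceof Collection) {
            return ((Collection<?>) v).stream()
                .map(SparseJson::toJson)
                .collect(Collectors.joining(",", "[", "]"));
        }
        if (v instanceof Number || v instanceof Boolean) return String.valueOf(v);
        return quote(String.valueOf(v));
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", "r1");
        record.put("title", "");                      // empty string: skipped
        record.put("author", new ArrayList<String>()); // empty list: skipped
        record.put("pid", Arrays.asList("10.1234/x"));

        // prints {"id":"r1","pid":["10.1234/x"]}
        System.out.println(toJson(record));
    }
}
```

On records where most optional fields are unset, dropping them at write time shrinks every serialised record, which is where the footprint saving would come from. A Jackson-based pipeline could get a similar effect through inclusion settings rather than a hand-written writer.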
claudio.atzori added 21 commits 1 year ago
claudio.atzori added 3 commits 1 year ago
claudio.atzori added 1 commit 1 year ago
claudio.atzori added 10 commits 1 year ago
claudio.atzori changed title from Graph footprint optimisation to WIP: Graph footprint optimisation 1 year ago
sandro.labruzzo added 3 commits 12 months ago
sandro.labruzzo added 1 commit 12 months ago
sandro.labruzzo added 1 commit 12 months ago
alessia.bardi added 1 commit 11 months ago
claudio.atzori added this to the OpenAIRE - DNet project 6 months ago
This pull request has changes conflicting with the target branch.
  • dhp-common/src/main/java/eu/dnetlib/dhp/oa/merge/AuthorMerger.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/oa/merge/DispatchEntitiesSparkJob.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/oa/merge/GroupEntitiesSparkJob.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/CleaningFunctions.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/GraphCleaningFunctions.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/IdentifierFactory.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/MergeUtils.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/ModelHardLimits.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/OafMapperUtils.java
  • dhp-common/src/main/java/eu/dnetlib/dhp/schema/oaf/utils/OrganizationPidComparator.java
You can also view command line instructions.

Step 1:

From your project repository, check out a new branch and test the changes.
git checkout -b ticket_8369 beta
git pull origin ticket_8369

Step 2:

Merge the changes and update on Gitea.
git checkout beta
git merge --no-ff ticket_8369
git push origin beta
No reviewers
No Milestone
No Assignees
3 Participants
Reference: D-Net/dnet-hadoop#287