[dedup] use common saveParquet and save methods to ensure outputs are compressed #349
Reference: D-Net/dnet-hadoop#349
This PR uses the common dataset writing methods available in the eu.dnetlib.dhp.oa.dedup.AbstractSparkAction class to ensure that all the actions defined in the deduplication workflow produce a compressed output.

Looks good to me
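
For readers unfamiliar with the pattern described in the PR, here is a minimal, Spark-free sketch of the idea: writing goes through a single method on a shared base class, so no individual action can forget to compress its output. This is an analogy only. The names CommonSaveSketch, AbstractAction, ExampleStep, isGzip and readBack are hypothetical, the sketch gzips plain strings in memory rather than writing Spark datasets, and it does not reproduce the actual saveParquet or save methods of AbstractSparkAction, whose bodies are not shown on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CommonSaveSketch {

    // Hypothetical base class: the only place that decides how output is encoded.
    abstract static class AbstractAction {

        // Common write method (illustrative stand-in): always gzip-compresses the records.
        protected static byte[] save(List<String> records) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
                for (String record : records) {
                    gzip.write((record + "\n").getBytes(StandardCharsets.UTF_8));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return buffer.toByteArray();
        }

        abstract byte[] run(List<String> input);
    }

    // Hypothetical action: it delegates to save() instead of writing output on its own.
    static class ExampleStep extends AbstractAction {
        @Override
        byte[] run(List<String> input) {
            return save(input);
        }
    }

    // True if the bytes start with the gzip magic number (0x1f 0x8b).
    public static boolean isGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    // Decompresses the bytes back into text, to show the round trip is lossless.
    public static String readBack(byte[] data) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] out = new ExampleStep().run(List.of("record-1", "record-2"));
        System.out.println("compressed: " + isGzip(out));
        System.out.print(readBack(out));
    }
}
```

The design point is that a change to the compression settings happens in one place and applies to every action that inherits from the base class, which is presumably why the PR routes the dedup actions through the shared methods.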