Updated XMLIterator for splitting on different nodes #436
Reference: D-Net/dnet-hadoop#436
For DBLP, the FileGZIPCollector plugin needs to split XML files on several different tag names. I've updated the XMLIterator so that the splitOnElement parameter accepts a comma-separated list of values. The XMLIterator now checks whether a tag name is one of the selected tags and splits accordingly. If a single tag is provided, it behaves as before.
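To illustrate the idea for reviewers, here is a minimal standalone sketch of splitting on a comma-separated list of tag names with StAX. It is not the code in this PR: the class `MultiTagSplitSketch` and the methods `parseSplitOnElement` and `split` are hypothetical names used only for illustration, and the sketch works on an in-memory string rather than on a gzipped file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch, not the actual XMLIterator from dnet-hadoop.
class MultiTagSplitSketch {

    // Turn "article, inproceedings" into the set {article, inproceedings}.
    // A single value such as "article" yields a one-element set.
    static Set<String> parseSplitOnElement(String splitOnElement) {
        return Arrays.stream(splitOnElement.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toSet());
    }

    // Return one XML fragment per element whose local name is in the set.
    static List<String> split(String xml, String splitOnElement) {
        Set<String> tags = parseSplitOnElement(splitOnElement);
        List<String> records = new ArrayList<>();
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
            StringWriter buffer = null;
            XMLEventWriter writer = null;
            int depth = 0; // nesting depth inside the record being copied

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (writer == null) {
                    // Outside a record: start copying only at a selected tag.
                    if (event.isStartElement()
                        && tags.contains(event.asStartElement().getName().getLocalPart())) {
                        buffer = new StringWriter();
                        writer = outFactory.createXMLEventWriter(buffer);
                        writer.add(event);
                        depth = 1;
                    }
                    continue;
                }
                // Inside a record: copy every event and track nesting.
                writer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement()) {
                    depth--;
                }
                if (depth == 0) {
                    // The element that opened the record has been closed.
                    writer.flush();
                    writer.close();
                    records.add(buffer.toString());
                    writer = null;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML input", e);
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<dblp><article key=\"a\"><title>T1</title></article>"
            + "<note>skipped</note>"
            + "<inproceedings key=\"b\"><title>T2</title></inproceedings></dblp>";
        for (String record : split(xml, "article, inproceedings")) {
            System.out.println(record);
        }
    }
}
```

Tracking the nesting depth, rather than waiting for the first closing tag with a matching name, keeps a record intact even if a selected tag appears nested inside another selected tag.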
As far as I have been able to test, it works. @miriam.baglioni @claudio.atzori @giambattista.bloisi Could you also review the code and the functionality and merge it, so that we can proceed with deploying it for the DBLP aggregation?
Good to go from my point of view as well. I am merging the PR into beta.