Clean Country #241
3 Participants
Reference: D-Net/dnet-hadoop#241
This PR is a first, naive implementation of cleaning the country element. It removes the country NL from records collected from NARCIS. This is needed because NARCIS has been added to the datasources allowed in the country propagation step. Since NARCIS is an aggregator, some of the repositories it collects from do not provide only results from the NL; in those cases the association to the country should be removed. The criteria to be matched are:
- the result has a DOI with prefix 10.17632 (Mendeley Data) or 10.5061 (Dryad);
- hostedby.key is any institutional repository with country NL;
- collectedfrom.value = NARCIS;
- NL has been inserted via propagation.
If all the constraints above match, the country NL is removed from the set of countries of the result.

The introduction of this new cleaning process changes one parameter in the workflow: the shouldCleanContext parameter has been replaced by the shouldClean parameter. If this parameter is set to true, all the cleaning actions for the result are triggered. So far these are the cleaning of the sobigdata context and the cleaning of the NL country for NARCIS.

Two comments on the PR description, without having seen the changeset yet:
@@ -0,0 +94,4 @@
List<String> hostedBy = spark
.read()
.textFile(datasourcePath)
// .filter((FilterFunction<String>) ds -> !ds.equals(collectedfrom))
If it is not needed, please clean it up.
@@ -0,0 +113,4 @@
if (r
.getPid()
.stream()
.anyMatch(p -> p.getQualifier().getClassid().equals("doi") && pidInParam(p.getValue(), verifyParam))) {
Could you replace the "doi" string with the serialization of eu.dnetlib.dhp.schema.oaf.utils.PidType#doi?
Minor changes. Please check the inline comments.
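The reviewer's suggestion of comparing the pid type against PidType#doi rather than the "doi" literal can be sketched in plain Java as follows. PidType and Pid below are hypothetical stand-ins for the real dnet-hadoop classes, and pidInParam is reimplemented here as a case-insensitive match on "prefix/", which is an assumption about how the real helper behaves.

```java
import java.util.List;
import java.util.Locale;

public class DoiPrefixCheck {

    // Hypothetical stand-in for eu.dnetlib.dhp.schema.oaf.utils.PidType; only doi is declared.
    enum PidType { doi }

    // Hypothetical pid model: the qualifier classid plus the pid value.
    record Pid(String classid, String value) {}

    // Assumed behaviour of pidInParam: the value starts with one of the
    // configured prefixes followed by "/" (so 10.17632 does not match 10.176321).
    static boolean pidInParam(String value, List<String> prefixes) {
        String v = value.toLowerCase(Locale.ROOT);
        return prefixes.stream().anyMatch(p -> v.startsWith(p.toLowerCase(Locale.ROOT) + "/"));
    }

    // The pid type is compared against the enum's serialization instead of a "doi" literal.
    static boolean hasMatchingDoi(List<Pid> pids, List<String> prefixes) {
        return pids.stream()
            .anyMatch(p -> PidType.doi.toString().equals(p.classid()) && pidInParam(p.value(), prefixes));
    }

    public static void main(String[] args) {
        List<String> prefixes = List.of("10.17632", "10.5061");
        System.out.println(hasMatchingDoi(List.of(new Pid("doi", "10.17632/abc123.1")), prefixes)); // true
        System.out.println(hasMatchingDoi(List.of(new Pid("handle", "10.17632/abc123.1")), prefixes)); // false
    }
}
```

Putting the enum serialization on the left of equals also keeps the comparison null-safe when a classid is missing.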
I have no changes to suggest. You have my green light to test it on beta.
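The cleaning rule from the PR description can be sketched as a plain Java function over a simplified result. Result, Country and the repository key "repo-nl-1" are hypothetical; reading "hostedby.key is any institutional repository with country NL" as "at least one hostedby key belongs to the set of NL institutional repositories" is an assumption, as is modelling "inserted via propagation" as a boolean flag on each country entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NarcisCountryCleaner {

    // Hypothetical country entry: the code plus a flag telling whether it was added by propagation.
    record Country(String code, boolean fromPropagation) {}

    // Hypothetical, simplified result: only the fields the rule inspects.
    record Result(List<String> dois, Set<String> hostedByKeys, Set<String> collectedFrom, List<Country> countries) {}

    // Returns the countries to keep. A propagated NL entry is dropped only when
    // every constraint holds; otherwise the country list is returned unchanged.
    static List<Country> clean(Result r, List<String> doiPrefixes, Set<String> nlInstitutionalRepos) {
        boolean doiMatches = r.dois().stream()
            .anyMatch(doi -> doiPrefixes.stream().anyMatch(p -> doi.toLowerCase().startsWith(p + "/")));
        // Assumption: a single hosting NL institutional repository is enough.
        boolean hostedByNlRepo = r.hostedByKeys().stream().anyMatch(nlInstitutionalRepos::contains);
        boolean fromNarcis = r.collectedFrom().contains("NARCIS");
        if (!(doiMatches && hostedByNlRepo && fromNarcis)) {
            return r.countries();
        }
        List<Country> kept = new ArrayList<>();
        for (Country c : r.countries()) {
            boolean propagatedNl = "NL".equals(c.code()) && c.fromPropagation();
            if (!propagatedNl) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Result r = new Result(
            List.of("10.5061/dryad.xyz"),
            Set.of("repo-nl-1"),
            Set.of("NARCIS"),
            List.of(new Country("NL", true), new Country("DE", false)));
        List<Country> kept = clean(r, List.of("10.17632", "10.5061"), Set.of("repo-nl-1"));
        System.out.println(kept.stream().map(Country::code).toList()); // [DE]
    }
}
```

Checking the propagation flag per entry means an NL country that came from the original metadata survives even when the other three constraints match.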