Make repeatable simple fields unique by value #23

Closed
opened 2020-07-16 16:32:36 +02:00 by claudio.atzori · 3 comments

Please introduce in the mapping from the MDStores applied to both Oaf and Odf records a mechanism to avoid repeating the same identical value in fields where it doesn't make sense to have the same information repeated several times.

For example, I just noticed that the following record carries the same `dc:source` 689 times:

<source>Journal of Preventive Medicine and Hygiene</source>

record id od_______267::801c6dcec8f0a675871c2ca8ecabf75e

Clearly the uniqueness criterion changes from type to type, so let's start with `eu.dnetlib.dhp.schema.oaf.Field`.

michele.artini was assigned by claudio.atzori 2020-07-16 16:32:36 +02:00
Member

The best solution would be to modify the data model, replacing all occurrences of List<Field> with Set<Field>.

I suggest using the LinkedHashSet class because it preserves the insertion order of the elements.

The Field class should also implement the hashCode() method.
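
A minimal sketch of the idea above, not the actual change made in the project. `SimpleField` is a hypothetical stand-in for `eu.dnetlib.dhp.schema.oaf.Field`, reduced to a single string value, and `uniqueByValue` is an illustrative helper name. It shows why both `equals()` and `hashCode()` are needed for a `LinkedHashSet` to collapse identical values while keeping first-seen order. The second value in `main` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class FieldDedupSketch {

    // Hypothetical, simplified stand-in for eu.dnetlib.dhp.schema.oaf.Field.
    static final class SimpleField {
        private final String value;

        SimpleField(String value) {
            this.value = value;
        }

        String getValue() {
            return value;
        }

        // Two fields are equal when their values are equal.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SimpleField)) return false;
            return Objects.equals(value, ((SimpleField) o).value);
        }

        // Must agree with equals(), otherwise hash-based sets keep duplicates.
        @Override
        public int hashCode() {
            return Objects.hashCode(value);
        }
    }

    // Drops repeated values; LinkedHashSet keeps the first occurrence of each in order.
    static List<SimpleField> uniqueByValue(List<SimpleField> fields) {
        Set<SimpleField> unique = new LinkedHashSet<>(fields);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<SimpleField> sources = new ArrayList<>();
        for (int i = 0; i < 689; i++) {
            sources.add(new SimpleField("Journal of Preventive Medicine and Hygiene"));
        }
        sources.add(new SimpleField("Some other source")); // illustrative value only

        for (SimpleField f : uniqueByValue(sources)) {
            System.out.println(f.getValue());
        }
    }
}
```

Without the `hashCode()` override, equal fields would usually land in different hash buckets and the set would not detect them as duplicates.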

Member

I have implemented this fix: #25

You can accept the pull request, but I suggest that you also update the Oaf classes to use LinkedHashSet.

Author
Owner

Thanks Michele, the PR has been integrated, and the task to extend the model is tracked in #9 (comment).

claudio.atzori added the enhancement label 2020-07-27 18:03:21 +02:00
Reference: D-Net/dnet-hadoop#23