author name parsing #220
Reference: D-Net/dnet-hadoop#220
When parsing author names, the file name_particles.txt is used to identify particles such as van, der, Dr., Mr., etc. and to discard them when identifying the name and the surname. I suspect this file was left behind in some refactoring and was in fact not being used. This PR moves it back to the proper classpath location and introduces a few non-functional improvements in its implementation.
Via the helpdesk (ticket #181), a user confirmed that the names with particles (which she calls "insertions") are:
She also confirmed that this applies not only to Dutch names but also to names from other countries (e.g. von).
We thought we had solved the problem, but the user has again reported problems with her name in some cases.
She searched by ORCID iD, and her name is sometimes still wrong:
https://explore.openaire.eu/search/advanced/research-outcomes?f0=authorid&fv0=0000-0003-3272-8007
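For readers unfamiliar with the particle handling described in the PR description, here is a minimal sketch of the idea. It is not the dnet-hadoop implementation: the class NameParticleFilter and its methods are hypothetical, and the hard-coded set is an assumed stand-in for the contents of name_particles.txt, which the real code would presumably load from the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not the class used in dnet-hadoop.
final class NameParticleFilter {

    // Assumption: a hard-coded stand-in for the entries of name_particles.txt.
    // Entries are stored lowercase and without a trailing dot.
    private static final Set<String> PARTICLES =
        new HashSet<>(Arrays.asList("van", "der", "von", "dr", "mr"));

    private NameParticleFilter() {
    }

    // Lowercase the token and drop one trailing dot, so "Dr." matches "dr".
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return t.endsWith(".") ? t.substring(0, t.length() - 1) : t;
    }

    // True when the whole token (not a prefix of it) is a known particle.
    static boolean isParticle(String token) {
        return PARTICLES.contains(normalize(token));
    }

    // Split on whitespace and keep only the tokens that are not particles.
    static String stripParticles(String fullName) {
        List<String> kept = new ArrayList<>();
        for (String token : fullName.trim().split("\\s+")) {
            if (!token.isEmpty() && !isParticle(token)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // prints "Jan Berg"
        System.out.println(stripParticles("Dr. Jan van der Berg"));
    }
}
```

Matching whole tokens rather than prefixes keeps a given name such as "Vanessa" from being mistaken for the particle "van". Whether the particles should be dropped or kept as part of the surname in the displayed name is exactly the question raised by the helpdesk ticket, and this sketch does not settle it.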