author name parsing #220

Merged
claudio.atzori merged 3 commits from author_name_particles into beta 2 years ago
Owner

When parsing author names, the file name_particles.txt is used to identify particles such as `van`, `der`, `Dr.`, `Mr.`, etc., and to drop them when identifying the name and the surname. I suspect this file was left behind in some refactoring; in fact, it was not being used. This PR moves it back to the proper classpath location and introduces a few non-functional improvements to its implementation.
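
To illustrate the idea only: the sketch below is not the code touched by this PR. It assumes a particle file with one entry per line, and the names `load_particles`, `strip_particles` and the `SAMPLE` content are hypothetical; the real contents of name_particles.txt are not shown here.

```python
def load_particles(text):
    """Parse particle-list text, assuming one particle per line.

    Blank lines and lines starting with '#' are skipped. Entries are
    lower-cased and stripped of a trailing dot, so 'Dr.' matches 'dr'.
    """
    particles = set()
    for line in text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            particles.add(entry.lower().rstrip("."))
    return particles


def strip_particles(full_name, particles):
    """Return the tokens of full_name that are not particles, in order."""
    tokens = full_name.replace(",", " ").split()
    return [t for t in tokens if t.lower().rstrip(".") not in particles]


# Hypothetical file content, used only for this example.
SAMPLE = "van\nder\nDr.\nMr.\n"
PARTICLES = load_particles(SAMPLE)

remaining = strip_particles("Dr. Jan van der Berg", PARTICLES)
```

With the sample list, `remaining` holds only the tokens `Jan` and `Berg`. If the particle file cannot be found, the set is empty and nothing is stripped, which is how a misplaced file can go unnoticed.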
claudio.atzori added the enhancement label 2 years ago
alessia.bardi was assigned by claudio.atzori 2 years ago
miriam.baglioni was assigned by claudio.atzori 2 years ago
claudio.atzori added 1 commit 2 years ago
Owner

Via the helpdesk (ticket #181), a user confirmed that names with particles (which she calls "insertions") take these forms:

<Name> van <Surname>
<Surname>, <Name> van
Van <Surname>

She also confirmed that this applies not only to Dutch names but also to names from other countries (e.g. von).
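
As a rough sketch of how the three forms above could be told apart (not the parser used in dnet-hadoop): `split_name` and the `PARTICLES` subset are hypothetical, and the sample names in the usage lines are made up.

```python
# Illustrative subset only; an assumption, not the content of name_particles.txt.
PARTICLES = {"van", "von", "der", "de"}


def split_name(raw, particles=PARTICLES):
    """Split a raw author string into (name, particle, surname).

    Handles '<Name> van <Surname>', '<Surname>, <Name> van' and
    'Van <Surname>'. Particles are matched case-insensitively and
    returned lower-cased.
    """
    if "," in raw:
        # '<Surname>, <Name> van': surname comes first, particles trail the name.
        surname, rest = (part.strip() for part in raw.split(",", 1))
        tokens = rest.split()
        cut = len(tokens)
        while cut > 0 and tokens[cut - 1].lower() in particles:
            cut -= 1
        name_tokens, particle_tokens = tokens[:cut], tokens[cut:]
    else:
        tokens = raw.split()
        # The first particle marks where the given name ends.
        start = next(
            (i for i, t in enumerate(tokens) if t.lower() in particles), None
        )
        if start is None:
            # No particle found: assume the last token is the surname.
            name_tokens, particle_tokens = tokens[:-1], []
            surname_tokens = tokens[-1:]
        else:
            end = start
            while end < len(tokens) and tokens[end].lower() in particles:
                end += 1
            name_tokens = tokens[:start]
            particle_tokens = tokens[start:end]
            surname_tokens = tokens[end:]
        surname = " ".join(surname_tokens)
    return (" ".join(name_tokens), " ".join(particle_tokens).lower(), surname)


first = split_name("Anna van Dijk")
second = split_name("Dijk, Anna van")
third = split_name("Van Dijk")
```

Under these assumptions the first two inputs yield the same triple, and the third yields an empty given name with the same particle and surname. Keeping the particle as a separate field, rather than discarding it, leaves room to decide later whether it should be displayed with the surname.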

claudio.atzori added 1 commit 2 years ago
claudio.atzori added 1 commit 2 years ago
claudio.atzori merged commit cba9c2b7cc into beta 2 years ago
claudio.atzori deleted branch author_name_particles 2 years ago
Owner

We thought we had solved the problem, but the user reported that problems with her name persist in some cases.
She searched by ORCID iD, and her name is still wrong in some of the results:
https://explore.openaire.eu/search/advanced/research-outcomes?f0=authorid&fv0=0000-0003-3272-8007

The pull request has been merged as cba9c2b7cc.
You can also view command line instructions.

Step 1:

From your project repository, check out a new branch and test the changes.
git checkout -b author_name_particles beta
git pull origin author_name_particles

Step 2:

Merge the changes and update on Gitea.
git checkout beta
git merge --no-ff author_name_particles
git push origin beta
Reference: D-Net/dnet-hadoop#220