author name parsing #220

Merged
claudio.atzori merged 3 commits from author_name_particles into beta 2022-06-27 09:37:28 +02:00

When parsing author names, the file `name_particles.txt` is used to identify particles such as `van`, `der`, `Dr.` and `Mr.` and to strip them when separating the first name from the surname. I suspect this file was left behind during a refactoring; in fact, it was not being used. This PR moves it back to the proper classpath location and introduces a few non-functional improvements to its implementation.
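For illustration only, the loading and stripping step described above could look roughly like the following Java sketch. It is not the code changed by this PR: the class `NameParticles`, its methods, and the resource path passed to `loadFromClasspath` are hypothetical, and the sketch assumes the file lists one particle per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT the project's actual implementation.
final class NameParticles {

    private NameParticles() {}

    /** Reads one particle per line (assumed file layout), skipping blank lines; stored lowercased. */
    static Set<String> load(Reader source) throws IOException {
        Set<String> particles = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String particle = line.trim();
                if (!particle.isEmpty()) {
                    particles.add(particle.toLowerCase(Locale.ROOT));
                }
            }
        }
        return particles;
    }

    /** Loads the list from the classpath; the resource path is an assumption supplied by the caller. */
    static Set<String> loadFromClasspath(String resourcePath) {
        InputStream in = NameParticles.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new IllegalStateException("particle list not found on classpath: " + resourcePath);
        }
        try {
            // Closing the BufferedReader inside load() also closes this stream.
            return load(new InputStreamReader(in, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits a full name on whitespace and drops every token found in the particle set. */
    static List<String> stripParticles(String fullName, Set<String> particles) {
        List<String> kept = new ArrayList<>();
        for (String token : fullName.trim().split("\\s+")) {
            if (!token.isEmpty() && !particles.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

With `van`, `der` and `Dr.` in the list, `stripParticles("Dr. Jan van der Berg", particles)` keeps only `Jan` and `Berg`. Lowercasing with `Locale.ROOT` keeps the comparison independent of the JVM's default locale.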

claudio.atzori added the enhancement label 2022-06-16 14:34:41 +02:00
alessia.bardi was assigned by claudio.atzori 2022-06-16 14:34:41 +02:00
miriam.baglioni was assigned by claudio.atzori 2022-06-16 14:34:41 +02:00
claudio.atzori added 1 commit 2022-06-16 14:34:41 +02:00

Via the helpdesk (ticket #181), a user confirmed that names with particles (which she calls "insertions") take the following forms:

<Name> van <Surname>
<Surname>, <Name> van
Van <Surname>

She also confirmed that this applies not only to Dutch names but also to names from other countries (e.g. `von`).
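As a rough illustration of the forms listed above, the Java sketch below normalises them to the same (given name, particle, surname) triple. Keeping the particle as a separate field is an assumption about the desired output, not the behaviour implemented in this PR; `ParticleNameParser` is hypothetical, titles such as `Dr.` are assumed to have been removed beforehand, and a name without any particle simply takes its last token as the surname.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: splits a name into given name, particle ("insertion") and surname.
final class ParticleNameParser {

    static final class ParsedName {
        final String givenName;
        final String particle;
        final String surname;

        ParsedName(String givenName, String particle, String surname) {
            this.givenName = givenName;
            this.particle = particle;
            this.surname = surname;
        }

        @Override
        public String toString() {
            return "given=" + givenName + ", particle=" + particle + ", surname=" + surname;
        }
    }

    private ParticleNameParser() {}

    static ParsedName parse(String raw, Set<String> particles) {
        String s = raw.trim();
        int comma = s.indexOf(',');
        if (comma >= 0) {
            // "<Surname>, <Name> van": trailing particles after the comma belong to the surname.
            String surname = s.substring(0, comma).trim();
            List<String> rest = tokens(s.substring(comma + 1));
            int split = rest.size();
            while (split > 0 && isParticle(rest.get(split - 1), particles)) {
                split--;
            }
            return new ParsedName(join(rest.subList(0, split)), join(rest.subList(split, rest.size())), surname);
        }
        // "<Name> van <Surname>" or "Van <Surname>": the first run of particles separates the parts.
        List<String> tokens = tokens(s);
        int start = 0;
        while (start < tokens.size() && !isParticle(tokens.get(start), particles)) {
            start++;
        }
        if (start == tokens.size()) {
            // No particle found (assumption): the last token is taken as the surname.
            int last = Math.max(tokens.size() - 1, 0);
            return new ParsedName(join(tokens.subList(0, last)), "", join(tokens.subList(last, tokens.size())));
        }
        int end = start;
        while (end < tokens.size() && isParticle(tokens.get(end), particles)) {
            end++;
        }
        return new ParsedName(join(tokens.subList(0, start)), join(tokens.subList(start, end)),
                join(tokens.subList(end, tokens.size())));
    }

    // Case-insensitive lookup, so "Van" and "van" are both recognised.
    private static boolean isParticle(String token, Set<String> particles) {
        return particles.contains(token.toLowerCase(Locale.ROOT));
    }

    private static List<String> tokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new ArrayList<>() : new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    private static String join(List<String> parts) {
        return String.join(" ", parts);
    }
}
```

With `van` in the particle set, both `Jan van Berg` and `Berg, Jan van` yield given name `Jan`, particle `van` and surname `Berg`, while `Van Berg` yields an empty given name.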

claudio.atzori added 1 commit 2022-06-27 09:36:59 +02:00
claudio.atzori added 1 commit 2022-06-27 09:37:05 +02:00
claudio.atzori merged commit cba9c2b7cc into beta 2022-06-27 09:37:28 +02:00
claudio.atzori deleted branch author_name_particles 2022-06-27 09:37:28 +02:00

We thought we had solved the problem, but the user has reported further problems with her name in some cases.
When she searches by her ORCID iD, her name is sometimes still wrong:
https://explore.openaire.eu/search/advanced/research-outcomes?f0=authorid&fv0=0000-0003-3272-8007
