Claudio Atzori
|
cd379eb5e3
|
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
|
2021-01-25 18:11:49 +01:00 |
Claudio Atzori
|
3465c8ccee
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 16:54:53 +01:00 |
Claudio Atzori
|
07a0ccfc96
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 13:36:01 +01:00 |
Claudio Atzori
|
34d653de41
|
[Cleaning] updated cleaning rule for DOIs
|
2021-01-22 14:16:33 +01:00 |
Claudio Atzori
|
28460c2cd1
|
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
|
2020-12-23 16:59:52 +01:00 |
Claudio Atzori
|
723b01f9e9
|
trivial: the less magic numbers and values around, the better
|
2020-12-23 12:22:48 +01:00 |
Claudio Atzori
|
6cb0dc3f43
|
extended OCRID cleaning procedure
|
2020-12-21 11:40:17 +01:00 |
Claudio Atzori
|
a104a632df
|
cleanup
|
2020-12-04 16:32:47 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Claudio Atzori
|
57f448b7a4
|
graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance
|
2020-12-02 10:44:05 +01:00 |
Claudio Atzori
|
e731a7658d
|
cleaning texts to remove tab characters too
|
2020-11-27 09:00:04 +01:00 |
Claudio Atzori
|
eeebd5a920
|
Cleanig workflow: remove newlines from titles, descriptions, subjects
|
2020-11-24 18:40:25 +01:00 |
Claudio Atzori
|
cfc01f136e
|
PID filtering based on a blacklist
|
2020-11-17 12:27:06 +01:00 |
Claudio Atzori
|
6ab1ce53c9
|
fixed condition in result pid cleaning; cleanup
|
2020-11-16 10:09:17 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Sandro La Bruzzo
|
cd27df91a1
|
fixed bug on missing relation in ANDS
|
2020-11-06 17:12:31 +01:00 |
Claudio Atzori
|
5310e56dba
|
remove empy PIDs
|
2020-11-03 11:52:10 +01:00 |
Claudio Atzori
|
49ae3450a9
|
code formatting
|
2020-10-02 09:43:24 +02:00 |
Claudio Atzori
|
2e9e13444d
|
author pids made unique by value
|
2020-10-01 12:50:40 +02:00 |
Claudio Atzori
|
e265c3e125
|
cleaning functions factored out in a dedicated class
|
2020-10-01 10:50:15 +02:00 |