Claudio Atzori
|
ebf53a1616
|
added cleaning for relation fields: subRelType & relClass according to dedicated vocabs
|
2021-09-15 16:10:37 +02:00 |
Claudio Atzori
|
2ee21da43b
|
suggestions from SonarLint
|
2021-08-11 12:13:22 +02:00 |
Claudio Atzori
|
d1cbee8413
|
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
|
2021-05-10 16:43:39 +02:00 |
Claudio Atzori
|
5afa7d3e0c
|
core utilities in dhp-common moved in external module dhp-schemas
|
2021-04-27 15:44:01 +02:00 |
Claudio Atzori
|
d1ca025b0b
|
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
|
2021-04-13 14:32:41 +02:00 |
Sandro La Bruzzo
|
c73072079d
|
fix conflicts
|
2021-03-22 16:36:31 +01:00 |
Claudio Atzori
|
d525785497
|
[#6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
|
2021-03-09 11:12:55 +01:00 |
Sandro La Bruzzo
|
150a617bd1
|
Merge pull request 'aggregation_on_hadoop' (#90) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
Wonderful code... You're the Best Sandro
|
2021-01-26 16:00:47 +01:00 |
Claudio Atzori
|
885e0dd926
|
[Cleaning] filter authors not providing word characters in the fullname
|
2021-01-26 09:48:53 +01:00 |
Claudio Atzori
|
2890511613
|
[Cleaning] normalise missing Result.country
|
2021-01-26 09:41:44 +01:00 |
Claudio Atzori
|
cd379eb5e3
|
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
|
2021-01-25 18:11:49 +01:00 |
Claudio Atzori
|
3465c8ccee
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 16:54:53 +01:00 |
Sandro La Bruzzo
|
a54848a59c
|
Moved Vocabulary stuff to common module
|
2021-01-25 15:43:04 +01:00 |
Claudio Atzori
|
07a0ccfc96
|
[Cleaning] trying to avoid NPEs
|
2021-01-25 13:36:01 +01:00 |
Claudio Atzori
|
34d653de41
|
[Cleaning] updated cleaning rule for DOIs
|
2021-01-22 14:16:33 +01:00 |
Claudio Atzori
|
26e9d55c13
|
code formatting
|
2021-01-05 09:59:26 +01:00 |
Claudio Atzori
|
7185158942
|
ignore missing properties
|
2020-12-29 11:06:28 +01:00 |
Claudio Atzori
|
28460c2cd1
|
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
|
2020-12-23 16:59:52 +01:00 |
Claudio Atzori
|
723b01f9e9
|
trivial: the less magic numbers and values around, the better
|
2020-12-23 12:22:48 +01:00 |
Claudio Atzori
|
6cb0dc3f43
|
extended ORCID cleaning procedure
|
2020-12-21 11:40:17 +01:00 |
Claudio Atzori
|
a104a632df
|
cleanup
|
2020-12-04 16:32:47 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Claudio Atzori
|
57f448b7a4
|
graph cleaning workflow separates orcid_pending from orcid, depending on the author pid provenance
|
2020-12-02 10:44:05 +01:00 |
Claudio Atzori
|
e731a7658d
|
cleaning texts to remove tab characters too
|
2020-11-27 09:00:04 +01:00 |
Claudio Atzori
|
36173c13a5
|
reverted filters in the cleaning process
|
2020-11-25 10:24:42 +01:00 |
Claudio Atzori
|
eeebd5a920
|
Cleaning workflow: remove newlines from titles, descriptions, subjects
|
2020-11-24 18:40:25 +01:00 |
Claudio Atzori
|
e1a1bb3ee4
|
moved class CleaningFunctions to the correct package. Remove newlines from titles, descriptions, subjects
|
2020-11-24 18:34:03 +01:00 |
Claudio Atzori
|
33bae02451
|
reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently
|
2020-11-24 14:42:33 +01:00 |
Claudio Atzori
|
fcbb05eb21
|
cleanup
|
2020-11-19 15:14:33 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Claudio Atzori
|
cfc01f136e
|
PID filtering based on a blacklist
|
2020-11-17 12:27:06 +01:00 |
Claudio Atzori
|
6ab1ce53c9
|
fixed condition in result pid cleaning; cleanup
|
2020-11-16 10:09:17 +01:00 |
Claudio Atzori
|
528231a287
|
grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow
|
2020-11-13 15:37:48 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Sandro La Bruzzo
|
cd27df91a1
|
fixed bug on missing relation in ANDS
|
2020-11-06 17:12:31 +01:00 |
Claudio Atzori
|
86d6fbe95b
|
refactoring: CleaningFunctions and OafMapperUtils moved to dhp-common
|
2020-11-03 12:19:46 +01:00 |
Claudio Atzori
|
5310e56dba
|
remove empty PIDs
|
2020-11-03 11:52:10 +01:00 |
Claudio Atzori
|
49ae3450a9
|
code formatting
|
2020-10-02 09:43:24 +02:00 |
Claudio Atzori
|
2e9e13444d
|
author pids made unique by value
|
2020-10-01 12:50:40 +02:00 |
Claudio Atzori
|
e265c3e125
|
cleaning functions factored out in a dedicated class
|
2020-10-01 10:50:15 +02:00 |
Claudio Atzori
|
cd631bb5bc
|
defaults fixed in the cleaning workflow: forces result.publisher to NULL when result.publisher.value is empty
|
2020-07-30 17:03:53 +02:00 |
Claudio Atzori
|
4ff8007518
|
added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step
|
2020-07-30 16:24:39 +02:00 |
Michele Artini
|
e1ae964bc4
|
stats
|
2020-07-10 16:12:08 +02:00 |
Claudio Atzori
|
67e1d222b6
|
bulk cleaning: when found null or empty, sets bestaccessrights by evaluating the result instances
|
2020-07-08 17:53:35 +02:00 |
Claudio Atzori
|
6f5771c1c9
|
sets author.rank when null
|
2020-06-25 14:06:21 +02:00 |
Claudio Atzori
|
0e723d378b
|
added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job
|
2020-06-24 18:34:42 +02:00 |
Claudio Atzori
|
7d416f08d8
|
graph cleaning workflow: set hostedby to unknown repository when defined as NULL
|
2020-06-22 09:50:43 +02:00 |
Claudio Atzori
|
d0ac7514b2
|
cleaning workflow to include cleaning of default values
|
2020-06-18 19:37:25 +02:00 |
Claudio Atzori
|
0d52816244
|
WIP: graph cleaner implementation
|
2020-06-13 13:06:04 +02:00 |
Claudio Atzori
|
bed65a1be6
|
WIP: graph cleaner implementation
|
2020-06-12 18:25:47 +02:00 |