Claudio Atzori
|
aff3ddc8d2
|
added cleaning for the format field, removing carriage return and tab characters
|
2021-12-14 11:41:46 +01:00 |
Claudio Atzori
|
41c70c607d
|
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
|
2021-12-09 16:44:28 +01:00 |
Claudio Atzori
|
863a2f9db3
|
avoid filtering OAF records defined as invisible = true
|
2021-12-03 09:08:12 +01:00 |
Claudio Atzori
|
82a4e4efae
|
[cleaning wf] fixed methodology to rule out invalid result titles, based on https://support.openaire.eu/issues/7206
|
2021-11-17 14:17:22 +01:00 |
Claudio Atzori
|
49f897ef29
|
[cleaning wf] fixed regex used to spot garbage in result titles; adjusted threshold for filtering titles
|
2021-11-16 15:24:23 +01:00 |
Claudio Atzori
|
2ee21da43b
|
suggestions from SonarLint
|
2021-08-11 12:13:22 +02:00 |
Claudio Atzori
|
6dddad86ee
|
[cleaning] title cleaning based on the me.xuender:unidecode library
|
2021-07-28 16:21:29 +02:00 |
Claudio Atzori
|
bc835d2024
|
[cleaning] fixed filtering function for missing titles
|
2021-07-23 11:56:13 +02:00 |
Claudio Atzori
|
67afd06cd1
|
[cleaning] cleaning instance.pid and instance.alternateidentifier using the same procedure used to clean result.pid
|
2021-06-24 12:10:17 +02:00 |
Claudio Atzori
|
2039bb9f5f
|
orcid / orcid_pending cleaning backported from master branch
|
2021-06-14 09:40:50 +02:00 |
Claudio Atzori
|
a900bfb874
|
delegating the date parsing to https://github.com/sisyphsu/dateparser
|
2021-06-11 16:53:01 +02:00 |
Claudio Atzori
|
eb6acfbabc
|
[cleaning] removing non parsable relation.validationDate(s)
|
2021-05-28 10:50:44 +02:00 |
Claudio Atzori
|
23b8883ab1
|
applied intellij code cleanup
|
2021-05-14 10:58:12 +02:00 |
Claudio Atzori
|
d4c3476152
|
mapping datasource.journal only when an issn is available, null otherwise
|
2021-05-11 11:08:54 +02:00 |
Claudio Atzori
|
d1cbee8413
|
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
|
2021-05-10 16:43:39 +02:00 |
Claudio Atzori
|
5afa7d3e0c
|
core utilities in dhp-common moved in external module dhp-schemas
|
2021-04-27 15:44:01 +02:00 |
Claudio Atzori
|
f783e60ff7
|
cleanup
|
2021-04-27 14:04:50 +02:00 |
Claudio Atzori
|
8704d32780
|
code formatting
|
2021-04-15 16:52:58 +02:00 |
Claudio Atzori
|
ba4b4c74d8
|
do not make the identifier prefix depend on the Handle
|
2021-04-15 16:48:26 +02:00 |
miconis
|
2355cc4e9b
|
minor changes and bug fix
|
2021-03-29 10:07:12 +02:00 |
Claudio Atzori
|
3256b9c836
|
code formatting
|
2021-03-19 09:36:12 +01:00 |
Claudio Atzori
|
75144dacb3
|
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
|
2021-03-19 09:07:40 +01:00 |
Claudio Atzori
|
9588bfba81
|
[cleaning] entries available as PIDs must not appear as alternateIdentifier
|
2021-03-19 09:07:30 +01:00 |
Sandro La Bruzzo
|
25d5663d97
|
added filter
|
2021-03-18 10:24:42 +01:00 |
Sandro La Bruzzo
|
5f98ea74a9
|
Added fix for pid generation in stableIds
|
2021-03-17 15:53:24 +01:00 |
Claudio Atzori
|
734232d3b9
|
identifier factory doesn't depend on pre-existing entity.id
|
2021-03-17 15:14:53 +01:00 |
Claudio Atzori
|
a3dac32f16
|
pidFilter a bit more permissive
|
2021-03-17 15:06:05 +01:00 |
Claudio Atzori
|
8257f9a2bc
|
result.pid: adjusted the mapping applied to the contents from the aggregator
|
2021-03-17 12:45:38 +01:00 |
Claudio Atzori
|
3b2da86f0a
|
added precondition on IdentifierFactory to check the presence of entity.id
|
2021-03-16 17:05:38 +01:00 |
Claudio Atzori
|
640b885706
|
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
|
2021-03-16 14:19:32 +01:00 |
Claudio Atzori
|
c801ab6c1d
|
minor
|
2021-03-09 17:22:31 +01:00 |
Claudio Atzori
|
9917d7e01c
|
PID authorities include ArXiv
|
2021-03-09 17:12:52 +01:00 |
Claudio Atzori
|
01630f638d
|
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
|
2021-03-09 17:11:50 +01:00 |
Claudio Atzori
|
765f9bdee7
|
merged from dhp_oaf_model
|
2021-03-09 11:37:41 +01:00 |
Claudio Atzori
|
3c5ce1dada
|
code formatting
|
2020-12-09 17:07:20 +01:00 |
Claudio Atzori
|
491ad24750
|
introduced filtering for DOIs in graph cleaning workflow
|
2020-12-09 09:10:33 +01:00 |
Claudio Atzori
|
943b961cf6
|
introduced PidBlacklist
|
2020-12-02 09:30:34 +01:00 |
Claudio Atzori
|
893ac4a77b
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-12-02 09:30:06 +01:00 |
Claudio Atzori
|
349e7246aa
|
do not consider NCID, GBIF as PIDs candidate for the ID creation
|
2020-11-30 16:52:40 +01:00 |
Claudio Atzori
|
2c407e775e
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-11-30 12:00:38 +01:00 |
Claudio Atzori
|
e1a1bb3ee4
|
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
|
2020-11-24 18:34:03 +01:00 |
Claudio Atzori
|
e43ab07af6
|
code formatting
|
2020-11-24 14:41:39 +01:00 |
Claudio Atzori
|
c016cc050a
|
IdentifierFactory: in case a record provides more than one pid of the same type, the lexicographically lower value is chosen as best pick
|
2020-11-23 19:16:40 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Claudio Atzori
|
e5da4ee9b1
|
dedup workflow using the common PidComparator
|
2020-11-04 15:02:02 +01:00 |
Claudio Atzori
|
ea2a0ea949
|
IdentifierFactory considers only DOIs matching a given regex
|
2020-11-03 18:43:37 +01:00 |
Claudio Atzori
|
86d6fbe95b
|
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
|
2020-11-03 12:19:46 +01:00 |
Claudio Atzori
|
78c3c1b62b
|
exclude pid values set to 'none'
|
2020-11-02 14:25:26 +01:00 |
Claudio Atzori
|
58f28296ea
|
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
|
2020-10-30 10:56:42 +01:00 |
Claudio Atzori
|
8958f20813
|
code formatting
|
2020-10-07 13:14:31 +02:00 |