Claudio Atzori
|
ba4b4c74d8
|
do not make the identifier prefix depend on the Handle
|
2021-04-15 16:48:26 +02:00 |
Claudio Atzori
|
710cd1e8f2
|
Merge pull request 'add xslt, personname cleaner' (#104) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: #104
LGTM
|
2021-04-13 14:43:05 +02:00 |
Claudio Atzori
|
d1ca025b0b
|
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
|
2021-04-13 14:32:41 +02:00 |
Andreas Czerniak
|
d7614c1f85
|
introduce new const
|
2021-04-13 07:04:27 +02:00 |
Claudio Atzori
|
902d05f548
|
[cleaning] avoiding NPEs handling null author PIDs
|
2021-04-12 17:31:40 +02:00 |
Claudio Atzori
|
72ce741ea6
|
WIP: using common definitions from ModelConstants
|
2021-03-31 17:07:13 +02:00 |
Claudio Atzori
|
27681b876c
|
code formatting
|
2021-03-29 17:47:11 +02:00 |
miconis
|
2709d08fc2
|
Merge branch 'stable_ids' into openorgswf
|
2021-03-29 16:39:07 +02:00 |
Claudio Atzori
|
3becaa5539
|
[Cleaning] drop alternate identifiers with empty values
|
2021-03-29 16:01:35 +02:00 |
Claudio Atzori
|
48f2b6127e
|
[Cleaning] drop alternate identifiers with empty values
|
2021-03-29 14:23:18 +02:00 |
miconis
|
2355cc4e9b
|
minor changes and bug fix
|
2021-03-29 10:07:12 +02:00 |
Claudio Atzori
|
b5b7dc2104
|
[Cleaning] drop alternate identifiers with empty values
|
2021-03-26 12:30:00 +01:00 |
Claudio Atzori
|
827e7e37db
|
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
|
2021-03-25 11:07:59 +01:00 |
Claudio Atzori
|
431cbe9955
|
handle missing instance.pid during bulk cleaning
|
2021-03-23 09:28:58 +01:00 |
Sandro La Bruzzo
|
c73072079d
|
fix conflicts
|
2021-03-22 16:36:31 +01:00 |
Claudio Atzori
|
3256b9c836
|
code formatting
|
2021-03-19 09:36:12 +01:00 |
Claudio Atzori
|
75144dacb3
|
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
|
2021-03-19 09:07:40 +01:00 |
Claudio Atzori
|
9588bfba81
|
[cleaning] entries available as PIDs must not appear as alternateIdentifier
|
2021-03-19 09:07:30 +01:00 |
Sandro La Bruzzo
|
25d5663d97
|
added filter
|
2021-03-18 10:24:42 +01:00 |
Sandro La Bruzzo
|
5f98ea74a9
|
Added fix for pid generation in stableIds
|
2021-03-17 15:53:24 +01:00 |
Claudio Atzori
|
734232d3b9
|
identifier factory doesn't depend on pre-existing entity.id
|
2021-03-17 15:14:53 +01:00 |
Claudio Atzori
|
a3dac32f16
|
pidFilter a bit more permissive
|
2021-03-17 15:06:05 +01:00 |
Claudio Atzori
|
8257f9a2bc
|
result.pid: adjusted the mapping applied to the contents from the aggregator
|
2021-03-17 12:45:38 +01:00 |
Claudio Atzori
|
3b2da86f0a
|
added precondition on IdentifierFactory to check the presence of entity.id
|
2021-03-16 17:05:38 +01:00 |
Claudio Atzori
|
640b885706
|
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
|
2021-03-16 14:19:32 +01:00 |
Claudio Atzori
|
f74e464942
|
create bestaccessright as Qualifier
|
2021-03-10 15:40:05 +01:00 |
Claudio Atzori
|
c801ab6c1d
|
minor
|
2021-03-09 17:22:31 +01:00 |
Claudio Atzori
|
9917d7e01c
|
PID authorities include ArXiv
|
2021-03-09 17:12:52 +01:00 |
Claudio Atzori
|
01630f638d
|
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
|
2021-03-09 17:11:50 +01:00 |
Claudio Atzori
|
b3f3b895e5
|
[#6282 open access status in the Graph] OAStatus renamed as openAccessRoute
|
2021-03-09 11:41:11 +01:00 |
Claudio Atzori
|
765f9bdee7
|
merged from dhp_oaf_model
|
2021-03-09 11:37:41 +01:00 |
Claudio Atzori
|
d525785497
|
[#6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
|
2021-03-09 11:12:55 +01:00 |
Claudio Atzori
|
8d2bb24512
|
merged from master
|
2021-03-08 15:44:34 +01:00 |
Claudio Atzori
|
fa7930d2e2
|
merging contributions from PR#97
|
2021-03-05 15:45:28 +01:00 |
Claudio Atzori
|
ec80b7ade3
|
code formatting
|
2021-03-03 10:22:53 +01:00 |
Claudio Atzori
|
b73dce3e3a
|
more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content
|
2021-03-03 10:17:16 +01:00 |
Claudio Atzori
|
e76c4f62c1
|
MetadataRecord moved in dhp-schemas
|
2021-02-26 10:58:48 +01:00 |
Claudio Atzori
|
b830e33392
|
mdstore collector plugin
|
2021-02-25 12:30:30 +01:00 |
Claudio Atzori
|
dc98c39500
|
more logging
|
2021-02-25 12:29:18 +01:00 |
Claudio Atzori
|
fc3fa5e343
|
implemented mdstore collector plugin
|
2021-02-24 15:07:24 +01:00 |
Claudio Atzori
|
cf27905a71
|
WIP: collectorWorker error reporting, added report messages
|
2021-02-16 16:53:14 +01:00 |
Claudio Atzori
|
58288a95b8
|
WIP: collectorWorker error reporting, added report messages
|
2021-02-15 15:28:53 +01:00 |
Claudio Atzori
|
1abe6d1ad7
|
WIP: collectorWorker error reporting, added report messages
|
2021-02-15 15:08:59 +01:00 |
Claudio Atzori
|
29c6f7e255
|
classes related to the collection workflow moved into common package; implemented MongoDB collection plugins
|
2021-02-12 12:31:02 +01:00 |
Claudio Atzori
|
50add4c61b
|
added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common
|
2021-02-08 12:19:38 +01:00 |
Claudio Atzori
|
40df0f987d
|
better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils
|
2021-02-06 20:12:00 +01:00 |
Claudio Atzori
|
a8a758925e
|
better logging, WIP: collectorWorker error reporting
|
2021-02-05 19:18:05 +01:00 |
Michele Artini
|
2ee0c3e47e
|
http entity as json string
|
2021-02-05 09:45:39 +01:00 |
Sandro La Bruzzo
|
4dae5e605d
|
implemented messaging between collection worker and Dnet
|
2021-02-04 15:51:15 +01:00 |
Claudio Atzori
|
40764cf626
|
better logging, WIP: collectorWorker error reporting
|
2021-02-04 14:06:02 +01:00 |
Michele Artini
|
26d2eb946f
|
messages sender
|
2021-02-04 09:45:46 +01:00 |
Michele Artini
|
1b9731632b
|
Message Sender
|
2021-02-03 16:42:36 +01:00 |
Michele Artini
|
820d729e99
|
recover of Message and MessageType class
|
2021-02-03 16:20:34 +01:00 |
Claudio Atzori
|
0e8a4f9f1a
|
better logging, WIP: collectorWorker error reporting
|
2021-02-03 12:33:41 +01:00 |
Claudio Atzori
|
d62ea1490d
|
cleaned up RabbitMQ stuff
|
2021-02-02 10:53:19 +01:00 |
Claudio Atzori
|
73d772a4b4
|
added method to list the known vocabulary names
|
2021-02-02 10:39:47 +01:00 |
Claudio Atzori
|
8eaa1fd4b4
|
WIP: metadata collection in INCREMENTAL mode and relative test
|
2021-02-01 19:29:10 +01:00 |
Sandro La Bruzzo
|
0276180039
|
WIP mdstore
transaction implemented on hadoop side
|
2021-01-29 16:42:41 +01:00 |
Michele Artini
|
d942d0c77d
|
methods toString(), hashCode() and equals()
|
2021-01-29 13:16:48 +01:00 |
Michele Artini
|
38f2508c87
|
new fields in mdstore beans
|
2021-01-28 08:24:45 +01:00 |
Sandro La Bruzzo
|
a54848a59c
|
Moved Vocabulary stuff to common module
|
2021-01-25 15:43:04 +01:00 |
Claudio Atzori
|
6848d0c3d7
|
trivial: avoid duplicated code
|
2020-12-23 12:21:58 +01:00 |
Claudio Atzori
|
d8b5f43a7e
|
code formatting
|
2020-12-22 14:59:03 +01:00 |
miconis
|
794e22b09c
|
bug fix in the authormerge: now authors with higher size have priority, normalization of author name fixed
|
2020-12-21 17:51:42 +01:00 |
Claudio Atzori
|
12e2f930c8
|
resolved conflicts
|
2020-12-10 10:57:39 +01:00 |
Alessia Bardi
|
112da6d76a
|
in theory, just auto-formatting after mvn compile
|
2020-12-09 20:00:27 +01:00 |
Miriam Baglioni
|
6fbc67a959
|
using ModelConstant.ORCID and removing not used constants
|
2020-12-09 17:10:20 +01:00 |
Claudio Atzori
|
3c5ce1dada
|
code formatting
|
2020-12-09 17:07:20 +01:00 |
Miriam Baglioni
|
212b52614f
|
added graph mapper versus community result without context and project in common to be used for the doiboost mapping
|
2020-12-09 16:59:02 +01:00 |
Claudio Atzori
|
491ad24750
|
introduced filtering for DOIs in graph cleaning workflow
|
2020-12-09 09:10:33 +01:00 |
Claudio Atzori
|
943b961cf6
|
introduced PidBlacklist
|
2020-12-02 09:30:34 +01:00 |
Claudio Atzori
|
893ac4a77b
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-12-02 09:30:06 +01:00 |
Claudio Atzori
|
349e7246aa
|
do not consider NCID, GBIF as PIDs candidate for the ID creation
|
2020-11-30 16:52:40 +01:00 |
Claudio Atzori
|
2c407e775e
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-11-30 12:00:38 +01:00 |
Claudio Atzori
|
758d27745d
|
cleaning tab characters from text fields
|
2020-11-27 16:07:24 +01:00 |
Claudio Atzori
|
fa66e5b6b8
|
ResultTypeComparator gives priority to Records collectedfrom Crossref
|
2020-11-26 13:09:19 +01:00 |
Claudio Atzori
|
d0d5525d40
|
minor changes
|
2020-11-26 11:04:17 +01:00 |
Miriam Baglioni
|
66c0e3e574
|
changed because of #61 (comment)
|
2020-11-25 17:52:17 +01:00 |
Claudio Atzori
|
1372a4d1bf
|
fixed merging method
|
2020-11-25 16:05:51 +01:00 |
Claudio Atzori
|
dfd6205b95
|
Consistency graph workflow merges all the entities by ID
|
2020-11-25 14:55:32 +01:00 |
Claudio Atzori
|
e1a1bb3ee4
|
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
|
2020-11-24 18:34:03 +01:00 |
Claudio Atzori
|
e43ab07af6
|
code formatting
|
2020-11-24 14:41:39 +01:00 |
Miriam Baglioni
|
73dbb79602
|
removed the check for the community name in the common version on MakeTar
|
2020-11-24 14:36:15 +01:00 |
Claudio Atzori
|
c016cc050a
|
IdentifierFactory: in case a record provides more than one pid of the same type, the lexicographically lower value is chosen as best pick
|
2020-11-23 19:16:40 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Claudio Atzori
|
13e36a4da0
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:02 +01:00 |
Claudio Atzori
|
9b0fb9e958
|
merged from master
|
2020-11-12 09:27:12 +01:00 |
Miriam Baglioni
|
f8e9bda24c
|
merge branch with master
|
2020-11-05 16:31:18 +01:00 |
Miriam Baglioni
|
7ebdfacee9
|
removed commented code and added documentation to new method
|
2020-11-05 16:30:36 +01:00 |
Claudio Atzori
|
4625b7486e
|
code formatting
|
2020-11-04 18:12:43 +01:00 |
Claudio Atzori
|
e5da4ee9b1
|
dedup workflow using the common PidComparator
|
2020-11-04 15:02:02 +01:00 |
Claudio Atzori
|
ea2a0ea949
|
IdentifierFactory considers only DOIs matching a given regex
|
2020-11-03 18:43:37 +01:00 |
Miriam Baglioni
|
d4382b54df
|
moved the tar archive with max size on common module
|
2020-11-03 16:54:50 +01:00 |
Claudio Atzori
|
86d6fbe95b
|
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
|
2020-11-03 12:19:46 +01:00 |
Claudio Atzori
|
3fcd669e99
|
result merge operation leverages a custom ResultTypeComparator in the aggregator graph construction
|
2020-11-03 10:53:23 +01:00 |
Claudio Atzori
|
78c3c1b62b
|
exclude pid values set to 'none'
|
2020-11-02 14:25:26 +01:00 |
Claudio Atzori
|
09e44dabff
|
Merge branch 'master' into stable_ids
|
2020-11-02 12:16:01 +01:00 |
Miriam Baglioni
|
10d8bbada8
|
changed deprecated method with non deprecated version
|
2020-10-30 14:10:10 +01:00 |
Claudio Atzori
|
58f28296ea
|
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
|
2020-10-30 10:56:42 +01:00 |
Miriam Baglioni
|
4cf4454341
|
changed from deprecated method to new one
|
2020-10-27 17:46:19 +01:00 |
Miriam Baglioni
|
3582eba565
|
-
|
2020-10-27 17:31:33 +01:00 |
Miriam Baglioni
|
cc68855a1e
|
merge upstream
|
2020-10-27 15:54:16 +01:00 |
Miriam Baglioni
|
1cb60aede4
|
added connection timeout and socket timeout 600 sec
|
2020-10-27 15:53:02 +01:00 |
sandro
|
3a81a940b7
|
solved bug on merge publication
|
2020-10-21 22:41:55 +02:00 |
Claudio Atzori
|
c188868450
|
Merge branch 'master' into stable_ids
|
2020-10-16 12:06:23 +02:00 |
Sandro La Bruzzo
|
734934e2eb
|
fixed error on empty intersection with publication and relation on export to OAF
|
2020-10-08 17:29:29 +02:00 |
Sandro La Bruzzo
|
eec418cd26
|
moved AuthorMerger into dhp-common
|
2020-10-08 10:33:55 +02:00 |
Claudio Atzori
|
8958f20813
|
code formatting
|
2020-10-07 13:14:31 +02:00 |
Claudio Atzori
|
1abcabb6e6
|
WIP stable ids: IdentifierFactory & unit test
|
2020-10-06 18:55:23 +02:00 |
Claudio Atzori
|
6ce340bd3d
|
WIP stable ids: IdentifierFactory
|
2020-10-06 15:44:53 +02:00 |
Claudio Atzori
|
49ae3450a9
|
code formatting
|
2020-10-02 09:43:24 +02:00 |
Claudio Atzori
|
1c44182dea
|
minor changes
|
2020-10-02 09:41:34 +02:00 |
Claudio Atzori
|
8a523474b7
|
code formatting
|
2020-09-07 11:40:16 +02:00 |
Miriam Baglioni
|
ecd2081f84
|
refactoring
|
2020-08-11 14:17:31 +02:00 |
Miriam Baglioni
|
545ea9f77e
|
moved in common. Zenodo response model and APIClient to deposit in Zenodo
|
2020-08-07 16:44:51 +02:00 |
Claudio Atzori
|
93052ae384
|
WIP: set the connect & request timeout for BindingProvider service implementation
|
2020-06-25 16:16:02 +02:00 |
Claudio Atzori
|
0d52816244
|
WIP: graph cleaner implementation
|
2020-06-13 13:06:04 +02:00 |
Claudio Atzori
|
463489f59f
|
code formatting
|
2020-06-12 12:03:25 +02:00 |
miconis
|
fa8c5bcd39
|
javadoc for the PacePerson class and implementation of a unit test
|
2020-06-11 12:19:32 +02:00 |
Miriam Baglioni
|
8f6ce970f9
|
moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper
|
2020-05-25 10:25:55 +02:00 |
Miriam Baglioni
|
4c94231cad
|
merge with master fork
|
2020-05-08 12:25:57 +02:00 |
Miriam Baglioni
|
31ea05297d
|
moved the DbClient to common and added needed dependency to pom
|
2020-05-04 12:22:28 +02:00 |
Claudio Atzori
|
439c6255a2
|
cleanup
|
2020-04-29 19:09:07 +02:00 |
Miriam Baglioni
|
f7695e833c
|
resolved conflicts
|
2020-04-29 11:41:31 +02:00 |
Claudio Atzori
|
6f5b899038
|
reformatted code according to the updated style descriptor
|
2020-04-28 11:23:29 +02:00 |
Claudio Atzori
|
a0bdbacdae
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:52:31 +02:00 |
Claudio Atzori
|
7a3f8085f7
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:45:40 +02:00 |
Claudio Atzori
|
ad7a131b18
|
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project
|
2020-04-18 12:42:58 +02:00 |
Claudio Atzori
|
038ac7afd7
|
relation consistency workflow separated from dedup scan and creation of CCs
|
2020-04-17 13:12:44 +02:00 |
Claudio Atzori
|
47f3d9b757
|
unit test for GraphHiveImporterJob
|
2020-04-08 13:24:43 +02:00 |
Claudio Atzori
|
3d1b637cab
|
dataset based provision WIP
|
2020-04-04 14:03:43 +02:00 |
Sandro La Bruzzo
|
0cd022ad6a
|
merge with master
|
2020-03-26 14:08:29 +01:00 |
Sandro La Bruzzo
|
addaaa091f
|
migrate relation from RDD to Dataset
|
2020-03-13 09:13:20 +01:00 |
Sandro La Bruzzo
|
b021b8a2e1
|
Added index wf
|
2020-02-24 10:15:55 +01:00 |
Claudio Atzori
|
33185fd0b7
|
ISLookupClientFactory moved in dhp-common
|
2020-02-19 16:56:38 +01:00 |
Sandro La Bruzzo
|
2b8675462f
|
refactoring code
|
2020-02-19 10:07:08 +01:00 |
Claudio Atzori
|
56d1810a66
|
working procedure for records indexing using Spark, via lib com.lucidworks.spark:spark-solr
|
2020-02-14 12:28:52 +01:00 |
Claudio Atzori
|
956da2f923
|
added Saxon-HE extension functions and Transformer factory class
|
2020-02-13 16:49:45 +01:00 |
Sandro La Bruzzo
|
19a80e4638
|
implemented workflow for aggregation and generation of infospace graph
|
2020-01-24 09:58:55 +01:00 |
Sandro La Bruzzo
|
abd9034da0
|
implemented DedupRecord factory with the merge of publications
|
2019-12-11 15:43:24 +01:00 |
Claudio Atzori
|
c8bb81cd9a
|
align dependencies with IIS cluster
|
2019-10-29 18:10:20 +01:00 |
Sandro La Bruzzo
|
5a8a323f2a
|
dhp-collection-worker integrated in dhp-workflows
|
2019-10-24 11:36:59 +02:00 |
Sandro La Bruzzo
|
bbb87d0e3d
|
implemented saxonHE on transformation spark job
|
2019-10-10 11:33:51 +02:00 |
Sandro La Bruzzo
|
4b8c7c279d
|
Added documentation on a class, and reused ArgumentApplicationParser on dhp-aggregation
|
2019-10-07 17:02:53 +02:00 |
Sandro La Bruzzo
|
a423a6ebfd
|
Created a generic Argument parser to be used in all modules
|
2019-10-03 12:22:44 +02:00 |
Sandro La Bruzzo
|
53ec9bccca
|
changed the implementation of RabbitMQ communication
|
2019-04-16 12:28:01 +02:00 |
Sandro La Bruzzo
|
403c13eebf
|
Implemented message manager, Fixed bug on collection worker, implemented Collection and Transform spark job
|
2019-04-11 15:39:29 +02:00 |
Sandro La Bruzzo
|
9294851a6c
|
implemented communication layer using RabbitMQ between oozie node and Dnet
|
2019-04-05 12:19:25 +02:00 |
Sandro La Bruzzo
|
3f4ba71bbd
|
resolved conflicts
|
2019-04-03 16:12:57 +02:00 |
Sandro La Bruzzo
|
ded6aef5e1
|
moved collector worker
|
2019-04-03 16:05:16 +02:00 |
enricoottonello
|
2f79eb930a
|
added apidescriptor
|
2019-04-03 16:03:44 +02:00 |
enricoottonello
|
b316467608
|
added common module
|
2019-04-03 10:53:54 +02:00 |
luosolo
|
1eb0281b38
|
refactored structure of the project
|
2019-03-13 14:43:20 +01:00 |
Claudio Atzori
|
f072ed91b2
|
first commit
|
2018-01-16 14:21:13 +01:00 |