Claudio Atzori
eb6acfbabc
[cleaning] removing non parsable relation.validationDate(s)
2021-05-28 10:50:44 +02:00
Claudio Atzori
ac3d090e9e
bumped dhp-schemas dependency version
2021-05-27 17:31:12 +02:00
Claudio Atzori
c3d92247d3
bumped dhp-schemas dependency version
2021-05-27 15:10:51 +02:00
Claudio Atzori
4f58418184
depending on dhp-schemas:2.4.7 (release)
2021-05-24 10:32:48 +02:00
Claudio Atzori
0358ae16ce
depending on the latest dhp-schema version
2021-05-14 11:28:33 +02:00
Claudio Atzori
d1cbee8413
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
2021-05-10 16:43:39 +02:00
Claudio Atzori
3797543600
MDStoreManager model classes moved to dhp-schemas
2021-05-10 14:32:05 +02:00
Claudio Atzori
923d19ea8e
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
2021-05-04 18:06:21 +02:00
Claudio Atzori
5cc3e6d61c
bumped pace-core dependency version
2021-05-03 16:40:50 +02:00
Claudio Atzori
91e7220f20
cleaned up workflow for actionset migration, adjusted dnet|cnr* dependency versions
2021-04-29 10:09:52 +02:00
Claudio Atzori
233d849f90
added dnet45-bootstrap-snapshot and dnet45-bootstrap-release repositories
2021-04-27 12:03:40 +02:00
Claudio Atzori
4028176559
enabled snapshots from dnet45-snapshots repository
2021-04-27 11:37:32 +02:00
Claudio Atzori
27ab8a704d
adjusted poms to align with the external dhp-schema module
2021-04-27 10:12:27 +02:00
Claudio Atzori
c2bb03c8b5
depending on external dhp-schemas module
2021-04-23 17:57:35 +02:00
miconis
2709d08fc2
Merge branch 'stable_ids' into openorgswf
2021-03-29 16:39:07 +02:00
miconis
28c1cdd132
merged stable_ids into openorgswf
2021-03-25 10:44:49 +01:00
Sandro La Bruzzo
c73072079d
fix conflicts
2021-03-22 16:36:31 +01:00
Claudio Atzori
acbe3119a4
RestCollectorPlugin imported from dnet45
2021-03-08 09:44:09 +01:00
Claudio Atzori
fa7930d2e2
merging contributions from PR#97
2021-03-05 15:45:28 +01:00
miconis
1a85020572
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
2021-02-26 10:19:28 +01:00
Claudio Atzori
72c57b28fa
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
2021-02-04 14:08:18 +01:00
Claudio Atzori
d62ea1490d
cleaned up RabbitMQ stuff
2021-02-02 10:53:19 +01:00
Michele Artini
b9d90e95b8
Added eventId to ShortEventMessage
2021-01-14 14:32:31 +01:00
Claudio Atzori
197f286fa4
removed duplicated dependency (org.apache.httpcomponents:httpclient)
2020-12-07 21:52:17 +01:00
Enrico Ottonello
2b0c9bbb7e
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-17 18:24:34 +01:00
Claudio Atzori
628ca54dd3
disable old maven repository URLs
2020-11-17 12:26:16 +01:00
Enrico Ottonello
c796adae24
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-16 11:57:19 +01:00
Claudio Atzori
2facfefc19
updated maven repository URL
2020-11-13 15:38:40 +01:00
Enrico Ottonello
6bc7dbeca7
first version of dataset successfully generated from orcid dump 2020
2020-11-06 13:47:50 +01:00
Enrico Ottonello
9818e74a70
added dependency version in main pom.xml for orcid no doi
2020-10-22 16:38:00 +02:00
Enrico Ottonello
b0290dbcb7
moved all dependencies version to main pom.xml
2020-10-22 16:20:46 +02:00
Miriam Baglioni
ae08b3c0dd
merge branch with master
2020-10-05 11:35:55 +02:00
Claudio Atzori
4fddd18403
updating to dnet-pace-core:4.0.5
- fixed error in the treeprocessor: it used th=-1 as the default value, now it uses th=1 5021e5048f
- fixed error in the block processor: entities with orderField=null were not considered 9e8ea8f6ee
2020-10-02 12:37:25 +02:00
Miriam Baglioni
5ef03e5971
added the dependencies from dhp-aggregation for h2020classification
2020-10-01 15:44:40 +02:00
Enrico Ottonello
fefbcfb106
dependency version moved to main pom (PR review)
2020-09-22 10:20:25 +02:00
Michele Artini
51321c2701
partition of events by opendoarId
2020-09-17 11:38:07 +02:00
Miriam Baglioni
02a4986e7b
Applying changes from code reviews D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment)
2020-08-13 11:53:01 +02:00
Claudio Atzori
3a11a387a9
data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed
2020-08-03 14:28:08 +02:00
Claudio Atzori
105176105c
updated dnet-pace-core dependency to version 4.0.4 to include the latest clustering function
2020-07-20 09:59:47 +02:00
Claudio Atzori
4b9fb2ffb8
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-15 11:26:04 +02:00
Claudio Atzori
5033c25587
code formatting
2020-07-15 11:26:00 +02:00
Michele Artini
262c29463e
relations with multiple datasources
2020-07-15 09:18:40 +02:00
Michele Artini
3635d05061
poms
2020-07-13 15:52:23 +02:00
Claudio Atzori
c3d67f709a
adjusted dedup configuration for result entities: using new wordssuffixprefix clustering function, removed ngrampairs, adjusted queueMaxSize (800) and slidingWindowSize (80)
2020-07-02 17:35:22 +02:00
Claudio Atzori
9cd27183b6
[maven-release-plugin] prepare for next development iteration
2020-06-22 11:27:44 +02:00
Claudio Atzori
1e3dab0631
[maven-release-plugin] prepare release dhp-1.2.3
2020-06-22 11:27:39 +02:00
Claudio Atzori
c4d9f1837f
[maven-release-plugin] prepare for next development iteration
2020-06-12 12:21:08 +02:00
Claudio Atzori
f0746a7605
[maven-release-plugin] prepare release dhp-1.2.2
2020-06-12 12:21:03 +02:00
Claudio Atzori
c77fc68484
restored Saxon-HE dependency definition
2020-06-10 14:49:25 +02:00
Sandro La Bruzzo
13815d5d13
improvement DOIBoost
2020-06-01 17:52:12 +02:00
Claudio Atzori
7582532e73
[maven-release-plugin] prepare for next development iteration
2020-05-25 19:48:18 +02:00
Claudio Atzori
01c2e93395
[maven-release-plugin] prepare release dhp-1.2.1
2020-05-25 19:48:14 +02:00
Claudio Atzori
60c40618d3
[maven-release-plugin] prepare for next development iteration
2020-05-11 10:17:14 +02:00
Claudio Atzori
c267d958d5
[maven-release-plugin] prepare release dhp-1.2.0
2020-05-11 10:17:10 +02:00
Claudio Atzori
42f1a2bf94
bumped project version to 1.2.0-SNAPSHOT
2020-05-11 10:05:57 +02:00
Claudio Atzori
0ccc864ad9
[maven-release-plugin] prepare for next development iteration
2020-05-08 17:01:31 +02:00
Claudio Atzori
6e47c724c6
[maven-release-plugin] prepare release dhp-1.1.7
2020-05-08 17:01:27 +02:00
Claudio Atzori
77ac995770
cleaned up poms, added descriptions
2020-04-29 18:44:17 +02:00
Claudio Atzori
64d790a266
updated maven plugin dependencies
2020-04-29 16:56:18 +02:00
Claudio Atzori
fe81f674ec
updated maven-javadoc-plugin to v3.2.0, disabled doclint to prevent compilation from failing on incomplete javadoc tags
2020-04-29 16:19:57 +02:00
Claudio Atzori
e6d68d1364
added customised style for automatic code formatting, introduced automatic import sorting plugin net.revelc.code:impsort-maven-plugin
2020-04-28 11:09:50 +02:00
Claudio Atzori
d3fd05e3c5
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:52:23 +02:00
Claudio Atzori
7a3f8085f7
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:45:40 +02:00
Claudio Atzori
fad94c2155
updated dependency dnet-pace-core to version 4.0.1 to include 7bc00a3f5f
2020-04-24 16:47:10 +02:00
Claudio Atzori
ba4339f142
excluded org.apache.hadoop:hadoop-common from the dnet-actionmanager-common dependency to prevent multiple transitive jaxb-impl versions from conflicting when instantiating the ISLookup client stub
2020-04-22 14:23:09 +02:00
Claudio Atzori
ad7a131b18
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin , applied to each java class in the project
2020-04-18 12:42:58 +02:00
Claudio Atzori
82e8341f50
reorganizing parameter names in the provision workflow
2020-04-14 15:54:41 +02:00
Claudio Atzori
6b5f9ca9cb
raw graph creation workflow moved under dhp-graph-mapper, claims integration is included
2020-04-10 17:53:07 +02:00
Claudio Atzori
377e1ba840
[maven-release-plugin] prepare for next development iteration
2020-03-30 20:06:00 +02:00
Claudio Atzori
76d9315129
[maven-release-plugin] prepare release dhp-1.1.6
2020-03-30 20:05:56 +02:00
Claudio Atzori
77c4294924
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-03-26 18:26:52 +01:00
Claudio Atzori
43cbcda7ef
unit test for SparkGraphImporterJob
2020-03-26 18:26:40 +01:00
Sandro La Bruzzo
0cd022ad6a
merge with master
2020-03-26 14:08:29 +01:00
Claudio Atzori
a226198a13
WIP adopting junit5
2020-03-25 16:47:39 +01:00
Claudio Atzori
6cb0a9bff0
dedup wf directory structure aligned with project commons
2020-03-20 16:48:14 +01:00
miconis
f32eae5ce9
implementation of the spark action for the simrel creation
2020-03-18 14:27:49 +01:00
Claudio Atzori
9c84e21b87
added workflow to migrate latest version of each actionset content from DM to OCEAN cluster, mapping the targetValues from the old protobuf data model to the dhp.OAF datamodel
2020-03-13 15:56:52 +01:00
Sandro La Bruzzo
b021b8a2e1
Added index wf
2020-02-24 10:15:55 +01:00
Michele Artini
4c94e74a84
Added a missing dependency
2020-02-20 11:43:32 +01:00
Claudio Atzori
33185fd0b7
ISLookupClientFactory moved to dhp-common
2020-02-19 16:56:38 +01:00
Sandro La Bruzzo
2b8675462f
refactoring code
2020-02-19 10:07:08 +01:00
Claudio Atzori
1b18fd4d54
sync with master branch
2020-02-17 13:49:46 +01:00
Sandro La Bruzzo
76ee85141a
added oozie job for DNET migration and implemented Spark job for extracting entities
2020-02-17 12:31:44 +01:00
Michele Artini
176c5606bd
aligned with origin/master, aligned model and mapping
2020-02-17 10:40:53 +01:00
Claudio Atzori
56d1810a66
working procedure for records indexing using Spark, via lib com.lucidworks.spark:spark-solr
2020-02-14 12:28:52 +01:00
Claudio Atzori
1ee1baa8c0
Merge branch 'master' into provision_indexing
2020-02-13 18:17:07 +01:00
Claudio Atzori
a3d0b57b25
[maven-release-plugin] prepare for next development iteration
2020-02-13 18:11:33 +01:00
Claudio Atzori
6ed9a15bc8
[maven-release-plugin] prepare release dhp-1.1.5
2020-02-13 18:11:31 +01:00
Claudio Atzori
49e648f7c3
bumped version
2020-02-13 18:09:31 +01:00
Claudio Atzori
11cfd6bd9a
integrated changes from master branch
2020-02-13 17:27:07 +01:00
Claudio Atzori
1fee6e2b7e
implemented XML records construction and serialization, indexing WIP
2020-02-13 16:53:27 +01:00
Sandro La Bruzzo
7f11d06a1f
upgraded version of dnet-pace-core in pom.xml
2020-02-10 12:58:59 +01:00
Sandro La Bruzzo
19a80e4638
implemented workflow for aggregation and generation of infospace graph
2020-01-24 09:58:55 +01:00
Michele Artini
b35c59eb42
partial implementation of entities from db
2020-01-20 16:04:19 +01:00
Michele Artini
7229fecbcf
fix warnings in poms
2019-12-20 13:41:08 +01:00
Sandro La Bruzzo
abd9034da0
implemented DedupRecord factory with the merge of publications
2019-12-11 15:43:24 +01:00
Sandro La Bruzzo
cc63706347
Implemented deduplication on spark
2019-12-06 13:38:00 +01:00
Claudio Atzori
7fe6835b47
[maven-release-plugin] prepare for next development iteration
2019-11-07 17:39:30 +01:00
Claudio Atzori
58918967d9
[maven-release-plugin] prepare release dhp-1.0.4
2019-11-07 17:39:27 +01:00
Claudio Atzori
a52d5bde4f
simplified import procedure, maps the infospace as hive tables
2019-11-06 17:45:52 +01:00