Claudio Atzori
|
7ba0f44d05
|
WIP
|
2020-01-30 18:21:07 +01:00 |
Claudio Atzori
|
49ef2f4eb1
|
removed input parameter specification, SparkXmlRecordBuilderJob doesn't need hive
|
2020-01-30 18:20:26 +01:00 |
Claudio Atzori
|
b5e1e2e5b2
|
reintegrated changes from fcbc4ccd70
|
2020-01-30 18:11:04 +01:00 |
Claudio Atzori
|
7bacd6812e
|
Merge branch 'provision_indexing' of https://code-repo.d4science.org/D-Net/dnet-hadoop into HEAD
Conflicts:
dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/GraphJoiner.java
dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/MappingUtils.java
dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/RelatedEntity.java
dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/SparkXmlRecordBuilderJob.java
|
2020-01-30 17:59:46 +01:00 |
Claudio Atzori
|
b2691a3b0a
|
save adjacency list as JoinedEntity
|
2020-01-30 17:46:29 +01:00 |
Claudio Atzori
|
1ecca69f49
|
added annotation to ignore method during the serialization
|
2020-01-30 17:45:28 +01:00 |
Claudio Atzori
|
8c2aff99b0
|
joining entities using T x R x S, WIP: last representation based on LinkedEntity type
|
2020-01-29 15:40:33 +01:00 |
Sandro La Bruzzo
|
ad4387dd38
|
added property to gitignore
|
2020-01-27 10:56:40 +01:00 |
Sandro La Bruzzo
|
24219d1204
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-01-27 10:54:11 +01:00 |
Sandro La Bruzzo
|
0dff14b28e
|
added property to gitignore
|
2020-01-27 10:53:54 +01:00 |
Sandro La Bruzzo
|
19a80e4638
|
implemented workflow for aggregation and generation of infospace graph
|
2020-01-24 09:58:55 +01:00 |
Claudio Atzori
|
fcbc4ccd70
|
a bit of docs doesn't hurt
|
2020-01-24 08:43:23 +01:00 |
Claudio Atzori
|
a55f5fecc6
|
joining entities using T x R x S method with groupByKey, WIP: making target objects (T) have lower memory footprint
|
2020-01-24 08:17:53 +01:00 |
Michele Artini
|
6bfe2dc96e
|
partial implementation
|
2020-01-22 16:00:23 +01:00 |
Claudio Atzori
|
799929c1e3
|
joining entities using T x R x S method with groupByKey
|
2020-01-21 16:35:44 +01:00 |
Michele Artini
|
f6eccdde33
|
partial implementation
|
2020-01-21 14:17:05 +01:00 |
Michele Artini
|
cd114f1c3b
|
partial update
|
2020-01-21 12:32:10 +01:00 |
Michele Artini
|
b35c59eb42
|
partial implementation of entities from db
|
2020-01-20 16:04:19 +01:00 |
Sandro La Bruzzo
|
fa7504bf29
|
removed DLI stuff, which should be in a branch
|
2020-01-20 10:28:00 +01:00 |
Michele Artini
|
81f82b5d34
|
partial implementation of applications to migrate entities
|
2020-01-17 15:26:21 +01:00 |
Claudio Atzori
|
1cd6899480
|
merged from master
|
2020-01-17 14:25:57 +01:00 |
Claudio Atzori
|
749b0660ab
|
instance URLs must be repeatable
|
2020-01-17 14:22:15 +01:00 |
Claudio Atzori
|
63c0db4ff8
|
instance URLs must be repeatable
|
2020-01-16 15:54:53 +02:00 |
Claudio Atzori
|
97c239ee0d
|
WIP: trying to find a way to build the records for the index
|
2020-01-16 12:02:28 +02:00 |
miconis
|
4955be0197
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-01-14 15:03:44 +02:00 |
miconis
|
f61adfc2bb
|
minor changes
|
2020-01-14 15:03:27 +02:00 |
miconis
|
9bdcb02179
|
minor changes and update of the configuration for publications
|
2020-01-14 15:01:03 +02:00 |
Michele Artini
|
f7b9a7a9af
|
entity migration (partial implementation)
|
2020-01-10 15:55:23 +01:00 |
Claudio Atzori
|
731f9b64e6
|
Merge branch 'master' of michele.artini/dnet-hadoop into master
|
2019-12-20 14:22:37 +01:00 |
Michele Artini
|
7229fecbcf
|
fix warnings in poms
|
2019-12-20 13:41:08 +01:00 |
Sandro La Bruzzo
|
dd21db7036
|
fixed stuff
|
2019-12-18 16:28:22 +01:00 |
Claudio Atzori
|
7ba586d2e5
|
oozie workflow aimed to build the adjacency lists representation of the graph, needed to build the records to be indexed
|
2019-12-17 16:24:49 +01:00 |
Sandro La Bruzzo
|
76efcde4fd
|
using new branch decisionTreeDedup
|
2019-12-13 12:20:35 +01:00 |
Sandro La Bruzzo
|
b4392f9f43
|
implemented DedupRecord factory for missing entities
|
2019-12-13 09:40:02 +01:00 |
miconis
|
545e940007
|
implementation of the mergeFrom for the Datasources
|
2019-12-12 15:36:41 +01:00 |
Sandro La Bruzzo
|
39367676d7
|
implemented DedupRecord factory with the merge of project
|
2019-12-12 15:18:48 +01:00 |
Sandro La Bruzzo
|
6b45e37e22
|
implemented DedupRecord factory with the merge of organizations
|
2019-12-11 16:57:37 +01:00 |
Sandro La Bruzzo
|
abd9034da0
|
implemented DedupRecord factory with the merge of publications
|
2019-12-11 15:43:24 +01:00 |
miconis
|
4b66b471a4
|
implementation of the sorting by trust mechanism and the merge of oaf entities
|
2019-12-10 14:57:16 +01:00 |
Sandro La Bruzzo
|
cc63706347
|
Implemented deduplication on spark
|
2019-12-06 13:38:00 +01:00 |
Claudio Atzori
|
6a7bee5e43
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2019-11-14 15:43:07 +01:00 |
Claudio Atzori
|
0c4b316f82
|
align Result model with the latest OpenAIRE schema changes introduced in the protobuf model
|
2019-11-14 15:42:52 +01:00 |
Sandro La Bruzzo
|
aad0cb40b7
|
Added schema Scholexplorer
|
2019-11-14 10:34:09 +01:00 |
Claudio Atzori
|
5711e75f67
|
use ${project.version} whenever possible
|
2019-11-08 17:41:51 +01:00 |
Claudio Atzori
|
245b4cbbb3
|
removed import limit
|
2019-11-08 17:41:01 +01:00 |
Claudio Atzori
|
7fe6835b47
|
[maven-release-plugin] prepare for next development iteration
|
2019-11-07 17:39:30 +01:00 |
Claudio Atzori
|
58918967d9
|
[maven-release-plugin] prepare release dhp-1.0.4
|
2019-11-07 17:39:27 +01:00 |
Claudio Atzori
|
2243089b78
|
Author PIDs also include provenance information
|
2019-11-07 17:38:37 +01:00 |
Claudio Atzori
|
5308f05a02
|
allow to specify the target hive DB name in the infospace import workflow
|
2019-11-07 17:38:09 +01:00 |
Claudio Atzori
|
a52d5bde4f
|
simplified import procedure, maps the infospace as hive tables
|
2019-11-06 17:45:52 +01:00 |