Commit Graph

574 Commits

Author SHA1 Message Date
Claudio Atzori d3b96f102b builder pattern screws up the Parquet schema inference method, avoid using it in the bean definitions 2020-02-04 14:10:58 +01:00
Claudio Atzori ed290ca8d7 builder pattern 2020-02-03 10:35:51 +01:00
Claudio Atzori 7ba0f44d05 WIP 2020-01-30 18:21:07 +01:00
Claudio Atzori 49ef2f4eb1 removed input parameter specification, SparkXmlRecordBuilderJob doesn't need hive 2020-01-30 18:20:26 +01:00
Claudio Atzori b5e1e2e5b2 reintegrated changes from fcbc4ccd70 2020-01-30 18:11:04 +01:00
Claudio Atzori 7bacd6812e Merge branch 'provision_indexing' of https://code-repo.d4science.org/D-Net/dnet-hadoop into HEAD
 Conflicts:
	dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/GraphJoiner.java
	dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/MappingUtils.java
	dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/RelatedEntity.java
	dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/SparkXmlRecordBuilderJob.java
2020-01-30 17:59:46 +01:00
Claudio Atzori b2691a3b0a save adjacency list as JoinedEntity 2020-01-30 17:46:29 +01:00
Claudio Atzori 1ecca69f49 added annotation to ignore method during the serialization 2020-01-30 17:45:28 +01:00
Claudio Atzori 8c2aff99b0 joining entities using T x R x S, WIP: last representation based on LinkedEntity type 2020-01-29 15:40:33 +01:00
Sandro La Bruzzo ad4387dd38 added property to gitignore 2020-01-27 10:56:40 +01:00
Sandro La Bruzzo 24219d1204 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-01-27 10:54:11 +01:00
Sandro La Bruzzo 0dff14b28e added property to gitignore 2020-01-27 10:53:54 +01:00
Sandro La Bruzzo 19a80e4638 implemented workflow for aggregation and generation of infospace graph 2020-01-24 09:58:55 +01:00
Claudio Atzori fcbc4ccd70 a bit of docs doesn't hurt 2020-01-24 08:43:23 +01:00
Claudio Atzori a55f5fecc6 joining entities using T x R x S method with groupByKey, WIP: making target objects (T) have lower memory footprint 2020-01-24 08:17:53 +01:00
Michele Artini 6bfe2dc96e partial implementation 2020-01-22 16:00:23 +01:00
Claudio Atzori 799929c1e3 joining entities using T x R x S method with groupByKey 2020-01-21 16:35:44 +01:00
Michele Artini f6eccdde33 partial implementation 2020-01-21 14:17:05 +01:00
Michele Artini cd114f1c3b partial update 2020-01-21 12:32:10 +01:00
Michele Artini b35c59eb42 partial implementation of entities from db 2020-01-20 16:04:19 +01:00
Sandro La Bruzzo fa7504bf29 removed DLI stuff, which should be in a branch 2020-01-20 10:28:00 +01:00
Michele Artini 81f82b5d34 partial implementation of applications to migrate entities 2020-01-17 15:26:21 +01:00
Claudio Atzori 1cd6899480 merged from master 2020-01-17 14:25:57 +01:00
Claudio Atzori 749b0660ab instance URLs must be repeatable 2020-01-17 14:22:15 +01:00
Claudio Atzori 63c0db4ff8 instance URLs must be repeatable 2020-01-16 15:54:53 +02:00
Claudio Atzori 97c239ee0d WIP: trying to find a way to build the records for the index 2020-01-16 12:02:28 +02:00
miconis 4955be0197 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-01-14 15:03:44 +02:00
miconis f61adfc2bb minor changes 2020-01-14 15:03:27 +02:00
miconis 9bdcb02179 minor changes and update of the configuration for publications 2020-01-14 15:01:03 +02:00
Michele Artini f7b9a7a9af entity migration (partial implementation) 2020-01-10 15:55:23 +01:00
Claudio Atzori 731f9b64e6 Merge branch 'master' of michele.artini/dnet-hadoop into master 2019-12-20 14:22:37 +01:00
Michele Artini 7229fecbcf fix warnings in poms 2019-12-20 13:41:08 +01:00
Sandro La Bruzzo dd21db7036 fixed stuff 2019-12-18 16:28:22 +01:00
Claudio Atzori 7ba586d2e5 oozie workflow aimed to build the adjacency lists representation of the graph, needed to build the records to be indexed 2019-12-17 16:24:49 +01:00
Sandro La Bruzzo 76efcde4fd using new branch decisionTreeDedup 2019-12-13 12:20:35 +01:00
Sandro La Bruzzo b4392f9f43 implemented DedupRecord factory for missing entities 2019-12-13 09:40:02 +01:00
miconis 545e940007 implementation of the mergeFrom for the Datasources 2019-12-12 15:36:41 +01:00
Sandro La Bruzzo 39367676d7 implemented DedupRecord factory with the merge of project 2019-12-12 15:18:48 +01:00
Sandro La Bruzzo 6b45e37e22 implemented DedupRecord factory with the merge of organizations 2019-12-11 16:57:37 +01:00
Sandro La Bruzzo abd9034da0 implemented DedupRecord factory with the merge of publications 2019-12-11 15:43:24 +01:00
miconis 4b66b471a4 implementation of the sorting by trust mechanism and the merge of oaf entities 2019-12-10 14:57:16 +01:00
Sandro La Bruzzo cc63706347 Implemented deduplication on spark 2019-12-06 13:38:00 +01:00
Claudio Atzori 6a7bee5e43 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2019-11-14 15:43:07 +01:00
Claudio Atzori 0c4b316f82 align Result model with the latest OpenAIRE schema changes introduced in the protobuf model 2019-11-14 15:42:52 +01:00
Sandro La Bruzzo aad0cb40b7 Added schema Scholexplorer 2019-11-14 10:34:09 +01:00
Claudio Atzori 5711e75f67 use ${project.version} whenever possible 2019-11-08 17:41:51 +01:00
Claudio Atzori 245b4cbbb3 removed import limit 2019-11-08 17:41:01 +01:00
Claudio Atzori 7fe6835b47 [maven-release-plugin] prepare for next development iteration 2019-11-07 17:39:30 +01:00
Claudio Atzori 58918967d9 [maven-release-plugin] prepare release dhp-1.0.4 2019-11-07 17:39:27 +01:00
Claudio Atzori 2243089b78 Author PIDs include also provenance information 2019-11-07 17:38:37 +01:00