Commit Graph

4742 Commits

Author SHA1 Message Date
Sandro La Bruzzo ad4387dd38 added property to gitignore 2020-01-27 10:56:40 +01:00
Sandro La Bruzzo 24219d1204 Merge branch 'master' of https://code-repo.d3science.org/D-Net/dnet-hadoop 2020-01-27 10:54:11 +01:00
Sandro La Bruzzo 0dff14b28e added property to gitignore 2020-01-27 10:53:54 +01:00
miconis 5c8f6febee minor changes in comparators 2020-01-24 10:01:11 +01:00
Sandro La Bruzzo 19a80e4638 implemented workfow for aggregation and generation of infospace graph 2020-01-24 09:58:55 +01:00
Claudio Atzori fcbc4ccd70 a bit of docs doesn't hurt 2020-01-24 08:43:23 +01:00
Claudio Atzori a55f5fecc6 joining entities using T x R x S method with groupByKey, WIP: making target objects (T) have lower memory footprint 2020-01-24 08:17:53 +01:00
Michele Artini 6bfe2dc96e partial implementation 2020-01-22 16:00:23 +01:00
Claudio Atzori 799929c1e3 joining entities using T x R x S method with groupByKey 2020-01-21 16:35:44 +01:00
Michele Artini f6eccdde33 partial implementation 2020-01-21 14:17:05 +01:00
Michele Artini cd114f1c3b partial update 2020-01-21 12:32:10 +01:00
Michele Artini b35c59eb42 partial implementation of entities from db 2020-01-20 16:04:19 +01:00
Sandro La Bruzzo fa7504bf29 removed DLI stuff should be in a branch 2020-01-20 10:28:00 +01:00
Michele Artini 81f82b5d34 partial implementation of applications to migrate entities 2020-01-17 15:26:21 +01:00
Claudio Atzori 1cd6899480 merged from master 2020-01-17 14:25:57 +01:00
Claudio Atzori 749b0660ab instance URLs must be repeatable 2020-01-17 14:22:15 +01:00
Claudio Atzori 63c0db4ff8 instance URLs must be repeatable 2020-01-16 15:54:53 +02:00
Claudio Atzori 97c239ee0d WIP: trying to find a way to build the records for the index 2020-01-16 12:02:28 +02:00
miconis 4955be0197 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-01-14 15:03:44 +02:00
miconis f61adfc2bb minor changes 2020-01-14 15:03:27 +02:00
miconis 9bdcb02179 minor changes and update of the configuration for publications 2020-01-14 15:01:03 +02:00
miconis 4dce785375 update in the implementation of the tree: addition of new logic aggregations and statistics 2020-01-14 11:42:43 +02:00
Michele Artini f7b9a7a9af entity migration (partial implementation) 2020-01-10 15:55:23 +01:00
Claudio Atzori 731f9b64e6 Merge branch 'master' of michele.artini/dnet-hadoop into master 2019-12-20 14:22:37 +01:00
Michele Artini 7229fecbcf fix warnings in poms 2019-12-20 13:41:08 +01:00
Sandro La Bruzzo dd21db7036 fixed stuff 2019-12-18 16:28:22 +01:00
miconis b3748b8d77 minor changes 2019-12-18 16:20:35 +01:00
miconis b21b1b8f61 implementation of new aggregation in the tree node processing 2019-12-18 16:19:36 +01:00
miconis 20fcfe6328 implementation of new aggregation in the tree node processing 2019-12-18 16:19:26 +01:00
Sandro La Bruzzo d924f28b93 fixed wrong use of jspath 2019-12-18 09:29:44 +01:00
Claudio Atzori 7ba586d2e5 oozie workflow aimed to build the adjacency lists representation of the graph, needed to build the records to be indexed 2019-12-17 16:24:49 +01:00
miconis 84aaa65501 implementation of new json comparator and update of the publication configuration 2019-12-17 09:16:26 +01:00
Sandro La Bruzzo 76efcde4fd using new branch decisionTreeDedup 2019-12-13 12:20:35 +01:00
Sandro La Bruzzo 5c01ae4c92 merged JqMapping branch into tree2 2019-12-13 11:30:02 +01:00
Sandro La Bruzzo b4392f9f43 implemented DedupRecord factory for missing entities 2019-12-13 09:40:02 +01:00
miconis 545e940007 implementation of the mergeFrom for the Datasources 2019-12-12 15:36:41 +01:00
Sandro La Bruzzo 39367676d7 implemented DedupRecord factory with the merge of project 2019-12-12 15:18:48 +01:00
Sandro La Bruzzo 6b45e37e22 implemented DedupRecord factory with the merge of organizations 2019-12-11 16:57:37 +01:00
Sandro La Bruzzo abd9034da0 implemented DedupRecord factory with the merge of publications 2019-12-11 15:43:24 +01:00
miconis 4b66b471a4 implementation of the sorting by trust mechanism and the merge of oaf entities 2019-12-10 14:57:16 +01:00
Sandro La Bruzzo 35008fdbf9 fix stuff 2019-12-06 15:28:30 +01:00
Sandro La Bruzzo cc63706347 Implemented deduplication on spark 2019-12-06 13:38:00 +01:00
Sandro La Bruzzo 16c670a5d5 Improved deduplication 2019-12-05 14:14:25 +01:00
miconis 49f9beb4a8 implementation of romansmatch and re-implementation of the getNumber function. New terms in the translation map and update of the configuration 2019-11-28 16:54:44 +01:00
miconis f791730330 addition of one term to the translation maps in the configurations 2019-11-27 15:48:37 +01:00
miconis d2278fe358 minor change in the citymatch 2019-11-21 10:54:02 +01:00
miconis 8c0d346005 the param map has been updated: now it accepts string parameters 2019-11-21 09:37:56 +01:00
miconis ddd40540aa jarowinklernormalizedname splitted in 3 different comparators: citymatch, keywordmatch and jarowinkler. Implementation of the TreeStatistic support functions 2019-11-20 10:45:00 +01:00
Claudio Atzori 6a7bee5e43 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2019-11-14 15:43:07 +01:00
Claudio Atzori 0c4b316f82 align Result model with the latest OpenAIRE schema changes introduced in the protobuf model 2019-11-14 15:42:52 +01:00