Claudio Atzori
|
3c728aaa0c
|
trying to overcome OOM errors during duplicate scan phase
|
2020-07-08 22:39:51 +02:00 |
Claudio Atzori
|
18c555cd79
|
Merge branch 'master' into deduptesting
|
2020-07-08 22:32:01 +02:00 |
Claudio Atzori
|
4365cf41d7
|
trying to overcome OOM errors during duplicate scan phase
|
2020-07-08 22:31:46 +02:00 |
Claudio Atzori
|
67e1d222b6
|
bulk cleaning when found null or empty, sets bestaccessrights evaluating the result instances
|
2020-07-08 17:53:35 +02:00 |
Alessia Bardi
|
853e8d7987
|
test for software merge
|
2020-07-08 17:03:53 +02:00 |
Claudio Atzori
|
610d377d57
|
first implementation of the BETA & PROD graphs merge procedure
|
2020-07-08 16:54:26 +02:00 |
Claudio Atzori
|
e2ea30f89d
|
updated graph construction workflow definition: cleaning wf moved at the bottom to include cleaning of the information produced by the enrichment workflows
|
2020-07-08 12:16:24 +02:00 |
Michele Artini
|
dffa0b01a2
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-07 15:37:29 +02:00 |
Michele Artini
|
efadbdb2bc
|
fixed a bug with duplicated events
|
2020-07-07 15:37:13 +02:00 |
Claudio Atzori
|
8af8e7481a
|
code formatting
|
2020-07-07 14:23:34 +02:00 |
Claudio Atzori
|
b383ed42fa
|
pass optional parameter relationFilter to the PrepareRelationJob implementation
|
2020-07-07 14:21:28 +02:00 |
Claudio Atzori
|
911894a987
|
Merge branch 'deduptesting'
|
2020-07-07 14:20:43 +02:00 |
Michele Artini
|
edf6c6c4dc
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-03 11:48:24 +02:00 |
Michele Artini
|
04bebb708c
|
some fixes
|
2020-07-03 11:48:12 +02:00 |
Claudio Atzori
|
c3d67f709a
|
adjusted dedup configuration for result entities: using new wordssuffixprefix clustering function, removed ngrampairs, adjusted queueMaxSize (800) and slidingWindowSize (80)
|
2020-07-02 17:35:22 +02:00 |
Claudio Atzori
|
1d39f7901c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-07-02 12:45:01 +02:00 |
Claudio Atzori
|
0f77cac4b5
|
fix: deduper must use queueMaxSize instead of groupMaxSize for the block definition
|
2020-07-02 12:43:51 +02:00 |
Michele Artini
|
b413db0bff
|
white/blacklists
|
2020-07-02 12:43:03 +02:00 |
Claudio Atzori
|
d380b85246
|
unit test for the preparation of the relations
|
2020-07-02 12:42:13 +02:00 |
Claudio Atzori
|
ed1c7e5d75
|
fixed workflow for the import of the claims alone
|
2020-07-02 12:40:21 +02:00 |
Claudio Atzori
|
e4a29a4513
|
fixed workflow for the import of the claims alone
|
2020-07-02 12:36:33 +02:00 |
Michele Artini
|
3bcdfbabe9
|
list with limits
|
2020-07-01 08:42:39 +02:00 |
Michele Artini
|
59a5421c24
|
indexing, accumulators, limited lists
|
2020-06-30 16:17:09 +02:00 |
Michele Artini
|
6f13673464
|
accumulators
|
2020-06-29 16:33:32 +02:00 |
Michele Artini
|
a6ea432435
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-29 08:44:20 +02:00 |
Michele Artini
|
35ae381d28
|
all events matchers
|
2020-06-29 08:43:56 +02:00 |
Claudio Atzori
|
7817338e05
|
added test to verify the relation pre-processing
|
2020-06-26 17:58:33 +02:00 |
Claudio Atzori
|
8d59fdf34e
|
WIP: dataset based PrepareRelationsJob
|
2020-06-26 14:32:58 +02:00 |
Michele Artini
|
2393d9da2f
|
limits
|
2020-06-26 11:20:45 +02:00 |
Michele Artini
|
408165a756
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-25 15:53:35 +02:00 |
Michele Artini
|
e8fb305f18
|
compilation of event map
|
2020-06-25 15:53:20 +02:00 |
Michele Artini
|
4eb3e109d7
|
compilation of event map
|
2020-06-25 15:45:50 +02:00 |
Claudio Atzori
|
d839e88783
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-25 14:06:30 +02:00 |
Claudio Atzori
|
6f5771c1c9
|
sets author.rank when null
|
2020-06-25 14:06:21 +02:00 |
Michele Artini
|
e28033c6d8
|
some fixes
|
2020-06-25 13:01:09 +02:00 |
Claudio Atzori
|
216975c4ec
|
restored complete provision workflow
|
2020-06-25 12:55:52 +02:00 |
Claudio Atzori
|
2d77d3a388
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-25 12:54:30 +02:00 |
Claudio Atzori
|
93f627ea51
|
code formatting
|
2020-06-25 12:54:21 +02:00 |
Miriam Baglioni
|
05a99cfb61
|
change the position of value and description elements in the workflow definition
|
2020-06-25 12:36:08 +02:00 |
Claudio Atzori
|
7df2712824
|
Merge branch 'provision_indexing'
|
2020-06-25 12:22:41 +02:00 |
Claudio Atzori
|
e62333192c
|
WIP: prepare relation job
|
2020-06-25 12:22:18 +02:00 |
Claudio Atzori
|
6933ec11fb
|
WIP: prepare relation job
|
2020-06-25 11:04:12 +02:00 |
Sandro La Bruzzo
|
a6c0faac70
|
added test to verify secondary sorting
|
2020-06-25 10:48:15 +02:00 |
Claudio Atzori
|
69b0391708
|
WIP: prepare relation job
|
2020-06-25 10:19:56 +02:00 |
Michele Artini
|
abcbebcbb4
|
fixed generation of ids
|
2020-06-25 09:50:46 +02:00 |
Michele Artini
|
77d2a1b1c4
|
params to choose sql queries for beta or production
|
2020-06-25 09:28:13 +02:00 |
Claudio Atzori
|
46e76affeb
|
WIP: prepare relation job
|
2020-06-24 19:01:15 +02:00 |
Claudio Atzori
|
0e723d378b
|
added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job
|
2020-06-24 18:34:42 +02:00 |
Michele Artini
|
202f6e62ff
|
Split join wf
|
2020-06-24 15:47:06 +02:00 |
Michele Artini
|
e53dd62e87
|
minor changes
|
2020-06-24 09:24:45 +02:00 |
Michele Artini
|
8b9933b934
|
refactoring aggregators
|
2020-06-24 08:57:13 +02:00 |
Michele Artini
|
d13e3d3f68
|
fixed paths
|
2020-06-23 11:01:42 +02:00 |
Michele Artini
|
8386c6f90d
|
filter of valid resultResult relations
|
2020-06-23 10:24:15 +02:00 |
Michele Artini
|
38bb45d0b6
|
test osf:refereed
|
2020-06-23 10:14:39 +02:00 |
Michele Artini
|
c3286f4c37
|
fixed relType
|
2020-06-23 09:32:32 +02:00 |
Michele Artini
|
af2f7705fc
|
partial refactoring of some joins
|
2020-06-23 08:37:35 +02:00 |
Claudio Atzori
|
8a3bc7c183
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-22 14:12:33 +02:00 |
Claudio Atzori
|
e162ba5075
|
added dnet workflows to orchestrate the execution of graph2hive, updateSolr and updateStats oozie wfs
|
2020-06-22 14:12:28 +02:00 |
Michele Artini
|
3ce20c198e
|
reformatting
|
2020-06-22 12:14:25 +02:00 |
Michele Artini
|
ed787398b3
|
refactoring wf
|
2020-06-22 11:45:14 +02:00 |
Claudio Atzori
|
9cd27183b6
|
[maven-release-plugin] prepare for next development iteration
|
2020-06-22 11:27:44 +02:00 |
Claudio Atzori
|
1e3dab0631
|
[maven-release-plugin] prepare release dhp-1.2.3
|
2020-06-22 11:27:39 +02:00 |
Claudio Atzori
|
961a0d0b49
|
[actionset promotion] log debugging info in case of error in the action payload extraction or parsing the data
|
2020-06-22 10:20:45 +02:00 |
Claudio Atzori
|
5e8b922962
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-22 09:50:47 +02:00 |
Claudio Atzori
|
7d416f08d8
|
graph cleaning workflow: set hostedby to unknown repository when defined as NULL
|
2020-06-22 09:50:43 +02:00 |
Michele Artini
|
16c7a18435
|
refactoring
|
2020-06-22 08:51:31 +02:00 |
Michele Artini
|
f9fc64ffaf
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-19 15:24:43 +02:00 |
Michele Artini
|
d88fe0ac84
|
join methods
|
2020-06-19 15:24:30 +02:00 |
Sandro La Bruzzo
|
464eeeec87
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-19 15:11:53 +02:00 |
Sandro La Bruzzo
|
1681de672d
|
updated mapping scholexplorer to OAF
|
2020-06-19 15:11:46 +02:00 |
Michele Artini
|
4822747313
|
some fixes
|
2020-06-19 13:53:56 +02:00 |
Michele Artini
|
834f139e6e
|
fixed some NPE
|
2020-06-19 12:33:29 +02:00 |
Claudio Atzori
|
d0ac7514b2
|
cleaning workflow to include cleaning of default values
|
2020-06-18 19:37:25 +02:00 |
Michele Artini
|
52f62d5d8c
|
events
|
2020-06-18 14:49:13 +02:00 |
Michele Artini
|
61634fbfe0
|
removed kryo encoding
|
2020-06-18 14:09:58 +02:00 |
Michele Artini
|
8d2b199dd2
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-18 13:15:34 +02:00 |
Michele Artini
|
e659b02e6b
|
some wf fixing
|
2020-06-18 13:15:13 +02:00 |
Michele Artini
|
9a847b4557
|
some wf fixing
|
2020-06-18 13:14:10 +02:00 |
Sandro La Bruzzo
|
9bf67f5de1
|
resolved conflicts
|
2020-06-17 09:15:43 +02:00 |
Sandro La Bruzzo
|
1d4275acc4
|
implemented first version of exportation of Scholexplorer into ActionSet
|
2020-06-17 09:10:38 +02:00 |
miconis
|
5233b15265
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-16 18:31:19 +02:00 |
miconis
|
11b77b9f4e
|
json dumps for entity merge test modified to fit the new model. title merge adjusted to fix the error
|
2020-06-16 18:31:11 +02:00 |
Claudio Atzori
|
64f02de5d3
|
updated workflow definition to include the cleaning step
|
2020-06-16 17:48:51 +02:00 |
Claudio Atzori
|
306669209f
|
code formatting
|
2020-06-16 16:54:44 +02:00 |
Claudio Atzori
|
1bc1d15eaf
|
stubbing for mock datasource.identities must be typed as array
|
2020-06-16 16:54:28 +02:00 |
Claudio Atzori
|
631fef12a7
|
Merge branch 'master' into dhp_oaf_model
|
2020-06-16 16:11:19 +02:00 |
Michele Artini
|
9e2c23e391
|
partial refactoring
|
2020-06-16 15:55:42 +02:00 |
Michele Artini
|
113c9b1de0
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-16 15:53:39 +02:00 |
Michele Artini
|
76ea7607f7
|
partial refactoring
|
2020-06-16 15:53:13 +02:00 |
Claudio Atzori
|
603b1bd0bb
|
Merge branch 'master' into dhp_oaf_model
|
2020-06-16 15:43:59 +02:00 |
Claudio Atzori
|
5441f01586
|
Merge pull request 'missing landingPage urls in instances' (#22) from instances-with-landing-page into master
Looks good, thanks!
|
2020-06-16 15:32:44 +02:00 |
Claudio Atzori
|
89859111ee
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-16 15:28:29 +02:00 |
Claudio Atzori
|
4ec262db53
|
included externalreference(s) in the result view on the Hive graph DB
|
2020-06-16 15:28:20 +02:00 |
Michele Artini
|
8a4f84f8c0
|
refactoring
|
2020-06-16 12:34:13 +02:00 |
Claudio Atzori
|
2a4f65795f
|
WIP: graph cleaner implementation
|
2020-06-15 18:32:24 +02:00 |
Claudio Atzori
|
c15c8c0ad0
|
map datasource identities (including piwik ids) as original IDs
|
2020-06-15 16:07:30 +02:00 |
Claudio Atzori
|
0d52816244
|
WIP: graph cleaner implementation
|
2020-06-13 13:06:04 +02:00 |
Claudio Atzori
|
bed65a1be6
|
WIP: graph cleaner implementation
|
2020-06-12 18:25:47 +02:00 |
Claudio Atzori
|
c4d9f1837f
|
[maven-release-plugin] prepare for next development iteration
|
2020-06-12 12:21:08 +02:00 |
Claudio Atzori
|
f0746a7605
|
[maven-release-plugin] prepare release dhp-1.2.2
|
2020-06-12 12:21:03 +02:00 |