Claudio Atzori
|
5f45f2c77f
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:46:40 +02:00 |
Claudio Atzori
|
ad7a131b18
|
introduced a common project code formatting plugin that runs on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, and applied it to each java class in the project
|
2020-04-18 12:42:58 +02:00 |
Claudio Atzori
|
a2938dd059
|
cleanup
|
2020-04-18 12:24:22 +02:00 |
Claudio Atzori
|
9374ff03ea
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:06:58 +02:00 |
Claudio Atzori
|
71813795f6
|
various refactorings on the dnet-dedup-openaire workflow
|
2020-04-18 12:06:23 +02:00 |
miconis
|
6450bb0daa
|
test for software dedup added. Definition of orp, dataset and sw dedup configurations
|
2020-04-17 17:31:59 +02:00 |
Miriam Baglioni
|
72c63a326e
|
removed unnecessary class
|
2020-04-17 17:14:51 +02:00 |
Miriam Baglioni
|
00c2ca3ee5
|
-
|
2020-04-17 17:14:25 +02:00 |
Miriam Baglioni
|
5cd092114f
|
use mergeFrom method to add the new community contexts
|
2020-04-17 17:13:18 +02:00 |
Miriam Baglioni
|
264c82f21e
|
minor
|
2020-04-17 16:54:46 +02:00 |
Miriam Baglioni
|
8c079c7a49
|
unit test for orcid to result propagation from semrel
|
2020-04-17 16:53:03 +02:00 |
Miriam Baglioni
|
eacd140a98
|
added missing parameter(s)
|
2020-04-17 16:52:30 +02:00 |
Miriam Baglioni
|
390e250faf
|
use the addPid method of the Author class to add a new pid
|
2020-04-17 16:52:02 +02:00 |
Miriam Baglioni
|
b46b080ddc
|
use mergeFrom method call to add the country(ies) instead of modifying the result directly.
|
2020-04-17 16:50:54 +02:00 |
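The mergeFrom commits (5cd092114f, b46b080ddc) replace direct edits of a result with a merge of a partial update. The pattern can be illustrated with a plain-Python sketch. `Result`, `merge_from` and `propagate_countries` are hypothetical stand-ins, not the dhp-schemas classes, and the union-without-duplicates behaviour of `merge_from` is an assumption made for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Result:
    """Hypothetical stand-in for a graph result (not the dhp-schemas class)."""
    id: str
    country: List[str] = field(default_factory=list)

    def merge_from(self, other: "Result") -> None:
        # Only records that describe the same result may be merged.
        if other.id != self.id:
            raise ValueError("cannot merge results with different ids")
        # Assumed semantics: union of the country lists, keeping the
        # existing order and skipping values already present.
        for c in other.country:
            if c not in self.country:
                self.country.append(c)


def propagate_countries(result: Result, new_countries: List[str]) -> Result:
    # Build an update that carries only the id and the new values...
    update = Result(id=result.id, country=list(new_countries))
    # ...and let merge_from decide how to combine it with the existing record,
    # instead of editing result.country from the propagation code.
    result.merge_from(update)
    return result
```

With this design, the merging rules live in one place (the entity), and each propagation step produces only small update records.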
Miriam Baglioni
|
c4987dd12a
|
minor
|
2020-04-17 16:49:08 +02:00 |
Claudio Atzori
|
038ac7afd7
|
relation consistency workflow separated from dedup scan and creation of CCs
|
2020-04-17 13:12:44 +02:00 |
Claudio Atzori
|
c92bfeeaee
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-17 13:07:52 +02:00 |
Miriam Baglioni
|
adc11c97a7
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-17 12:34:31 +02:00 |
Sandro La Bruzzo
|
01ea7721f3
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-17 12:12:25 +02:00 |
Sandro La Bruzzo
|
5e2fa996aa
|
fixed problem with conversion of long into string
|
2020-04-17 12:11:51 +02:00 |
miconis
|
418cf94642
|
implementation of the deletedbyinference test in propagating relations
|
2020-04-17 10:40:21 +02:00 |
Miriam Baglioni
|
5d772e5263
|
new implementation of propagation of community to result from organization that exploits the prepared info
|
2020-04-16 18:45:22 +02:00 |
Miriam Baglioni
|
fff1e5ec39
|
classes to (de)serialize the data provided in the preparation step
|
2020-04-16 18:44:43 +02:00 |
Miriam Baglioni
|
3fd9d6b02f
|
preparation phase for the propagation of community to result from organization
|
2020-04-16 18:43:55 +02:00 |
Miriam Baglioni
|
a9120164aa
|
added hive parameter and a step that resets the working dir in the workflow
|
2020-04-16 18:42:04 +02:00 |
Miriam Baglioni
|
6afbd542ca
|
changed the save mode to avoid NegativeArraySize... error. The preparationstep2 also needed to be modified
|
2020-04-16 18:40:14 +02:00 |
Miriam Baglioni
|
d60fd36046
|
changed the save method
|
2020-04-16 16:14:15 +02:00 |
Miriam Baglioni
|
951b13ac46
|
input parameters and workflow for new implementation of propagation of orcid to result from semrel and preparation phases
|
2020-04-16 16:13:10 +02:00 |
Miriam Baglioni
|
4d89f3dfed
|
removed unnecessary classes
|
2020-04-16 16:11:44 +02:00 |
Miriam Baglioni
|
5e72a51f11
|
-
|
2020-04-16 16:11:20 +02:00 |
Miriam Baglioni
|
c33a593381
|
renamed
|
2020-04-16 16:09:47 +02:00 |
Miriam Baglioni
|
0e5399bf74
|
second phase of data preparation. Groups all the possible updates by id
|
2020-04-16 16:08:51 +02:00 |
Miriam Baglioni
|
548ba915ac
|
first phase of data preparation. For each result type (parallel) it produces the possible updates
|
2020-04-16 15:58:42 +02:00 |
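The two preparation phases for ORCID propagation (548ba915ac and 0e5399bf74) can be pictured with a small plain-Python model. This is not the project's Spark code. The input shape (per result type, a list of `(result_id, author)` pairs) and the function names are hypothetical, and a thread pool stands in for the parallel per-type Spark jobs:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def phase_one(candidates):
    """Phase 1 (sketch): for one result type, keep only the candidate
    authors that carry an ORCID, i.e. the possible updates."""
    return [(result_id, author) for result_id, author in candidates
            if author.get("orcid")]


def phase_two(updates):
    """Phase 2 (sketch): group all possible updates by result id,
    dropping exact duplicates coming from different result types."""
    grouped = defaultdict(list)
    for result_id, author in updates:
        if author not in grouped[result_id]:
            grouped[result_id].append(author)
    return dict(grouped)


def prepare_orcid_updates(candidates_by_type):
    # Run phase 1 concurrently for each result type, then merge the outputs.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(phase_one, c) for c in candidates_by_type.values()]
        all_updates = [u for f in futures for u in f.result()]
    return phase_two(all_updates)
```

For example, if a `publication` and a `dataset` candidate both point at result `r1`, both authors end up in the single list stored under `r1`. Authors without an ORCID are dropped in phase 1.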
Miriam Baglioni
|
243013cea3
|
to (de)serialize the association between the resultId and the list of authoritative authors with orcid to possibly propagate
|
2020-04-16 15:57:29 +02:00 |
Miriam Baglioni
|
ac3ad25b36
|
to (de)serialize the author information needed to determine whether the orcid can be passed (name, surname, fullname (?), orcid)
|
2020-04-16 15:56:33 +02:00 |
Miriam Baglioni
|
d6cd700a32
|
new implementation that exploits prepared information (the list of possible updates: resultId - possible list of orcid to be added)
|
2020-04-16 15:55:25 +02:00 |
Miriam Baglioni
|
f077f22f73
|
minor
|
2020-04-16 15:54:16 +02:00 |
Miriam Baglioni
|
fd5d792e35
|
refactoring
|
2020-04-16 15:53:34 +02:00 |
Claudio Atzori
|
cb0952428e
|
Merge branch 'master' into deduptesting
|
2020-04-16 14:42:25 +02:00 |
Claudio Atzori
|
cc21bbfb1a
|
Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting
|
2020-04-16 14:41:37 +02:00 |
Claudio Atzori
|
ec5dfc068d
|
added spark.sql.shuffle.partitions=3840 to dedup scan wf
|
2020-04-16 14:41:28 +02:00 |
Claudio Atzori
|
09f356b047
|
Merge pull request 'Closes #7: subdirs inside graph table dirs' (#8) from przemyslaw.jacewicz/dnet-hadoop:przemyslawjacewicz_7_distcp_configuration_fix into master
Ran the code from this PR in isolation and it worked fine. Thanks!
|
2020-04-16 14:38:46 +02:00 |
Claudio Atzori
|
3437383112
|
Merge branch 'master' into deduptesting
|
2020-04-16 12:46:14 +02:00 |
miconis
|
0eccbc318b
|
Deduper class (utilities for dedup) cleaned. Useless methods removed
|
2020-04-16 12:36:37 +02:00 |
Claudio Atzori
|
76d23895e6
|
Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting
|
2020-04-16 12:18:32 +02:00 |
miconis
|
6a089ec287
|
minor changes
|
2020-04-16 12:15:38 +02:00 |
Claudio Atzori
|
376efd67de
|
removed prepare statement in spark action
|
2020-04-16 12:14:16 +02:00 |
miconis
|
9b36458b6a
|
Merge branch 'deduptesting' of code-repo.d4science.org:D-Net/dnet-hadoop into deduptesting
|
2020-04-16 12:13:58 +02:00 |
miconis
|
cd4d9a148f
|
creating temporary directories in dedup test
|
2020-04-16 12:13:26 +02:00 |
Claudio Atzori
|
b39ff36c16
|
improving the wf definitions
|
2020-04-16 12:11:37 +02:00 |
Claudio Atzori
|
011b342bc9
|
trying to avoid OOM in SparkPropagateRelation
|
2020-04-16 11:13:51 +02:00 |
Miriam Baglioni
|
08227cfcbd
|
resources needed for running the test on propagation of result to organization from institutional repositories
|
2020-04-16 11:06:10 +02:00 |
Miriam Baglioni
|
a97e915c24
|
unit test for propagation of result to organization from institutional repository
|
2020-04-16 11:05:21 +02:00 |
Miriam Baglioni
|
b078710924
|
modification to the test due to the removal of unused parameters
|
2020-04-16 11:04:39 +02:00 |
Miriam Baglioni
|
a5e5c81a2c
|
input parameters and workflow definition for propagation of result to organization from institutional repositories
|
2020-04-16 11:03:41 +02:00 |
Miriam Baglioni
|
5e1bd67680
|
removed unnecessary parameter
|
2020-04-16 11:02:01 +02:00 |
Miriam Baglioni
|
eaf19ce01b
|
removed unnecessary class
|
2020-04-16 10:59:33 +02:00 |
Miriam Baglioni
|
7bd49abbef
|
commit to delete
|
2020-04-16 10:59:09 +02:00 |
Miriam Baglioni
|
53f418098b
|
added the isTest checkpoint
|
2020-04-16 10:53:48 +02:00 |
Miriam Baglioni
|
c28333d43f
|
minor
|
2020-04-16 10:52:50 +02:00 |
Miriam Baglioni
|
a8100baed6
|
changed the way the results are saved to avoid NegativeArray... error
|
2020-04-16 10:50:09 +02:00 |
Miriam Baglioni
|
79b978ec57
|
refactoring
|
2020-04-16 10:48:41 +02:00 |
Claudio Atzori
|
069ef5eaed
|
trying to avoid OOM in SparkPropagateRelation
|
2020-04-15 21:23:21 +02:00 |
Claudio Atzori
|
8eedfefc98
|
try to introduce intermediate serialization on hdfs to avoid OOM
|
2020-04-15 18:35:35 +02:00 |
Przemysław Jacewicz
|
da019495d7
|
[dhp-actionmanager] target dir removal added for distcp actions
|
2020-04-15 17:56:57 +02:00 |
miconis
|
5689d49689
|
minor changes
|
2020-04-15 16:34:06 +02:00 |
Claudio Atzori
|
c439d0c6bb
|
PromoteActionPayloadForGraphTableJob reads directly the content pointed by the input path, adjusted promote action tests (ISLookup mock)
|
2020-04-15 16:18:33 +02:00 |
Claudio Atzori
|
ff30f99c65
|
using newline delimited json files for the raw graph materialization. Introduced contentPath parameter
|
2020-04-15 16:16:20 +02:00 |
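Commit ff30f99c65 switches the raw graph materialization to newline-delimited JSON: one complete JSON object per line, with no enclosing array. Spark's JSON data source reads this layout by default (multi-line JSON requires an explicit option). The format itself can be shown with a stdlib-only sketch; the function names are illustrative:

```python
import io
import json


def write_ndjson(records, stream):
    for rec in records:
        # One compact JSON object per line and no enclosing array, so a file
        # can be split at line boundaries and processed in parallel.
        stream.write(json.dumps(rec, separators=(",", ":")) + "\n")


def read_ndjson(stream):
    # Each non-empty line is an independent JSON document.
    return [json.loads(line) for line in stream if line.strip()]


def roundtrip(records):
    buf = io.StringIO()
    write_ndjson(records, buf)
    text = buf.getvalue()
    buf.seek(0)
    return text, read_ndjson(buf)
```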
Sandro La Bruzzo
|
3d3ac76dda
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-15 15:24:01 +02:00 |
Sandro La Bruzzo
|
74a7fac774
|
fixed problem with timestamp
|
2020-04-15 15:23:54 +02:00 |
Miriam Baglioni
|
3577219127
|
removed unnecessary classes
|
2020-04-15 12:45:49 +02:00 |
Miriam Baglioni
|
964b22d418
|
modified the writing of the new relations. Before: read the old rels, add the new ones to them, write all the relations to the new location. Now: the first step of the wf copies the old relations to the new location; if new relations are found, they are saved in the new location in append mode.
|
2020-04-15 12:32:01 +02:00 |
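The copy-then-append scheme in 964b22d418 can be modeled on a local filesystem. The following is a sketch under stated assumptions, not the workflow's Spark/HDFS code. The directory names and the `part-NNNNN.json` file naming are hypothetical, and writing an extra part file is only an approximation of what Spark's append save mode does (add files next to the existing ones without rewriting them):

```python
import json
import os
import shutil
import tempfile


def copy_relations(old_dir, new_dir):
    """Step 1 (sketch): copy the existing relation files verbatim."""
    shutil.copytree(old_dir, new_dir)


def append_relations(new_dir, relations):
    """Step 2 (sketch): only if new relations were found, add them as an
    extra part file; the copied files are left untouched."""
    if not relations:
        return
    n = len([f for f in os.listdir(new_dir) if f.startswith("part-")])
    with open(os.path.join(new_dir, f"part-{n:05d}.json"), "w") as out:
        for rel in relations:
            out.write(json.dumps(rel) + "\n")


def read_relations(directory):
    rels = []
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as f:
            rels.extend(json.loads(line) for line in f if line.strip())
    return rels


def demo():
    # Hypothetical layout: "relation" holds the old graph, "relation_new" the output.
    with tempfile.TemporaryDirectory() as base:
        old = os.path.join(base, "relation")
        os.makedirs(old)
        with open(os.path.join(old, "part-00000.json"), "w") as f:
            f.write(json.dumps({"source": "r1", "target": "p1"}) + "\n")
        new = os.path.join(base, "relation_new")
        copy_relations(old, new)
        append_relations(new, [{"source": "r2", "target": "p1"}])
        return read_relations(old), read_relations(new)
```

The old relations are never read back into memory and rewritten. This is the point of the change: the cost of the propagation step depends on the size of the new relations, not on the size of the whole relation set.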
Miriam Baglioni
|
43f0590d4b
|
changed the test because the business logic changed.
|
2020-04-15 12:29:50 +02:00 |
Miriam Baglioni
|
473d17767c
|
new business logic for the actual propagation. It exploits previously computed information
|
2020-04-15 12:25:44 +02:00 |
Miriam Baglioni
|
6a377a7582
|
class to compute some information needed for the actual propagation
|
2020-04-15 12:25:11 +02:00 |
Miriam Baglioni
|
5a3487280d
|
classes to serialize/deserialize the prepared data
|
2020-04-15 12:24:36 +02:00 |
Miriam Baglioni
|
62b09be43c
|
added correct description for parameter isSparkSessionManaged
|
2020-04-15 12:23:06 +02:00 |
Miriam Baglioni
|
1859ce8902
|
minor refactoring
|
2020-04-15 12:21:31 +02:00 |
Miriam Baglioni
|
27f1d3ee8f
|
minor refactoring
|
2020-04-15 12:21:05 +02:00 |
Alessia Bardi
|
550a9f82ed
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-14 17:53:01 +02:00 |
Alessia Bardi
|
a68fae9bcb
|
now supporting openaire 4.0 compliance
|
2020-04-14 17:52:48 +02:00 |
Sandro La Bruzzo
|
c36239e693
|
fixed incremental indexing
|
2020-04-14 17:47:36 +02:00 |
Miriam Baglioni
|
3f4b579e7f
|
new workflow composed of four steps: the first removes the directory where the results are stored, the second copies the relations to the new location, the third is the preparation phase, and the fourth is the actual propagation
|
2020-04-14 16:49:24 +02:00 |
Miriam Baglioni
|
ca2b40952e
|
minor changes
|
2020-04-14 16:48:02 +02:00 |
Miriam Baglioni
|
61d39e659e
|
parameters for the project2result propagation phase
|
2020-04-14 16:47:39 +02:00 |
Miriam Baglioni
|
92f19fa0a0
|
parameters for the project2result preparation phase
|
2020-04-14 16:46:57 +02:00 |
Miriam Baglioni
|
cadab9b81d
|
new implementation for result to project propagation. Use the prepared info in propagation
|
2020-04-14 16:46:07 +02:00 |
Miriam Baglioni
|
ceb1f299bf
|
minor changes
|
2020-04-14 16:45:12 +02:00 |
Claudio Atzori
|
82e8341f50
|
reorganizing parameter names in the provision workflow
|
2020-04-14 15:54:41 +02:00 |
Miriam Baglioni
|
e0038bde5b
|
Support class to serialize/deserialize the association project, set of linked results
|
2020-04-14 15:32:12 +02:00 |
Miriam Baglioni
|
c0bebb7c35
|
code to compute the prepared information used in the actual propagation step. This step produces two files: one with the potential updates (association between projects and a list of results), the other with the already linked entities (association between projects and the list of results already linked to them)
|
2020-04-14 15:31:26 +02:00 |
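The two outputs described in c0bebb7c35 (potential updates and already linked results per project) suggest the following propagation rule: link a project to the results that are semantically related to its results, minus the results already linked to it. A plain-Python sketch of that reading follows. The relation class names (`isProducedBy`, `isSupplementedBy`), the direction of the relations, and the choice of which semantic relations are followed are assumptions made for illustration:

```python
from collections import defaultdict


def prepare_project_info(relations, allowed_semrels):
    """Preparation (sketch): return (potential_updates, already_linked),
    both keyed by project id."""
    already_linked = defaultdict(set)  # project -> results already linked to it
    semrel = defaultdict(set)          # result -> semantically related results
    for rel in relations:
        if rel["relClass"] == "isProducedBy":  # assumed: result -> project
            already_linked[rel["target"]].add(rel["source"])
        elif rel["relClass"] in allowed_semrels:
            semrel[rel["source"]].add(rel["target"])
    potential_updates = defaultdict(set)
    for project, results in already_linked.items():
        for r in results:
            potential_updates[project] |= semrel.get(r, set())
    return dict(potential_updates), dict(already_linked)


def propagate(potential_updates, already_linked):
    """Propagation (sketch): emit only the links that do not exist yet."""
    new_rels = []
    for project, candidates in potential_updates.items():
        for result in sorted(candidates - already_linked.get(project, set())):
            new_rels.append({"source": result, "target": project,
                             "relClass": "isProducedBy"})
    return new_rels
```

Keeping the already linked set as a separate output lets the propagation step do a cheap set difference instead of joining against the whole relation table again.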
Miriam Baglioni
|
f47ee5b78e
|
directory where the prepared info is stored before the actual propagation takes place
|
2020-04-14 15:29:21 +02:00 |
Miriam Baglioni
|
36cc9516d8
|
the starting relation set for testing
|
2020-04-14 15:28:34 +02:00 |
Miriam Baglioni
|
4b01dc60e6
|
unit test for result to project propagation
|
2020-04-14 15:28:00 +02:00 |
Miriam Baglioni
|
8f12292daa
|
changed the way to save the results on filesystem
|
2020-04-11 16:47:34 +02:00 |
Miriam Baglioni
|
87f802821e
|
new workflow for country propagation: it consists of the preparation step and the propagation step. The propagation part runs in parallel on the result types
|
2020-04-11 16:40:22 +02:00 |
Miriam Baglioni
|
a562080b0b
|
parameters to be used in the preparation job and in the actual country propagation job
|
2020-04-11 16:39:17 +02:00 |
Miriam Baglioni
|
1251ad4455
|
removed unnecessary class
|
2020-04-11 16:38:13 +02:00 |
Miriam Baglioni
|
aef9b3aa90
|
new parametric implementation of country propagation. Exploits information computed beforehand and broadcasts it to each executor
|
2020-04-11 16:36:59 +02:00 |
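Commit aef9b3aa90 broadcasts the prepared information to every executor, and 87f802821e runs the propagation in parallel on the result types. A plain-Python model of that shape follows. A shared read-only dict stands in for a Spark broadcast variable, and a thread pool stands in for the per-type jobs. The record fields (`hostedby`, `country`) and the datasource-to-country mapping are assumptions made for illustration:

```python
from concurrent.futures import ThreadPoolExecutor


def propagate_country(results, datasource_country):
    """Add the country of each hosting datasource to the results it hosts
    (sketch); results are copied, not modified in place."""
    updated = []
    for res in results:
        countries = list(res.get("country", []))
        for ds in res.get("hostedby", []):
            c = datasource_country.get(ds)
            if c is not None and c not in countries:
                countries.append(c)
        updated.append({**res, "country": countries})
    return updated


def run_country_propagation(results_by_type, datasource_country):
    # The small prepared map plays the role of the broadcast variable: every
    # worker reads the same in-memory copy instead of joining a large table.
    with ThreadPoolExecutor() as pool:
        futures = {t: pool.submit(propagate_country, rs, datasource_country)
                   for t, rs in results_by_type.items()}
        return {t: f.result() for t, f in futures.items()}
```

Broadcasting is a good fit only while the prepared map stays small enough to fit in each executor's memory. Beyond that size, a join is the safer choice.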
Miriam Baglioni
|
a2d833d5dd
|
step of data preparation before the actual country propagation takes place
|
2020-04-11 16:36:03 +02:00 |
Miriam Baglioni
|
6897c920a2
|
classes in support of new implementation of country propagation
|
2020-04-11 16:35:26 +02:00 |
Miriam Baglioni
|
85766a02d8
|
added dependency to use hive on local machine
|
2020-04-11 16:34:22 +02:00 |
Miriam Baglioni
|
79b8ea4fed
|
prepared information to be used in actual country propagation. Subset of info
|
2020-04-11 16:29:41 +02:00 |
Miriam Baglioni
|
1822476613
|
Test for country propagation
|
2020-04-11 16:28:09 +02:00 |
Miriam Baglioni
|
7783b09c5b
|
new implementation for result to project propagation. Prepare some info to be used in propagation
|
2020-04-11 16:26:23 +02:00 |
Claudio Atzori
|
6b5f9ca9cb
|
raw graph creation workflow moved under dhp-graph-mapper, claims integration is included
|
2020-04-10 17:53:07 +02:00 |
Miriam Baglioni
|
90469789b9
|
two new classes for new implementation of project to result propagation
|
2020-04-09 13:29:01 +02:00 |
Miriam Baglioni
|
627ad58a8b
|
new wf definition
|
2020-04-09 11:33:19 +02:00 |
Miriam Baglioni
|
9c63c4840d
|
new workflow and parameters for country propagation
|
2020-04-08 19:13:42 +02:00 |
Miriam Baglioni
|
a2d309545b
|
new parametrized implementation for country propagation
|
2020-04-08 19:12:59 +02:00 |
Miriam Baglioni
|
6dfdba9ef7
|
new parametrized implementation for country propagation
|
2020-04-08 18:14:37 +02:00 |
Miriam Baglioni
|
03f7cb6402
|
new parametrized implementation for country propagation
|
2020-04-08 18:08:41 +02:00 |
Miriam Baglioni
|
df2fc4a6d7
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-08 18:07:26 +02:00 |
Miriam Baglioni
|
fcfef4632f
|
input parameters for country propagation preparation job
|
2020-04-08 18:07:18 +02:00 |
miconis
|
0be2e72be5
|
further implementation of tests for the deduplication of each entity. publication dump added, empty entity files created
|
2020-04-08 18:02:30 +02:00 |
Miriam Baglioni
|
61045e84d9
|
resolved merge conflict in pom
|
2020-04-08 14:23:30 +02:00 |
Claudio Atzori
|
47f3d9b757
|
unit test for GraphHiveImporterJob
|
2020-04-08 13:24:43 +02:00 |
Sandro La Bruzzo
|
ba9f07a6fe
|
fixed wrong test
|
2020-04-08 13:18:20 +02:00 |
Miriam Baglioni
|
540da4ab61
|
new business logic with prepared info before actual job run
|
2020-04-08 13:04:04 +02:00 |
Miriam Baglioni
|
8438702b3d
|
addition in propagation constants
|
2020-04-08 10:54:01 +02:00 |
Miriam Baglioni
|
2afe971816
|
new implementation for country propagation
|
2020-04-08 10:49:09 +02:00 |
Miriam Baglioni
|
beebbcf66b
|
new config for countrypropagation
|
2020-04-08 10:31:29 +02:00 |
Claudio Atzori
|
d74e128aa6
|
Utility classes moved in dhp-common and dhp-schemas
|
2020-04-07 11:56:22 +02:00 |
Claudio Atzori
|
c57cf679ca
|
Merge branch 'provision_dataset'
|
2020-04-07 08:56:58 +02:00 |
Claudio Atzori
|
1a1a026a18
|
we do expect to find field bestaccessright already defined. No need to add it again
|
2020-04-07 08:55:33 +02:00 |
Claudio Atzori
|
fbdd18a96b
|
using dataset based relation preparation procedure
|
2020-04-07 08:54:39 +02:00 |
Claudio Atzori
|
77f59b1b10
|
dataset based provision WIP
|
2020-04-06 19:37:27 +02:00 |
Claudio Atzori
|
6177cf36fb
|
Merge pull request 'Closes #4: New action manager implementation' (#5) from przemyslaw.jacewicz/dnet-hadoop:przemyslawjacewicz_actionmanager_impl_prototype into master
Nothing more to add here. Thanks for your contribution!
|
2020-04-06 17:35:07 +02:00 |
Claudio Atzori
|
e355961997
|
dataset based provision WIP
|
2020-04-06 17:34:25 +02:00 |
miconis
|
56fbe689f0
|
implementation of the tests for each spark action
|
2020-04-06 16:30:31 +02:00 |
Claudio Atzori
|
ca345aaad3
|
dataset based provision WIP
|
2020-04-06 15:33:31 +02:00 |
Claudio Atzori
|
c8f4b95464
|
dataset based provision WIP
|
2020-04-06 08:59:58 +02:00 |
Claudio Atzori
|
eb2f5f3198
|
dataset based provision WIP
|
2020-04-04 17:41:31 +02:00 |
Claudio Atzori
|
3d1b637cab
|
dataset based provision WIP
|
2020-04-04 14:03:43 +02:00 |
miconis
|
53fd624c34
|
implemented test for sparkcreatesimrels
|
2020-04-03 18:32:25 +02:00 |
Claudio Atzori
|
24b2c9012e
|
dataset based provision WIP
|
2020-04-02 18:44:09 +02:00 |
miconis
|
a61763d149
|
structure for sparksimrel changed to be compliant with mockito testing
|
2020-04-02 18:37:53 +02:00 |
Claudio Atzori
|
daa26acc9d
|
dataset based provision WIP, fixed spark2EventLogDir
|
2020-04-02 16:15:50 +02:00 |
Przemysław Jacewicz
|
7b2a7e2417
|
[dhp-actionmanager] missing descriptions added and minor naming and formatting fixes
|
2020-04-02 11:48:40 +02:00 |
Claudio Atzori
|
9c7092416a
|
dataset based provision WIP
|
2020-04-01 19:07:30 +02:00 |
miconis
|
bfa5bc74df
|
minor changes
|
2020-04-01 19:05:48 +02:00 |
Przemysław Jacewicz
|
80cf43b9c8
|
[dhp-actionmanager] promoting workflow added
|
2020-04-01 18:51:25 +02:00 |
Przemysław Jacewicz
|
5b459bcc47
|
[dhp-actionmanager] promoting spark job added
|
2020-04-01 18:49:08 +02:00 |
miconis
|
9802bcb9fe
|
dedup testing
|
2020-04-01 18:48:31 +02:00 |
Przemysław Jacewicz
|
e21bb89dbd
|
[dhp-actionmanager] partitioning spark job added
|
2020-04-01 18:41:29 +02:00 |
Przemysław Jacewicz
|
f9f7350bb9
|
[dhp-actionmanager] common package added with utility classes supporting hadoop and spark envs
|
2020-04-01 18:39:26 +02:00 |
Przemysław Jacewicz
|
ad70c23b2e
|
[dhp-actionmanager] pom updated
|
2020-04-01 18:36:00 +02:00 |
Przemysław Jacewicz
|
4e910a78d4
|
[dhp-workflows] spark 2 connection properties added
|
2020-04-01 18:29:26 +02:00 |
Claudio Atzori
|
1402eb1fe7
|
cleanup
|
2020-04-01 15:38:50 +02:00 |
Claudio Atzori
|
7061d07727
|
ActionSets migration serializes the output as plain text files instead of SequenceFiles
|
2020-04-01 14:58:22 +02:00 |