Claudio Atzori
|
daa26acc9d
|
dataset based provision WIP, fixed spark2EventLogDir
|
2020-04-02 16:15:50 +02:00 |
Przemysław Jacewicz
|
7b2a7e2417
|
[dhp-actionmanager] missing descriptions added and minor naming and formatting fixes
|
2020-04-02 11:48:40 +02:00 |
Spyros Zoupanos
|
1ab97bbe00
|
Adding the full stats workflow to the dnet-hadoop hierarchy
|
2020-04-01 22:22:05 +03:00 |
Claudio Atzori
|
9c7092416a
|
dataset based provision WIP
|
2020-04-01 19:07:30 +02:00 |
miconis
|
bfa5bc74df
|
minor changes
|
2020-04-01 19:05:48 +02:00 |
Przemysław Jacewicz
|
80cf43b9c8
|
[dhp-actionmanager] promoting workflow added
|
2020-04-01 18:51:25 +02:00 |
Przemysław Jacewicz
|
5b459bcc47
|
[dhp-actionmanager] promoting spark job added
|
2020-04-01 18:49:08 +02:00 |
miconis
|
9802bcb9fe
|
dedup testing
|
2020-04-01 18:48:31 +02:00 |
Przemysław Jacewicz
|
e21bb89dbd
|
[dhp-actionmanager] partitioning spark job added
|
2020-04-01 18:41:29 +02:00 |
Przemysław Jacewicz
|
f9f7350bb9
|
[dhp-actionmanager] common package added with utility classes supporting hadoop and spark envs
|
2020-04-01 18:39:26 +02:00 |
Przemysław Jacewicz
|
ad70c23b2e
|
[dhp-actionmanager] pom updated
|
2020-04-01 18:36:00 +02:00 |
Przemysław Jacewicz
|
4e910a78d4
|
[dhp-workflows] spark 2 connection properties added
|
2020-04-01 18:29:26 +02:00 |
Claudio Atzori
|
1402eb1fe7
|
cleanup
|
2020-04-01 15:38:50 +02:00 |
Claudio Atzori
|
7061d07727
|
ActionSets migration serialize the output as plain text files instead of SequenceFiles
|
2020-04-01 14:58:22 +02:00 |
Claudio Atzori
|
adcdd2d05e
|
WIP: reimplementing the adjacency list construction process using spark Datasets
|
2020-04-01 14:56:57 +02:00 |
Sandro La Bruzzo
|
201d79021e
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-31 14:41:41 +02:00 |
Sandro La Bruzzo
|
cd7416ae4c
|
first implementation of incremental update of scholix index
|
2020-03-31 14:41:35 +02:00 |
przemek
|
9d1d18d4b9
|
Merge branch 'master' into przemyslawjacewicz_actionmanager_impl_prototype
|
2020-03-31 12:04:58 +02:00 |
Claudio Atzori
|
377e1ba840
|
[maven-release-plugin] prepare for next development iteration
|
2020-03-30 20:06:00 +02:00 |
Claudio Atzori
|
76d9315129
|
[maven-release-plugin] prepare release dhp-1.1.6
|
2020-03-30 20:05:56 +02:00 |
Claudio Atzori
|
ef429010ee
|
removed log file and job-override.properties
|
2020-03-30 20:00:58 +02:00 |
Claudio Atzori
|
0fbec69b82
|
use oozie prepare statement to cleanup working directories
|
2020-03-30 19:48:41 +02:00 |
Claudio Atzori
|
3af2b8d700
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-30 13:12:21 +02:00 |
Claudio Atzori
|
f3f9affd49
|
allow dynamic executors to build XML records
|
2020-03-30 13:12:11 +02:00 |
Claudio Atzori
|
2e2d4c4c68
|
adjusted path to template resource
|
2020-03-30 13:11:49 +02:00 |
Sandro La Bruzzo
|
62cc257e5c
|
fixed step1 workflow
|
2020-03-27 17:07:34 +01:00 |
Sandro La Bruzzo
|
1a7a866861
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-27 15:11:48 +01:00 |
Sandro La Bruzzo
|
7cef698f36
|
reformat code
|
2020-03-27 15:11:34 +01:00 |
Claudio Atzori
|
1767dfaa3f
|
method can be protected, it is meant to be used only in tests
|
2020-03-27 14:31:26 +01:00 |
Sandro La Bruzzo
|
a4b6a51168
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-27 13:48:56 +01:00 |
Sandro La Bruzzo
|
15d9106b3f
|
Fixed merge of dhp dedup
|
2020-03-27 13:48:44 +01:00 |
Claudio Atzori
|
e196fff212
|
adjusted path for source resource in unit test
|
2020-03-27 13:45:10 +01:00 |
Sandro La Bruzzo
|
8c9a56a0c8
|
refactored package name
|
2020-03-27 13:19:33 +01:00 |
Sandro La Bruzzo
|
2bd2d6f202
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-27 13:16:36 +01:00 |
Sandro La Bruzzo
|
a9935f80d4
|
refactor class name and workflow name for graph mapper, added javadoc
|
2020-03-27 13:16:24 +01:00 |
Michele Artini
|
ae03948eed
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-27 11:47:07 +01:00 |
Michele Artini
|
f6e86b44a6
|
tests
|
2020-03-27 11:46:37 +01:00 |
Michele Artini
|
408be3c632
|
test and fixed a problem with datacite namespaces
|
2020-03-27 11:44:50 +01:00 |
Claudio Atzori
|
673e744649
|
moved openaire specific implementations under dedicated package eu.dnetlib.dhp.oa
|
2020-03-27 10:42:17 +01:00 |
Claudio Atzori
|
098fabab3f
|
reorganizing content under dhp-workflows/dhp-graph-mapper
|
2020-03-26 19:44:19 +01:00 |
Claudio Atzori
|
77c4294924
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-26 18:26:52 +01:00 |
Claudio Atzori
|
43cbcda7ef
|
unit test for SparkGraphImporterJob
|
2020-03-26 18:26:40 +01:00 |
Sandro La Bruzzo
|
e04da6d66a
|
merged all oozie wf in one
|
2020-03-26 14:17:07 +01:00 |
Sandro La Bruzzo
|
e71e001b58
|
commented test that doesn't work
|
2020-03-26 14:15:21 +01:00 |
Sandro La Bruzzo
|
0cd022ad6a
|
merge with master
|
2020-03-26 14:08:29 +01:00 |
Claudio Atzori
|
abcd3f5bf5
|
added sample data for unit tests
|
2020-03-26 11:12:52 +01:00 |
Sandro La Bruzzo
|
d5f11e27be
|
renamed wf
|
2020-03-26 09:49:23 +01:00 |
Sandro La Bruzzo
|
9a37ad0127
|
renamed modules
|
2020-03-26 09:46:46 +01:00 |
Sandro La Bruzzo
|
a768226e52
|
updated generate scholix to generate json
|
2020-03-26 09:40:50 +01:00 |
Claudio Atzori
|
9dff4adbc3
|
dhp-graph-mapper workflow tests upgraded to junit5
|
2020-03-25 18:25:12 +01:00 |
Claudio Atzori
|
cd7dc3e1ae
|
dhp-dedup-openaire workflow tests upgraded to junit5
|
2020-03-25 18:04:23 +01:00 |
Claudio Atzori
|
c0e825e713
|
dhp-aggregation workflow tests upgraded to junit5
|
2020-03-25 17:59:45 +01:00 |
Michele Artini
|
ebe45003d9
|
fixed some junit packages
|
2020-03-25 16:45:03 +01:00 |
Michele Artini
|
d9bfdcd607
|
updated poms
|
2020-03-25 16:31:12 +01:00 |
Michele Artini
|
120e823cd1
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-25 16:00:10 +01:00 |
Claudio Atzori
|
71ae7dd272
|
renamed module dnet-dedup to dnet-dedup-openaire
|
2020-03-25 15:57:09 +01:00 |
Michele Artini
|
fd57722c69
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-25 15:56:49 +01:00 |
Claudio Atzori
|
f441f823dd
|
fixed path referencing a test resource file
|
2020-03-25 15:21:46 +01:00 |
Claudio Atzori
|
51d0c9bdd7
|
integrated changes from branch dedupTest
|
2020-03-25 15:15:41 +01:00 |
Claudio Atzori
|
36f8f2ea66
|
master set to 'yarn' in spark actions, removed path to rawSet from the dedup scan workflow
|
2020-03-25 14:16:06 +01:00 |
Michele Artini
|
2559299da4
|
tests
|
2020-03-25 12:25:00 +01:00 |
Claudio Atzori
|
2180cc4fe7
|
more fields included in result view definition
|
2020-03-25 11:21:46 +01:00 |
Claudio Atzori
|
efb0b7d660
|
master set to 'yarn' in spark actions
|
2020-03-25 11:15:35 +01:00 |
Michele Artini
|
0fda2c3a30
|
some tests on db records
|
2020-03-25 09:43:58 +01:00 |
miconis
|
02320de371
|
minor changes
|
2020-03-24 17:43:51 +01:00 |
miconis
|
8e8b5e8f30
|
roots wf merged in scan wf
|
2020-03-24 17:40:58 +01:00 |
Claudio Atzori
|
51ff68db66
|
Merge branch 'dedupTest' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dedupTest
|
2020-03-24 11:18:19 +01:00 |
Claudio Atzori
|
1e869e7bed
|
using method available from currently used library
|
2020-03-24 11:17:44 +01:00 |
miconis
|
f0d72b76a8
|
package structure fixed
|
2020-03-24 10:51:40 +01:00 |
Claudio Atzori
|
aaedbb1b8b
|
WIP: dedup workflow, stage 2
|
2020-03-24 09:59:28 +01:00 |
Michele Artini
|
e3760c7f39
|
fix a bug with organization countries
|
2020-03-24 08:43:56 +01:00 |
Claudio Atzori
|
8b0ba3d76a
|
postprocessing script correctly run as hive2 action
|
2020-03-23 17:40:39 +01:00 |
miconis
|
93e2291291
|
minor changes
|
2020-03-23 17:17:56 +01:00 |
miconis
|
f7890a90df
|
implementation of the mechanism that checks the existence of a mergerel file
|
2020-03-23 17:13:30 +01:00 |
miconis
|
c20e179f5a
|
structure of the workflows updated
|
2020-03-23 11:43:49 +01:00 |
Claudio Atzori
|
658d40ccbe
|
WIP trying to use hive2 actions
|
2020-03-23 11:14:54 +01:00 |
Claudio Atzori
|
ecb64e4998
|
Merge branch 'migration_wfs_regular_all_steps'
|
2020-03-23 08:57:01 +01:00 |
Michele Artini
|
15160032bd
|
fixed a bug setting some organization fields
|
2020-03-23 08:39:14 +01:00 |
Claudio Atzori
|
a4c52661a0
|
WIP: fixing dedup workflows
|
2020-03-20 19:17:24 +01:00 |
Claudio Atzori
|
6cb0a9bff0
|
dedup wf directory structure aligned with project commons
|
2020-03-20 16:48:14 +01:00 |
miconis
|
e16e644faf
|
implementation of the workflow for entity update and for relations update
|
2020-03-20 13:01:56 +01:00 |
przemek
|
638b78f96a
|
Merge remote-tracking branch 'origin/master' into przemyslawjacewicz_actionmanager_impl_prototype
|
2020-03-19 15:12:56 +01:00 |
miconis
|
4e82a24af2
|
minor changes and implementation of the create connected components action
|
2020-03-19 15:01:07 +01:00 |
Claudio Atzori
|
36236dd1c1
|
action migration workflow produces eu.dnetlib.dhp.schema.action.AtomicAction(s)
|
2020-03-19 14:00:38 +01:00 |
Claudio Atzori
|
a0ab15a64c
|
need to stick on using guava:11.0.2 as it is the version used by the hadoop components (oozie client for sure). The last version (28.2-jre) breaks the oozie workflow submission
|
2020-03-19 13:58:58 +01:00 |
Sandro La Bruzzo
|
0594b92a6d
|
implemented relation with dataset
|
2020-03-19 11:11:07 +01:00 |
miconis
|
679b5869e5
|
implementation of the lookup procedure to take dedup conf from the resource profiles
|
2020-03-18 17:41:56 +01:00 |
Claudio Atzori
|
abe8fb69a2
|
added global properties, moved postprocessing script inside the oozie_app directory
|
2020-03-18 15:43:54 +01:00 |
miconis
|
f32eae5ce9
|
implementation of the spark action for the simrel creation
|
2020-03-18 14:27:49 +01:00 |
Claudio Atzori
|
c7e0730720
|
compress the output produced by migration steps 1 and 2
|
2020-03-18 09:34:57 +01:00 |
Claudio Atzori
|
2f11e37602
|
fixed expansion of path variables
|
2020-03-17 19:41:07 +01:00 |
Claudio Atzori
|
2795b0b096
|
no need to mkdir the all_entities file
|
2020-03-17 17:22:14 +01:00 |
Claudio Atzori
|
19746ad308
|
when reuseContent, reset ${workingPath}/all_entities
|
2020-03-17 17:17:06 +01:00 |
Claudio Atzori
|
2f0c85eeb3
|
updated parameters for regular_all_steps workflow, introduced flag 'reuseContent'
|
2020-03-17 17:04:58 +01:00 |
Claudio Atzori
|
b8290b5851
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:45:30 +01:00 |
Claudio Atzori
|
4706f24ec5
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:23:54 +01:00 |
Claudio Atzori
|
aeb01fa353
|
reading from newline delimited json textfiles instead of sequence files
|
2020-03-17 11:57:24 +01:00 |
Claudio Atzori
|
af835f2f98
|
when migrating actionsets from DM cluster, populate the AtomicAction.targetValue when empty (dedup similarities)
|
2020-03-15 18:07:59 +01:00 |
Claudio Atzori
|
9c84e21b87
|
added workflow to migrate latest version of each actionset content from DM to OCEAN cluster, mapping the targetValues from the old protobuf data model to the dhp.OAF datamodel
|
2020-03-13 15:56:52 +01:00 |
Claudio Atzori
|
8fe7ae1482
|
xml formatting
|
2020-03-13 15:53:56 +01:00 |
Przemysław Jacewicz
|
d0c9b0cdd6
|
WIP promote job functions updated
|
2020-03-13 12:36:42 +01:00 |
Przemysław Jacewicz
|
8d9b3c5de2
|
WIP action payload mapping into OAF type moved, (local) graph table name enum created, tests fixed
|
2020-03-13 10:01:39 +01:00 |
Przemysław Jacewicz
|
5cc560c7e5
|
Removed unnecessary dependency on old OAF model
|
2020-03-13 09:57:46 +01:00 |
Sandro La Bruzzo
|
addaaa091f
|
migrate relation from RDD to Dataset
|
2020-03-13 09:13:20 +01:00 |
Przemysław Jacewicz
|
3f24593e51
|
WIP: promote job tests and test resources implementation snapshot
|
2020-03-11 17:06:29 +01:00 |
Przemysław Jacewicz
|
2e996d610f
|
WIP: promote job functions implementation snapshot
|
2020-03-11 17:02:57 +01:00 |
Przemysław Jacewicz
|
cc63cdc9e6
|
WIP: promote job implementation snapshot
|
2020-03-11 17:02:06 +01:00 |
Przemysław Jacewicz
|
69540f6f78
|
Serialization-safe supplier added
|
2020-03-11 16:59:05 +01:00 |
Przemysław Jacewicz
|
e6e214dab5
|
Oaf merge and get strategy added
|
2020-03-11 16:58:17 +01:00 |
Claudio Atzori
|
7b6f0c8756
|
reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki
|
2020-03-10 17:19:17 +01:00 |
Claudio Atzori
|
60aedb1110
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-10 17:09:44 +01:00 |
Claudio Atzori
|
a3f184fd3f
|
added field websiteurl in related organizations
|
2020-03-10 17:08:58 +01:00 |
Claudio Atzori
|
0e95544495
|
fixed serialization for datasource subjects
|
2020-03-10 17:07:44 +01:00 |
Sandro La Bruzzo
|
7b28783fb4
|
updated unpaywall mapping
|
2020-03-08 17:00:19 +01:00 |
Michele Artini
|
b6efa9d6ab
|
Configuration of the SequenceFile Writer
|
2020-03-05 15:49:14 +01:00 |
Claudio Atzori
|
5e342a555c
|
no need to compute the inverse relClass, fixed text() in xpath expressions
|
2020-03-05 12:51:48 +01:00 |
Claudio Atzori
|
6ec04d4e02
|
specified column used to perform the join operation in the javadoc
|
2020-03-05 12:50:38 +01:00 |
Michele Artini
|
7a2a466161
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-04 14:50:59 +01:00 |
Michele Artini
|
755eade2fb
|
fix creation ids
|
2020-03-04 14:49:45 +01:00 |
Claudio Atzori
|
6379f32466
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-04 10:57:06 +01:00 |
Claudio Atzori
|
0233987603
|
introduced post processing step following the hive DB creation/population
|
2020-03-04 10:56:50 +01:00 |
Claudio Atzori
|
1e563bc15e
|
introduced distinct properties driving the resource usage for the XML record creation and the indexing phase
|
2020-03-04 10:55:11 +01:00 |
Claudio Atzori
|
9af3e904be
|
close the SparkSession at the end
|
2020-03-04 10:53:31 +01:00 |
Michele Artini
|
e7167b996a
|
logs and closeable
|
2020-03-04 10:46:36 +01:00 |
Claudio Atzori
|
25ceec29ab
|
code formatting
|
2020-03-04 10:44:24 +01:00 |
Claudio Atzori
|
63c00c5e88
|
fixed typo
|
2020-03-04 10:43:44 +01:00 |
Claudio Atzori
|
9cf5ce2e66
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-02 17:03:10 +01:00 |
Claudio Atzori
|
bc7cfd5975
|
indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure
|
2020-03-02 17:03:07 +01:00 |
Michele Artini
|
4b29a121b0
|
migration using spark in step2
|
2020-03-02 16:12:14 +01:00 |
Michele Artini
|
5445a57102
|
migration using spark in step2
|
2020-03-02 16:11:59 +01:00 |
Sandro La Bruzzo
|
b32655e48e
|
changed code to save intermediate result
|
2020-02-27 10:18:46 +01:00 |
Claudio Atzori
|
60bc2b1a20
|
drop the hive DB before populating it from scratch
|
2020-02-27 10:10:55 +01:00 |
Sandro La Bruzzo
|
f09e065865
|
incremented number of repartition
|
2020-02-26 19:26:19 +01:00 |
Sandro La Bruzzo
|
071f5c3e52
|
fixed NPE
|
2020-02-26 15:42:20 +01:00 |
Sandro La Bruzzo
|
a1a6fc8315
|
fixed NPE
|
2020-02-26 15:42:13 +01:00 |
Sandro La Bruzzo
|
1edf02a3ce
|
added log
|
2020-02-26 15:25:03 +01:00 |
Sandro La Bruzzo
|
c3ecabd8e8
|
fixed NPE
|
2020-02-26 14:40:02 +01:00 |
Sandro La Bruzzo
|
5d0f46651b
|
fixed NPE
|
2020-02-26 14:31:34 +01:00 |
Sandro La Bruzzo
|
bc342bf73a
|
fixed wrong generation type in summary
|
2020-02-26 12:49:47 +01:00 |
Sandro La Bruzzo
|
3112e21858
|
fixed typo
|
2020-02-26 12:22:43 +01:00 |
Sandro La Bruzzo
|
119ae6eef5
|
fixed wrong loop in the workflow
|
2020-02-26 12:18:50 +01:00 |
Sandro La Bruzzo
|
7936583a3d
|
added generation of Scholix collection
|
2020-02-26 12:09:06 +01:00 |
Przemysław Jacewicz
|
02db368dc5
|
Merge branch 'master' into przemyslawjacewicz_actionmanager_impl_prototype
|
2020-02-26 11:50:20 +01:00 |
Sandro La Bruzzo
|
2ef3705b2c
|
Added Provision workflow
|
2020-02-26 10:51:35 +01:00 |
Michele Artini
|
689908b2e9
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-25 16:00:51 +01:00 |
Michele Artini
|
93665773ea
|
Fixed a problem with JavaRDD Union
|
2020-02-25 15:59:21 +01:00 |
Sandro La Bruzzo
|
b021b8a2e1
|
Added index wf
|
2020-02-24 10:15:55 +01:00 |
Claudio Atzori
|
6a73fd5da5
|
in order to reuse the same XmlRecordFactory across different tasks, the state of contexts must be one per record built
|
2020-02-21 09:17:19 +01:00 |
Michele Artini
|
d49cd2fdc6
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-02-20 11:21:54 +01:00 |
Claudio Atzori
|
5e5e32cb48
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-02-19 16:56:52 +01:00 |
Claudio Atzori
|
33185fd0b7
|
ISLookupClientFactory moved in dhp-common
|
2020-02-19 16:56:38 +01:00 |
Michele Artini
|
5d3739b5cf
|
migration of claims
|
2020-02-19 15:11:17 +01:00 |
Michele Artini
|
173f1df1e5
|
saved a query for openaire production database
|
2020-02-19 10:15:08 +01:00 |
Sandro La Bruzzo
|
9a2d74ac82
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-19 10:13:45 +01:00 |
Sandro La Bruzzo
|
e5d7cdf422
|
fixed sql query
|
2020-02-19 10:13:36 +01:00 |
Sandro La Bruzzo
|
2b8675462f
|
refactoring code
|
2020-02-19 10:07:08 +01:00 |
Claudio Atzori
|
ed76521d9b
|
removed stale test resources, will be re-added later on
|
2020-02-18 11:51:08 +01:00 |
Claudio Atzori
|
0f364605ff
|
removed stale tests, need to reimplement them anyway
|
2020-02-18 11:48:19 +01:00 |
Przemysław Jacewicz
|
958f0693d6
|
WIP: logic for promoting action sets added
|
2020-02-17 18:19:19 +01:00 |
Przemysław Jacewicz
|
bea1a94346
|
Merge branch 'master' into przemyslawjacewicz_actionmanager_impl_prototype
# Conflicts:
# dhp-workflows/pom.xml
|
2020-02-17 15:07:23 +01:00 |
Claudio Atzori
|
6a288625e5
|
fixed workflow outgoing node
|
2020-02-17 15:04:33 +01:00 |
Claudio Atzori
|
1b18fd4d54
|
sync with master branch
|
2020-02-17 13:49:46 +01:00 |
Sandro La Bruzzo
|
4f04759738
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-17 12:31:58 +01:00 |
Sandro La Bruzzo
|
76ee85141a
|
added oozie job for DNET migration and implemented Spark job for extracting entities
|
2020-02-17 12:31:44 +01:00 |
Claudio Atzori
|
c460e2d281
|
Update 'dhp-workflows/docs/oozie-installer.markdown'
|
2020-02-17 11:54:48 +01:00 |
Michele Artini
|
176c5606bd
|
aligned with origin/master, aligned model and mapping
|
2020-02-17 10:40:53 +01:00 |
Claudio Atzori
|
56d1810a66
|
working procedure for records indexing using Spark, via lib com.lucidworks.spark:spark-solr
|
2020-02-14 12:28:52 +01:00 |
Claudio Atzori
|
1ee1baa8c0
|
Merge branch 'master' into provision_indexing
|
2020-02-13 18:17:07 +01:00 |
Claudio Atzori
|
a3d0b57b25
|
[maven-release-plugin] prepare for next development iteration
|
2020-02-13 18:11:33 +01:00 |
Claudio Atzori
|
6ed9a15bc8
|
[maven-release-plugin] prepare release dhp-1.1.5
|
2020-02-13 18:11:31 +01:00 |
Claudio Atzori
|
49e648f7c3
|
bumped version
|
2020-02-13 18:09:31 +01:00 |
Claudio Atzori
|
f9fae97e09
|
test json files aligned with the latest model changes
|
2020-02-13 18:05:59 +01:00 |
Claudio Atzori
|
1fee6e2b7e
|
implemented XML records construction and serialization, indexing WIP
|
2020-02-13 16:53:27 +01:00 |
Michele Artini
|
80cb52593f
|
bug fixing
|
2020-02-13 15:34:13 +01:00 |
Michele Artini
|
cdea0dae75
|
bug fixing
|
2020-02-12 16:34:00 +01:00 |
Michele Artini
|
69336195d3
|
simplifications
|
2020-02-12 11:12:38 +01:00 |
Michele Artini
|
06c2fd6df9
|
bug fixing
|
2020-02-11 15:29:50 +01:00 |
Michele Artini
|
5fc09b179c
|
bug fixing
|
2020-02-11 12:48:03 +01:00 |
Michele Artini
|
95740767e0
|
Ready for tests
|
2020-02-10 16:04:06 +01:00 |
Michele Artini
|
181e8498d4
|
...
|
2020-02-07 16:02:49 +01:00 |
Przemysław Jacewicz
|
86b60268bb
|
actionmanager implementation prototyping
|
2020-02-06 19:14:41 +01:00 |
Michele Artini
|
bb1533a07e
|
partial commit
|
2020-02-05 15:35:40 +01:00 |
Michele Artini
|
fbb0fc140b
|
partial implementation of migration
|
2020-02-04 15:25:47 +01:00 |
Claudio Atzori
|
7ba0f44d05
|
WIP
|
2020-01-30 18:21:07 +01:00 |
Claudio Atzori
|
49ef2f4eb1
|
removed input parameter specification, SparkXmlRecordBuilderJob doesn't need hive
|
2020-01-30 18:20:26 +01:00 |
Claudio Atzori
|
b5e1e2e5b2
|
reintegrated changes from fcbc4ccd70
|
2020-01-30 18:11:04 +01:00 |
Claudio Atzori
|
7bacd6812e
|
Merge branch 'provision_indexing' of https://code-repo.d4science.org/D-Net/dnet-hadoop into HEAD
Conflicts:
dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/GraphJoiner.java
dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/MappingUtils.java
dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/RelatedEntity.java
dhp-workflows/dhp-graph-provision/src/main/java/eu/dnetlib/dhp/graph/SparkXmlRecordBuilderJob.java
|
2020-01-30 17:59:46 +01:00 |
Claudio Atzori
|
b2691a3b0a
|
save adjacency list as JoinedEntity
|
2020-01-30 17:46:29 +01:00 |
Claudio Atzori
|
8c2aff99b0
|
joining entities using T x R x S, WIP: last representation based on LinkedEntity type
|
2020-01-29 15:40:33 +01:00 |
Sandro La Bruzzo
|
19a80e4638
|
implemented workflow for aggregation and generation of infospace graph
|
2020-01-24 09:58:55 +01:00 |
Claudio Atzori
|
fcbc4ccd70
|
a bit of docs doesn't hurt
|
2020-01-24 08:43:23 +01:00 |
Claudio Atzori
|
a55f5fecc6
|
joining entities using T x R x S method with groupByKey, WIP: making target objects (T) have lower memory footprint
|
2020-01-24 08:17:53 +01:00 |
Michele Artini
|
6bfe2dc96e
|
partial implementation
|
2020-01-22 16:00:23 +01:00 |
Claudio Atzori
|
799929c1e3
|
joining entities using T x R x S method with groupByKey
|
2020-01-21 16:35:44 +01:00 |
Michele Artini
|
f6eccdde33
|
partial implementation
|
2020-01-21 14:17:05 +01:00 |
Michele Artini
|
cd114f1c3b
|
partial update
|
2020-01-21 12:32:10 +01:00 |
Michele Artini
|
b35c59eb42
|
partial implementation of entities from db
|
2020-01-20 16:04:19 +01:00 |
Michele Artini
|
81f82b5d34
|
partial implementation of applications to migrate entities
|
2020-01-17 15:26:21 +01:00 |
Claudio Atzori
|
1cd6899480
|
merged from master
|
2020-01-17 14:25:57 +01:00 |
Claudio Atzori
|
97c239ee0d
|
WIP: trying to find a way to build the records for the index
|
2020-01-16 12:02:28 +02:00 |
miconis
|
4955be0197
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-01-14 15:03:44 +02:00 |
miconis
|
f61adfc2bb
|
minor changes
|
2020-01-14 15:03:27 +02:00 |
miconis
|
9bdcb02179
|
minor changes and update of the configuration for publications
|
2020-01-14 15:01:03 +02:00 |
Michele Artini
|
f7b9a7a9af
|
entity migration (partial implementation)
|
2020-01-10 15:55:23 +01:00 |
Michele Artini
|
7229fecbcf
|
fix warnings in poms
|
2019-12-20 13:41:08 +01:00 |
Sandro La Bruzzo
|
dd21db7036
|
fixed stuff
|
2019-12-18 16:28:22 +01:00 |
Claudio Atzori
|
7ba586d2e5
|
oozie workflow aimed to build the adjacency lists representation of the graph, needed to build the records to be indexed
|
2019-12-17 16:24:49 +01:00 |
Sandro La Bruzzo
|
76efcde4fd
|
using new branch decisionTreeDedup
|
2019-12-13 12:20:35 +01:00 |
Sandro La Bruzzo
|
b4392f9f43
|
implemented DedupRecord factory for missing entities
|
2019-12-13 09:40:02 +01:00 |
Sandro La Bruzzo
|
39367676d7
|
implemented DedupRecord factory with the merge of project
|
2019-12-12 15:18:48 +01:00 |
Sandro La Bruzzo
|
6b45e37e22
|
implemented DedupRecord factory with the merge of organizations
|
2019-12-11 16:57:37 +01:00 |
Sandro La Bruzzo
|
abd9034da0
|
implemented DedupRecord factory with the merge of publications
|
2019-12-11 15:43:24 +01:00 |
miconis
|
4b66b471a4
|
implementation of the sorting by trust mechanism and the merge of oaf entities
|
2019-12-10 14:57:16 +01:00 |
Sandro La Bruzzo
|
cc63706347
|
Implemented deduplication on spark
|
2019-12-06 13:38:00 +01:00 |
Sandro La Bruzzo
|
aad0cb40b7
|
Added schema Scholexplorer
|
2019-11-14 10:34:09 +01:00 |
Claudio Atzori
|
5711e75f67
|
use ${project.version} whenever possible
|
2019-11-08 17:41:51 +01:00 |
Claudio Atzori
|
245b4cbbb3
|
removed import limit
|
2019-11-08 17:41:01 +01:00 |
Claudio Atzori
|
7fe6835b47
|
[maven-release-plugin] prepare for next development iteration
|
2019-11-07 17:39:30 +01:00 |
Claudio Atzori
|
58918967d9
|
[maven-release-plugin] prepare release dhp-1.0.4
|
2019-11-07 17:39:27 +01:00 |
Claudio Atzori
|
5308f05a02
|
allow to specify the target hive DB name in the infospace import workflow
|
2019-11-07 17:38:09 +01:00 |
Claudio Atzori
|
a52d5bde4f
|
simplified import procedure, maps the infospace as hive tables
|
2019-11-06 17:45:52 +01:00 |
Claudio Atzori
|
1e7a2ac41d
|
align parameter names, graph import procedure WIP
|
2019-11-04 17:41:01 +01:00 |
Claudio Atzori
|
f39148dab8
|
[maven-release-plugin] prepare for next development iteration
|
2019-11-04 12:34:48 +01:00 |
Claudio Atzori
|
34b0e7b40a
|
[maven-release-plugin] prepare release dhp-1.0.3
|
2019-11-04 12:34:46 +01:00 |
Claudio Atzori
|
439ad80d81
|
conversion utilities from protobuffer model to DHP model moved in dnet-mapreduce-jobs. Removed also the relative protobuf dependencies
|
2019-11-04 12:33:23 +01:00 |
Claudio Atzori
|
32ed4ae8d6
|
conversion utilities from protobuffer model to DHP model moved in dnet-mapreduce-jobs. Removed also the relative protobuf dependencies
|
2019-11-04 12:28:56 +01:00 |
Sandro La Bruzzo
|
fd0ad82111
|
[maven-release-plugin] prepare for next development iteration
|
2019-10-31 12:08:51 +01:00 |
Sandro La Bruzzo
|
f224613b40
|
[maven-release-plugin] prepare release dhp-1.0.2
|
2019-10-31 12:08:49 +01:00 |
Sandro La Bruzzo
|
e13c30cc96
|
[maven-release-plugin] rollback the release of dhp-1.0.2
|
2019-10-31 12:07:04 +01:00 |
Sandro La Bruzzo
|
4da5239203
|
[maven-release-plugin] prepare release dhp-1.0.2
|
2019-10-31 12:06:14 +01:00 |
Sandro La Bruzzo
|
db8b346edd
|
[maven-release-plugin] rollback the release of 1.0.1
|
2019-10-31 11:49:05 +01:00 |
Sandro La Bruzzo
|
fc80052173
|
[maven-release-plugin] prepare for next development iteration
|
2019-10-31 11:47:42 +01:00 |
Sandro La Bruzzo
|
3150c7ce6d
|
[maven-release-plugin] prepare release 1.0.1
|
2019-10-31 11:47:40 +01:00 |
Sandro La Bruzzo
|
18ec8e8147
|
moved protoutils function to dhp-schemas
|
2019-10-31 11:31:37 +01:00 |
Sandro La Bruzzo
|
997e57d45b
|
Added entity filter to spark class
|
2019-10-30 12:19:03 +01:00 |
Sandro La Bruzzo
|
a336956708
|
added default property to job
|
2019-10-30 12:01:42 +01:00 |
Claudio Atzori
|
78b5b57e86
|
trying to make the spark action to be run as spark2
|
2019-10-29 18:56:34 +01:00 |
Claudio Atzori
|
c8bb81cd9a
|
align dependencies with IIS cluster
|
2019-10-29 18:10:20 +01:00 |
Sandro La Bruzzo
|
fe62ccd6dd
|
implemented oozie wf
|
2019-10-28 12:12:50 +01:00 |
Sandro La Bruzzo
|
9ee4e5a196
|
remove a bit of syntactic sugar on the object inheritance :(
|
2019-10-25 18:10:30 +02:00 |
Sandro La Bruzzo
|
c74335ebc7
|
resolved conflict
|
2019-10-25 14:34:50 +02:00 |
Sandro La Bruzzo
|
8c902c500a
|
minor fix
|
2019-10-25 14:33:54 +02:00 |
miconis
|
9fa5aebe9c
|
minor changes
|
2019-10-25 12:52:28 +02:00 |
miconis
|
551eda1600
|
dataset, orp and software mapping implemented. addition of test resources for results. implementation of tests to check the result of the mapping
|
2019-10-25 12:48:25 +02:00 |
Sandro La Bruzzo
|
eef14fade3
|
fixed conflict
|
2019-10-25 11:58:20 +02:00 |
Sandro La Bruzzo
|
0ea7e861ab
|
added organizations test
|
2019-10-25 11:56:28 +02:00 |
miconis
|
4908165e05
|
implementation of the createPublication method to map publications
|
2019-10-25 11:54:14 +02:00 |
miconis
|
df37bd6aaf
|
placeholders for setters in createpublication
|
2019-10-25 10:57:19 +02:00 |
Sandro La Bruzzo
|
c8d6d6bbd1
|
implemented organization mapping
|
2019-10-25 10:23:51 +02:00 |
miconis
|
b525b54130
|
starting implementing the createPublication class
|
2019-10-25 09:55:31 +02:00 |