Miriam Baglioni
|
da9b012c15
|
fixed description
|
2020-08-06 11:55:44 +02:00 |
Miriam Baglioni
|
6dbadcf181
|
the new schema for the dumped result
|
2020-08-06 11:05:56 +02:00 |
Miriam Baglioni
|
5b651abf82
|
merge branch with master
|
2020-08-04 10:14:07 +02:00 |
Miriam Baglioni
|
901ae37f7b
|
added step to workflow
|
2020-08-03 18:12:54 +02:00 |
Miriam Baglioni
|
e43aeb139a
|
added new property file and changed some parameter to old files
|
2020-08-03 18:07:28 +02:00 |
Miriam Baglioni
|
c892c7dfa7
|
changed to query for community map just once and save the result for remaining executions
|
2020-08-03 17:56:31 +02:00 |
Miriam Baglioni
|
6f1c40a933
|
-
|
2020-07-30 16:24:28 +02:00 |
Miriam Baglioni
|
2b66a93f9e
|
added property file that was missing
|
2020-07-30 16:24:17 +02:00 |
Michele Artini
|
bdece15ca0
|
blacklist of nsprefix
|
2020-07-30 16:13:38 +02:00 |
Sandro La Bruzzo
|
3010a362bc
|
updated the provision workflow in the aggregation phase. Removed serialization in JSON RDD and used Spark Dataset
|
2020-07-30 09:25:56 +02:00 |
Miriam Baglioni
|
b48934f6df
|
changed the workflow name
|
2020-07-29 17:43:43 +02:00 |
Miriam Baglioni
|
8ad8dac7d4
|
merge branch with fork master
|
2020-07-29 17:38:28 +02:00 |
Miriam Baglioni
|
40a8dafbdc
|
-
|
2020-07-29 17:30:44 +02:00 |
Miriam Baglioni
|
8d4327b292
|
input parameters and workflow definition for the dump of the whole graph
|
2020-07-29 17:00:34 +02:00 |
Miriam Baglioni
|
178c2729a7
|
changed the path to reach the java class to be executed
|
2020-07-29 12:29:51 +02:00 |
Miriam Baglioni
|
437ac12139
|
removed unused parameter
|
2020-07-29 12:28:16 +02:00 |
Miriam Baglioni
|
332258d199
|
split the classes related to the communities dump and to the whole graph dump
|
2020-07-24 17:21:48 +02:00 |
Claudio Atzori
|
56bbfdc65d
|
introduced parameter 'numParitions', driving the hive DB table data partitioning. Currently specified only for table 'project'
|
2020-07-23 08:54:10 +02:00 |
Miriam Baglioni
|
40bbe94f7c
|
merge with master fork
|
2020-07-20 18:10:03 +02:00 |
Miriam Baglioni
|
23160b4d29
|
realignment of the workflow classes with the changes in the structure of the module
|
2020-07-20 18:04:30 +02:00 |
Claudio Atzori
|
e0c4cf6f7b
|
added parameter to drive the graph merge strategy: priority (BETA|PROD)
|
2020-07-20 10:48:01 +02:00 |
Claudio Atzori
|
94ccdb4852
|
Merge branch 'master' into merge_graph
|
2020-07-20 10:14:55 +02:00 |
Sandro La Bruzzo
|
9116d75b3e
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-17 18:01:30 +02:00 |
Claudio Atzori
|
878f2b931c
|
Merge branch 'master' into merge_graph
|
2020-07-16 16:34:24 +02:00 |
Claudio Atzori
|
cc77446dc4
|
added dbSchema parameter to the raw_db workflow
|
2020-07-10 19:01:50 +02:00 |
Sandro La Bruzzo
|
c01efed79b
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-10 14:44:57 +02:00 |
Sandro La Bruzzo
|
a7d3977481
|
added generation of EBI Dataset
|
2020-07-10 14:44:50 +02:00 |
Claudio Atzori
|
610d377d57
|
first implementation of the BETA & PROD graphs merge procedure
|
2020-07-08 16:54:26 +02:00 |
Alessia Bardi
|
8f83b726fa
|
Dump json schema compliant to json schema Draft 7
|
2020-07-08 12:48:46 +02:00 |
Miriam Baglioni
|
7fe00cb4fb
|
-
|
2020-07-08 10:29:37 +02:00 |
Miriam Baglioni
|
b2782025f6
|
enabled the whole workflow to run. Added property to give priority to dependency in the classpath - to solve conflicts
|
2020-07-07 18:10:47 +02:00 |
Miriam Baglioni
|
f5bb65c9ef
|
the json schema for the dump of the results
|
2020-07-07 17:34:40 +02:00 |
Miriam Baglioni
|
c19818a3f8
|
merge branch with fork master
|
2020-07-06 13:58:23 +02:00 |
Miriam Baglioni
|
94500a581b
|
merge branch with fork master
|
2020-07-02 14:25:39 +02:00 |
Claudio Atzori
|
ed1c7e5d75
|
fixed workflow for the import of the claims alone
|
2020-07-02 12:40:21 +02:00 |
Sandro La Bruzzo
|
1d420eedb4
|
added generation of EBI Dataset
|
2020-07-02 12:37:43 +02:00 |
Claudio Atzori
|
e4a29a4513
|
fixed workflow for the import of the claims alone
|
2020-07-02 12:36:33 +02:00 |
Miriam Baglioni
|
05a99cfb61
|
change the position of value and description elements in the workflow definition
|
2020-06-25 12:36:08 +02:00 |
Michele Artini
|
77d2a1b1c4
|
params to choose sql queries for beta or production
|
2020-06-25 09:28:13 +02:00 |
Miriam Baglioni
|
e8f914f8b3
|
-
|
2020-06-22 10:50:41 +02:00 |
Miriam Baglioni
|
669a509430
|
-
|
2020-06-19 17:39:46 +02:00 |
Miriam Baglioni
|
65bf312360
|
merge branch with fork master
|
2020-06-18 11:35:27 +02:00 |
Miriam Baglioni
|
e8b3e972f2
|
changed the input params and the workflow definition to tackle the Result as all result product produced
|
2020-06-18 11:25:05 +02:00 |
Sandro La Bruzzo
|
9bf67f5de1
|
resolved conflicts
|
2020-06-17 09:15:43 +02:00 |
Sandro La Bruzzo
|
1d4275acc4
|
implemented first version of exportation of Scholexplorer into ActionSet
|
2020-06-17 09:10:38 +02:00 |
Claudio Atzori
|
4ec262db53
|
included externalreference(s) in the result view on the Hive graph DB
|
2020-06-16 15:28:20 +02:00 |
Miriam Baglioni
|
9dd3ef22c5
|
merge branch with fork master
|
2020-06-15 11:23:26 +02:00 |
Miriam Baglioni
|
e43eedb5b0
|
added resources and workflow for dump of community products
|
2020-06-15 11:13:21 +02:00 |
Claudio Atzori
|
0d52816244
|
WIP: graph cleaner implementation
|
2020-06-13 13:06:04 +02:00 |
Claudio Atzori
|
953da4a427
|
Merge branch 'master' into graph_cleaning
|
2020-06-10 21:36:56 +02:00 |
Michele Artini
|
c08e66e01e
|
fixed a workflow parameter
|
2020-06-10 10:11:56 +02:00 |
Michele Artini
|
7177a32d75
|
import of invisible stores
|
2020-06-10 10:04:00 +02:00 |
Claudio Atzori
|
d9f33582c5
|
WIP: graph cleaner implementation
|
2020-06-09 17:20:40 +02:00 |
Miriam Baglioni
|
a089db18f1
|
workflow and parameters to execute the dump
|
2020-06-09 15:39:38 +02:00 |
Miriam Baglioni
|
5121cbaf6a
|
new classes for external dump. Only classes functional to dump products
|
2020-06-09 15:37:46 +02:00 |
Claudio Atzori
|
b2349659cf
|
WIP: graph property fixing implementation
|
2020-06-05 18:37:38 +02:00 |
Claudio Atzori
|
54ca8ed6c3
|
uniformed param name (isLookupUrl), Vocab model classes defined as Serializable
|
2020-05-29 18:17:30 +02:00 |
Claudio Atzori
|
1577bd5b8b
|
added IsLookupUrl to the raw_db workflow parameters
|
2020-05-29 16:18:16 +02:00 |
Michele Artini
|
adb798faa5
|
import from db using is vocabularies
|
2020-05-29 12:03:51 +02:00 |
Michele Artini
|
3ceb2d2853
|
match terms with vocabularies
|
2020-05-27 11:34:13 +02:00 |
Michele Artini
|
c15d997925
|
xquery
|
2020-05-26 13:13:17 +02:00 |
Michele Artini
|
093f1aff03
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:06:55 +02:00 |
Michele Artini
|
e43d4d7778
|
added a coalesce in sql query
|
2020-05-21 11:08:07 +02:00 |
Michele Artini
|
b3bcbb3129
|
resolve name of organization countries
|
2020-05-21 08:41:32 +02:00 |
Claudio Atzori
|
18f46e47b9
|
added relations to the graph2hive import workflow
|
2020-05-15 09:34:48 +02:00 |
Claudio Atzori
|
9d028ffe1c
|
cleanup
|
2020-05-15 09:28:55 +02:00 |
Claudio Atzori
|
fd62359538
|
cleanup
|
2020-05-15 09:28:15 +02:00 |
Claudio Atzori
|
eb64335a54
|
parallel implementation for graph Hive importer
|
2020-05-15 09:05:26 +02:00 |
Claudio Atzori
|
4a8487165c
|
using long param names in wf definition
|
2020-05-04 19:19:29 +02:00 |
Claudio Atzori
|
a2fc37df5f
|
adjusted parameters
|
2020-05-04 19:18:59 +02:00 |
Michele Artini
|
a0a6109bbc
|
fixed a problem with journals
|
2020-04-30 11:03:46 +02:00 |
Claudio Atzori
|
48157e0fc4
|
GraphHiveImporterJob moved to dedicated package
|
2020-04-24 14:32:28 +02:00 |
Claudio Atzori
|
8851050814
|
replaced hive_db_name with hiveDbName
|
2020-04-23 08:36:40 +02:00 |
Claudio Atzori
|
ade4cb97af
|
fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB
|
2020-04-22 18:24:06 +02:00 |
Claudio Atzori
|
c891661822
|
small adjustments in the graph2hive workflow
|
2020-04-21 18:52:23 +02:00 |
Claudio Atzori
|
cd320efa96
|
added extra spark options to graph to hive workflow
|
2020-04-21 16:12:20 +02:00 |
Claudio Atzori
|
ff30f99c65
|
using newline delimited json files for the raw graph materialization. Introduced contentPath parameter
|
2020-04-15 16:16:20 +02:00 |
Alessia Bardi
|
a68fae9bcb
|
now supporting openaire 4.0 compliance
|
2020-04-14 17:52:48 +02:00 |
Claudio Atzori
|
82e8341f50
|
reorganizing parameter names in the provision workflow
|
2020-04-14 15:54:41 +02:00 |
Claudio Atzori
|
6b5f9ca9cb
|
raw graph creation workflow moved under dhp-graph-mapper, claims integration is included
|
2020-04-10 17:53:07 +02:00 |
Claudio Atzori
|
47f3d9b757
|
unit test for GraphHiveImporterJob
|
2020-04-08 13:24:43 +02:00 |
Sandro La Bruzzo
|
62cc257e5c
|
fixed step1 workflow
|
2020-03-27 17:07:34 +01:00 |
Sandro La Bruzzo
|
15d9106b3f
|
Fixed merge of dhp dedup
|
2020-03-27 13:48:44 +01:00 |
Sandro La Bruzzo
|
a9935f80d4
|
refactor class name and workflow name for graph mapper, added javadoc
|
2020-03-27 13:16:24 +01:00 |
Claudio Atzori
|
673e744649
|
moved openaire specific implementations under dedicated package eu.dnetlib.dhp.oa
|
2020-03-27 10:42:17 +01:00 |
Claudio Atzori
|
098fabab3f
|
reorganizing content under dhp-workflows/dhp-graph-mapper
|
2020-03-26 19:44:19 +01:00 |
Claudio Atzori
|
77c4294924
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-26 18:26:52 +01:00 |
Claudio Atzori
|
43cbcda7ef
|
unit test for SparkGraphImporterJob
|
2020-03-26 18:26:40 +01:00 |
Sandro La Bruzzo
|
0cd022ad6a
|
merge with master
|
2020-03-26 14:08:29 +01:00 |
Claudio Atzori
|
2180cc4fe7
|
more fields included in result view definition
|
2020-03-25 11:21:46 +01:00 |
Claudio Atzori
|
8b0ba3d76a
|
postprocessing script correctly run as hive2 action
|
2020-03-23 17:40:39 +01:00 |
Claudio Atzori
|
658d40ccbe
|
WIP trying to use hive2 actions
|
2020-03-23 11:14:54 +01:00 |
Claudio Atzori
|
abe8fb69a2
|
added global properties, moved postprocessing script inside the oozie_app directory
|
2020-03-18 15:43:54 +01:00 |
Claudio Atzori
|
8fe7ae1482
|
xml formatting
|
2020-03-13 15:53:56 +01:00 |
Sandro La Bruzzo
|
addaaa091f
|
migrate relation from RDD to Dataset
|
2020-03-13 09:13:20 +01:00 |
Claudio Atzori
|
0233987603
|
introduced post processing step following the hive DB creation/population
|
2020-03-04 10:56:50 +01:00 |
Sandro La Bruzzo
|
2b8675462f
|
refactoring code
|
2020-02-19 10:07:08 +01:00 |
Sandro La Bruzzo
|
19a80e4638
|
implemented workflow for aggregation and generation of infospace graph
|
2020-01-24 09:58:55 +01:00 |
Sandro La Bruzzo
|
abd9034da0
|
implemented DedupRecord factory with the merge of publications
|
2019-12-11 15:43:24 +01:00 |
miconis
|
4b66b471a4
|
implementation of the sorting by trust mechanism and the merge of oaf entities
|
2019-12-10 14:57:16 +01:00 |