Claudio Atzori
|
ab37953332
|
added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action
|
2020-05-14 10:25:41 +02:00 |
Claudio Atzori
|
5ecacad70a
|
fixed default resource typing in Oaf/Odf mapping
|
2020-05-13 17:01:11 +02:00 |
Miriam Baglioni
|
f5d785e096
|
used the DbClient moved in dhp-common
|
2020-05-11 13:59:42 +02:00 |
Miriam Baglioni
|
2abb84877d
|
Merge branch 'master' into blacklist
|
2020-05-11 10:37:49 +02:00 |
Miriam Baglioni
|
5e3548add6
|
-
|
2020-05-11 10:33:08 +02:00 |
Miriam Baglioni
|
871e079b45
|
merged with master
|
2020-05-11 10:20:00 +02:00 |
Miriam Baglioni
|
32301451ec
|
merge upstream
|
2020-05-11 09:42:23 +02:00 |
Miriam Baglioni
|
4c94231cad
|
merge with master fork
|
2020-05-08 12:25:57 +02:00 |
Claudio Atzori
|
62ea19f1d3
|
introduced mapping for ExternalReferences, made urls defined within an instance unique
|
2020-05-08 09:43:26 +02:00 |
Miriam Baglioni
|
207b899d6d
|
merged with upstream
|
2020-05-07 11:43:53 +02:00 |
Miriam Baglioni
|
5efae3acb9
|
new workflow for job3
|
2020-05-07 11:38:10 +02:00 |
Claudio Atzori
|
17860d3ab6
|
general changes in the RAW graph mapping: missing collectedfrom/hostedby causes records to be skipped; factored out most of the constants in ModelConstants class (dhp-schemas)
|
2020-05-06 13:20:02 +02:00 |
Michele Artini
|
8f30a09d84
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-05 17:12:22 +02:00 |
Michele Artini
|
ccc609f909
|
new module for the production of broker events
|
2020-05-05 17:09:00 +02:00 |
Claudio Atzori
|
4a8487165c
|
using long param names in wf definition
|
2020-05-04 19:19:29 +02:00 |
Claudio Atzori
|
a2fc37df5f
|
adjusted parameters
|
2020-05-04 19:18:59 +02:00 |
Claudio Atzori
|
f1b7e14036
|
code formatting
|
2020-05-04 19:18:34 +02:00 |
Miriam Baglioni
|
31ea05297d
|
moved the DbClient to common and added needed dependency to pom
|
2020-05-04 12:22:28 +02:00 |
Miriam Baglioni
|
4b0bd91012
|
-
|
2020-04-30 12:45:28 +02:00 |
Miriam Baglioni
|
3abb76ff7a
|
merge with upstream
|
2020-04-30 11:15:54 +02:00 |
Michele Artini
|
eb9bd42970
|
fixed a problem with journals
|
2020-04-30 11:06:05 +02:00 |
Miriam Baglioni
|
638a3c465b
|
-
|
2020-04-30 11:05:17 +02:00 |
Michele Artini
|
a0a6109bbc
|
fixed a problem with journals
|
2020-04-30 11:03:46 +02:00 |
Claudio Atzori
|
439c6255a2
|
cleanup
|
2020-04-29 19:09:07 +02:00 |
Claudio Atzori
|
77ac995770
|
cleaned up poms, added descriptions
|
2020-04-29 18:44:17 +02:00 |
Miriam Baglioni
|
3cffee74b9
|
merge with upstream
|
2020-04-29 18:25:29 +02:00 |
Michele Artini
|
c43b4c8962
|
formatting
|
2020-04-29 12:56:58 +02:00 |
Michele Artini
|
a5d7007005
|
Fix relations in migration
Fix pom.xml in dhp-stats-update
|
2020-04-29 12:05:41 +02:00 |
Miriam Baglioni
|
f7695e833c
|
resolved conflicts
|
2020-04-29 11:41:31 +02:00 |
Claudio Atzori
|
6f5b899038
|
reformatted code according to the updated style descriptor
|
2020-04-28 11:23:29 +02:00 |
Claudio Atzori
|
ac25f2d8d1
|
integrated changes from master
|
2020-04-28 08:55:28 +02:00 |
Miriam Baglioni
|
2980e50edf
|
merge upstream
|
2020-04-27 15:06:48 +02:00 |
Claudio Atzori
|
a0bdbacdae
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:52:31 +02:00 |
Claudio Atzori
|
7a3f8085f7
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:45:40 +02:00 |
Michele Artini
|
1260d03eba
|
skip empty projects
|
2020-04-27 13:51:13 +02:00 |
Miriam Baglioni
|
c093d764a3
|
-
|
2020-04-27 11:12:38 +02:00 |
Claudio Atzori
|
268462623a
|
refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity
|
2020-04-24 14:42:01 +02:00 |
Claudio Atzori
|
a3e480d1c9
|
implemented DispatchEntitiesApplication using spark2 datasets
|
2020-04-24 14:36:53 +02:00 |
Claudio Atzori
|
48157e0fc4
|
GraphHiveImporterJob moved to a dedicated package
|
2020-04-24 14:32:28 +02:00 |
Michele Artini
|
072eae3803
|
fixed a problem with missing contexts
|
2020-04-23 16:42:49 +02:00 |
Michele Artini
|
b164d96874
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 16:19:16 +02:00 |
Michele Artini
|
d920ce501e
|
fixed a problem with missing instances
|
2020-04-23 16:18:40 +02:00 |
Claudio Atzori
|
8851050814
|
replaced hive_db_name with hiveDbName
|
2020-04-23 08:36:40 +02:00 |
Claudio Atzori
|
91f81107b1
|
applying code formatting
|
2020-04-23 07:52:32 +02:00 |
Claudio Atzori
|
ade4cb97af
|
fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB
|
2020-04-22 18:24:06 +02:00 |
Claudio Atzori
|
e81960335c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-22 10:46:37 +02:00 |
Michele Artini
|
9e4d58f505
|
ResultType
|
2020-04-22 10:07:26 +02:00 |
Claudio Atzori
|
c891661822
|
small adjustments in the graph2hive workflow
|
2020-04-21 18:52:23 +02:00 |
Claudio Atzori
|
cd320efa96
|
added extra spark options to graph to hive workflow
|
2020-04-21 16:12:20 +02:00 |
Claudio Atzori
|
d772d967aa
|
restored changes from master branch
|
2020-04-20 18:53:06 +02:00 |
miconis
|
4da13e4570
|
Revert "Merge branch 'master' into deduptesting"
This reverts commit 772f75d167, reversing
changes made to 5f45f2c77f.
|
2020-04-20 16:04:49 +02:00 |
Claudio Atzori
|
d714bfb4d4
|
collectedfrom field moved in common parent class Oaf.java
|
2020-04-20 12:25:19 +02:00 |
Michele Artini
|
8ff7facfa3
|
fixed collectedFrom ID
|
2020-04-20 11:09:27 +02:00 |
Michele Artini
|
25307965d2
|
add a default datainfo if missing
|
2020-04-20 09:43:27 +02:00 |
Michele Artini
|
d2058fdc47
|
tests
|
2020-04-20 09:31:14 +02:00 |
Michele Artini
|
478a958f09
|
tests
|
2020-04-20 09:15:27 +02:00 |
Claudio Atzori
|
ad7a131b18
|
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project
|
2020-04-18 12:42:58 +02:00 |
Claudio Atzori
|
ff30f99c65
|
using newline delimited json files for the raw graph materialization. Introduced contentPath parameter
|
2020-04-15 16:16:20 +02:00 |
Alessia Bardi
|
550a9f82ed
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-14 17:53:01 +02:00 |
Alessia Bardi
|
a68fae9bcb
|
now supporting openaire 4.0 compliance
|
2020-04-14 17:52:48 +02:00 |
Sandro La Bruzzo
|
c36239e693
|
fixed incremental indexing
|
2020-04-14 17:47:36 +02:00 |
Claudio Atzori
|
82e8341f50
|
reorganizing parameter names in the provision workflow
|
2020-04-14 15:54:41 +02:00 |
Claudio Atzori
|
6b5f9ca9cb
|
raw graph creation workflow moved under dhp-graph-mapper, claims integration is included
|
2020-04-10 17:53:07 +02:00 |
Claudio Atzori
|
47f3d9b757
|
unit test for GraphHiveImporterJob
|
2020-04-08 13:24:43 +02:00 |
Claudio Atzori
|
d74e128aa6
|
Utility classes moved in dhp-common and dhp-schemas
|
2020-04-07 11:56:22 +02:00 |
Sandro La Bruzzo
|
62cc257e5c
|
fixed step1 workflow
|
2020-03-27 17:07:34 +01:00 |
Claudio Atzori
|
1767dfaa3f
|
method can be protected, it is meant to be used only in tests
|
2020-03-27 14:31:26 +01:00 |
Sandro La Bruzzo
|
a4b6a51168
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-27 13:48:56 +01:00 |
Sandro La Bruzzo
|
15d9106b3f
|
Fixed merge of dhp dedup
|
2020-03-27 13:48:44 +01:00 |
Claudio Atzori
|
e196fff212
|
adjusted path for source resource in unit test
|
2020-03-27 13:45:10 +01:00 |
Sandro La Bruzzo
|
8c9a56a0c8
|
refactored package name
|
2020-03-27 13:19:33 +01:00 |
Sandro La Bruzzo
|
a9935f80d4
|
refactor class name and workflow name for graph mapper, added javadoc
|
2020-03-27 13:16:24 +01:00 |
Claudio Atzori
|
673e744649
|
moved openaire specific implementations under dedicated package eu.dnetlib.dhp.oa
|
2020-03-27 10:42:17 +01:00 |
Claudio Atzori
|
098fabab3f
|
reorganizing content under dhp-workflows/dhp-graph-mapper
|
2020-03-26 19:44:19 +01:00 |
Claudio Atzori
|
77c4294924
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-26 18:26:52 +01:00 |
Claudio Atzori
|
43cbcda7ef
|
unit test for SparkGraphImporterJob
|
2020-03-26 18:26:40 +01:00 |
Sandro La Bruzzo
|
0cd022ad6a
|
merge with master
|
2020-03-26 14:08:29 +01:00 |
Claudio Atzori
|
abcd3f5bf5
|
added sample data for unit tests
|
2020-03-26 11:12:52 +01:00 |
Claudio Atzori
|
9dff4adbc3
|
dhp-graph-mapper workflow tests upgraded to junit5
|
2020-03-25 18:25:12 +01:00 |
Michele Artini
|
ebe45003d9
|
fixed some junit packages
|
2020-03-25 16:45:03 +01:00 |
Claudio Atzori
|
2180cc4fe7
|
more fields included in result view definition
|
2020-03-25 11:21:46 +01:00 |
Claudio Atzori
|
8b0ba3d76a
|
postprocessing script correctly run as hive2 action
|
2020-03-23 17:40:39 +01:00 |
Claudio Atzori
|
658d40ccbe
|
WIP trying to use hive2 actions
|
2020-03-23 11:14:54 +01:00 |
Sandro La Bruzzo
|
0594b92a6d
|
implemented relation with dataset
|
2020-03-19 11:11:07 +01:00 |
Claudio Atzori
|
abe8fb69a2
|
added global properties, moved postprocessing script inside the oozie_app directory
|
2020-03-18 15:43:54 +01:00 |
Claudio Atzori
|
8fe7ae1482
|
xml formatting
|
2020-03-13 15:53:56 +01:00 |
Sandro La Bruzzo
|
addaaa091f
|
migrate relation from RDD to Dataset
|
2020-03-13 09:13:20 +01:00 |
Claudio Atzori
|
7b6f0c8756
|
reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki
|
2020-03-10 17:19:17 +01:00 |
Claudio Atzori
|
0233987603
|
introduced post processing step following the hive DB creation/population
|
2020-03-04 10:56:50 +01:00 |
Claudio Atzori
|
9af3e904be
|
close the SparkSession at the end
|
2020-03-04 10:53:31 +01:00 |
Claudio Atzori
|
25ceec29ab
|
code formatting
|
2020-03-04 10:44:24 +01:00 |
Claudio Atzori
|
60bc2b1a20
|
drop the hive DB before populating it from scratch
|
2020-02-27 10:10:55 +01:00 |
Sandro La Bruzzo
|
2b8675462f
|
refactoring code
|
2020-02-19 10:07:08 +01:00 |
Claudio Atzori
|
1b18fd4d54
|
sync with master branch
|
2020-02-17 13:49:46 +01:00 |
Sandro La Bruzzo
|
76ee85141a
|
added oozie job for DNET migration and implemented Spark job for extracting entities
|
2020-02-17 12:31:44 +01:00 |
Claudio Atzori
|
1fee6e2b7e
|
implemented XML records construction and serialization, indexing WIP
|
2020-02-13 16:53:27 +01:00 |
Sandro La Bruzzo
|
19a80e4638
|
implemented workflow for aggregation and generation of infospace graph
|
2020-01-24 09:58:55 +01:00 |
Michele Artini
|
b35c59eb42
|
partial implementation of entities from db
|
2020-01-20 16:04:19 +01:00 |
Sandro La Bruzzo
|
abd9034da0
|
implemented DedupRecord factory with the merge of publications
|
2019-12-11 15:43:24 +01:00 |
miconis
|
4b66b471a4
|
implementation of the sorting by trust mechanism and the merge of oaf entities
|
2019-12-10 14:57:16 +01:00 |
Sandro La Bruzzo
|
aad0cb40b7
|
Added schema Scholexplorer
|
2019-11-14 10:34:09 +01:00 |
Claudio Atzori
|
245b4cbbb3
|
removed import limit
|
2019-11-08 17:41:01 +01:00 |
Claudio Atzori
|
5308f05a02
|
allow specifying the target hive DB name in the infospace import workflow
|
2019-11-07 17:38:09 +01:00 |
Claudio Atzori
|
a52d5bde4f
|
simplified import procedure, maps the infospace as hive tables
|
2019-11-06 17:45:52 +01:00 |
Claudio Atzori
|
1e7a2ac41d
|
align parameter names, graph import procedure WIP
|
2019-11-04 17:41:01 +01:00 |
Claudio Atzori
|
439ad80d81
|
conversion utilities from protobuf model to DHP model moved to dnet-mapreduce-jobs. Also removed the related protobuf dependencies
|
2019-11-04 12:33:23 +01:00 |
Claudio Atzori
|
32ed4ae8d6
|
conversion utilities from protobuf model to DHP model moved to dnet-mapreduce-jobs. Also removed the related protobuf dependencies
|
2019-11-04 12:28:56 +01:00 |
Sandro La Bruzzo
|
18ec8e8147
|
moved protoutils function to dhp-schemas
|
2019-10-31 11:31:37 +01:00 |
Sandro La Bruzzo
|
997e57d45b
|
Added entity filter to spark class
|
2019-10-30 12:19:03 +01:00 |
Sandro La Bruzzo
|
a336956708
|
added default property to job
|
2019-10-30 12:01:42 +01:00 |
Claudio Atzori
|
78b5b57e86
|
trying to make the spark action run as spark2
|
2019-10-29 18:56:34 +01:00 |
Claudio Atzori
|
c8bb81cd9a
|
align dependencies with IIS cluster
|
2019-10-29 18:10:20 +01:00 |
Sandro La Bruzzo
|
fe62ccd6dd
|
implemented oozie wf
|
2019-10-28 12:12:50 +01:00 |
Sandro La Bruzzo
|
9ee4e5a196
|
remove a bit of syntactic sugar on the object inheritance :(
|
2019-10-25 18:10:30 +02:00 |
Sandro La Bruzzo
|
c74335ebc7
|
resolved conflict
|
2019-10-25 14:34:50 +02:00 |
Sandro La Bruzzo
|
8c902c500a
|
minor fix
|
2019-10-25 14:33:54 +02:00 |
miconis
|
9fa5aebe9c
|
minor changes
|
2019-10-25 12:52:28 +02:00 |
miconis
|
551eda1600
|
dataset, orp and software mapping implemented. addition of test resources for results. implementation of tests to check the result of the mapping
|
2019-10-25 12:48:25 +02:00 |
Sandro La Bruzzo
|
eef14fade3
|
fixed conflict
|
2019-10-25 11:58:20 +02:00 |
Sandro La Bruzzo
|
0ea7e861ab
|
added organizations test
|
2019-10-25 11:56:28 +02:00 |
miconis
|
4908165e05
|
implementation of the createPublication method to map publications
|
2019-10-25 11:54:14 +02:00 |
miconis
|
df37bd6aaf
|
placeholders for setters in createpublication
|
2019-10-25 10:57:19 +02:00 |
Sandro La Bruzzo
|
c8d6d6bbd1
|
implemented organization mapping
|
2019-10-25 10:23:51 +02:00 |
miconis
|
b525b54130
|
started implementing the createPublication class
|
2019-10-25 09:55:31 +02:00 |
Claudio Atzori
|
4b331790e7
|
resolved conflicts
|
2019-10-25 09:45:12 +02:00 |
Claudio Atzori
|
c929c1dfac
|
more proto 2 graph model mappings
|
2019-10-25 09:25:36 +02:00 |
Sandro La Bruzzo
|
09ffda03a2
|
removed circular dependencies
|
2019-10-25 09:24:18 +02:00 |
Sandro La Bruzzo
|
a10d071cf4
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2019-10-24 17:55:44 +02:00 |
Sandro La Bruzzo
|
3a8bb11695
|
mapped first part
|
2019-10-24 17:55:40 +02:00 |
Claudio Atzori
|
d46371ceab
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2019-10-24 17:43:55 +02:00 |
Claudio Atzori
|
0d88f9a6a4
|
added mapping for projects
|
2019-10-24 17:43:42 +02:00 |
Sandro La Bruzzo
|
2dd9572f41
|
added Mapping of OriginalDescription
|
2019-10-24 17:36:44 +02:00 |
miconis
|
351d850ad3
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2019-10-24 17:29:07 +02:00 |
miconis
|
b66a7e3030
|
publication test added
|
2019-10-24 17:29:01 +02:00 |
Sandro La Bruzzo
|
6c32d418ac
|
added conversion of ExtraInfo
|
2019-10-24 17:26:55 +02:00 |
Claudio Atzori
|
5f339a2c24
|
added mappings for basic types
|
2019-10-24 17:21:45 +02:00 |
Sandro La Bruzzo
|
9d04111391
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2019-10-24 17:05:52 +02:00 |
Sandro La Bruzzo
|
0902bac7dd
|
fixed conflict
|
2019-10-24 17:05:42 +02:00 |
Claudio Atzori
|
d8bfaa3687
|
added mapping for relations
|
2019-10-24 17:04:13 +02:00 |
Sandro La Bruzzo
|
d2965636e0
|
created test for converting json into new OAF data model
|
2019-10-24 17:02:35 +02:00 |
Claudio Atzori
|
79c4f1bbd8
|
Protobuf to internal graph model, early steps
|
2019-10-24 16:56:13 +02:00 |
Claudio Atzori
|
d38aeb8c6e
|
DataInfo.provenanceaction not repeatable, fluent setters
|
2019-10-24 16:55:38 +02:00 |
Sandro La Bruzzo
|
5744a64478
|
added module dhp-graph-mapper
|
2019-10-24 16:00:28 +02:00 |