Sandro La Bruzzo
|
920c0f19c3
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-04-29 13:13:16 +02:00 |
Sandro La Bruzzo
|
09f161f1f4
|
implemented unit test
|
2020-04-29 13:13:02 +02:00 |
miconis
|
e0d14fe4f8
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-29 13:02:53 +02:00 |
miconis
|
0352d3b0ba
|
entity dumps in dedup compressed
|
2020-04-29 13:02:34 +02:00 |
Michele Artini
|
c43b4c8962
|
formatting
|
2020-04-29 12:56:58 +02:00 |
Michele Artini
|
a5d7007005
|
Fix relations in migration
Fix pom.xml in dhp-stats-update
|
2020-04-29 12:05:41 +02:00 |
Claudio Atzori
|
3616d0f88d
|
Merge pull request 'Adding the stats workflow to the dnet-hadoop hierarchy' (#6) from spyros/dnet-hadoop:master into master
Integrating stats update workflow.
|
2020-04-29 10:35:02 +02:00 |
Claudio Atzori
|
964972d29a
|
added data provision workflow definition WIP
|
2020-04-29 09:25:50 +02:00 |
Enrico Ottonello
|
1edcd53581
|
added shell actions to download all 11 activities files from ORCID
|
2020-04-28 20:25:09 +02:00 |
miconis
|
62e467eb0c
|
assertion numbers updated to fit the new implementation of the pace-core
|
2020-04-28 11:46:23 +02:00 |
Claudio Atzori
|
6f5b899038
|
reformatted code according to the updated style descriptor
|
2020-04-28 11:23:29 +02:00 |
Claudio Atzori
|
ac25f2d8d1
|
integrated changes from master
|
2020-04-28 08:55:28 +02:00 |
Claudio Atzori
|
a0bdbacdae
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:52:31 +02:00 |
Claudio Atzori
|
7a3f8085f7
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:45:40 +02:00 |
Michele Artini
|
1260d03eba
|
skip empty projects
|
2020-04-27 13:51:13 +02:00 |
Enrico Ottonello
|
a1861b9eaa
|
workflow works in parallel on 2 activity files
|
2020-04-24 18:33:37 +02:00 |
Enrico Ottonello
|
941e94af06
|
added workflow for generating authors with dois data sequence file
|
2020-04-24 15:50:40 +02:00 |
Claudio Atzori
|
268462623a
|
refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity
|
2020-04-24 14:42:01 +02:00 |
Claudio Atzori
|
a3e480d1c9
|
implemented DispatchEntitiesApplication using spark2 datasets
|
2020-04-24 14:36:53 +02:00 |
Claudio Atzori
|
48157e0fc4
|
GraphHiveImporterJob moved to a dedicated package
|
2020-04-24 14:32:28 +02:00 |
Claudio Atzori
|
278fc9d276
|
code formatting
|
2020-04-23 18:51:38 +02:00 |
miconis
|
5414236644
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 18:17:23 +02:00 |
miconis
|
8d258c85ff
|
spark dedup test fixed, sample for dataset and orp added, test implemented
|
2020-04-23 18:16:20 +02:00 |
Michele Artini
|
072eae3803
|
fixed a problem with missing contexts
|
2020-04-23 16:42:49 +02:00 |
Michele Artini
|
b164d96874
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 16:19:16 +02:00 |
Michele Artini
|
d920ce501e
|
fixed a problem with missing instances
|
2020-04-23 16:18:40 +02:00 |
Sandro La Bruzzo
|
fdc0523e4c
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-23 09:34:13 +02:00 |
Sandro La Bruzzo
|
4ba386d996
|
improved crossref mapping
|
2020-04-23 09:33:48 +02:00 |
Claudio Atzori
|
8851050814
|
replaced hive_db_name with hiveDbName
|
2020-04-23 08:36:40 +02:00 |
Claudio Atzori
|
91f81107b1
|
applying code formatting
|
2020-04-23 07:52:32 +02:00 |
Claudio Atzori
|
1e7583c5a6
|
filtered invisible records in data provision workflow
|
2020-04-23 07:51:34 +02:00 |
Claudio Atzori
|
9ddafd46ca
|
fixed dedup record id prefix, set the correct dataInfo in the DedupRecordFactory
|
2020-04-23 07:50:18 +02:00 |
Claudio Atzori
|
ade4cb97af
|
fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB
|
2020-04-22 18:24:06 +02:00 |
Sandro La Bruzzo
|
bb6c9785b4
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-22 15:00:57 +02:00 |
Sandro La Bruzzo
|
157915988c
|
improved crossref mapping
|
2020-04-22 15:00:44 +02:00 |
Enrico Ottonello
|
5977f08e92
|
merged
|
2020-04-22 14:50:50 +02:00 |
Enrico Ottonello
|
7d759947ae
|
used vtd for parsing orcid xml record, set 4g heapspace
|
2020-04-22 14:41:19 +02:00 |
Claudio Atzori
|
e81960335c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-22 10:46:37 +02:00 |
Michele Artini
|
9e4d58f505
|
ResultType
|
2020-04-22 10:07:26 +02:00 |
Claudio Atzori
|
c891661822
|
small adjustments in the graph2hive workflow
|
2020-04-21 18:52:23 +02:00 |
Claudio Atzori
|
0b55795d4d
|
small adjustments in the provisioning workflow
|
2020-04-21 16:15:04 +02:00 |
Claudio Atzori
|
88fbb3a353
|
added sparkSqlWarehouseDir to the default extra spark options passed to each workflow
|
2020-04-21 16:13:43 +02:00 |
Claudio Atzori
|
cd320efa96
|
added extra spark options to graph to hive workflow
|
2020-04-21 16:12:20 +02:00 |
Claudio Atzori
|
91e72a6944
|
Dataset based implementation for SparkCreateDedupRecord phase, fixed datasource entity dump supplementing dedup unit tests
|
2020-04-21 12:06:08 +02:00 |
miconis
|
5c9ef08a8e
|
spark dedup test fixed
|
2020-04-21 10:19:04 +02:00 |
Sandro La Bruzzo
|
3624947a7f
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-21 08:34:24 +02:00 |
Claudio Atzori
|
d772d967aa
|
restored changes from master branch
|
2020-04-20 18:53:06 +02:00 |
Claudio Atzori
|
eb8a020859
|
fixed behaviour of DedupRecordFactory
|
2020-04-20 18:44:06 +02:00 |
Sandro La Bruzzo
|
039f9b7871
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-20 18:10:29 +02:00 |
Sandro La Bruzzo
|
e4b105cece
|
improved crossref mapping
|
2020-04-20 18:10:07 +02:00 |
Claudio Atzori
|
ede1af3d85
|
Merge branch 'master' into deduptesting
|
2020-04-20 16:52:14 +02:00 |
miconis
|
1102e32462
|
SparkDedupTest updated and organization dump fixed
|
2020-04-20 16:49:01 +02:00 |
Claudio Atzori
|
667d23c58b
|
finalising Actionset migration workflow
|
2020-04-20 16:45:21 +02:00 |
miconis
|
4da13e4570
|
Revert "Merge branch 'master' into deduptesting"
This reverts commit 772f75d167, reversing
changes made to 5f45f2c77f.
|
2020-04-20 16:04:49 +02:00 |
Claudio Atzori
|
9147af7fed
|
actionsets migration workflow moved in dhp-workflows/dhp-actionmanager
|
2020-04-20 15:24:33 +02:00 |
miconis
|
772f75d167
|
Merge branch 'master' into deduptesting
|
2020-04-20 14:50:12 +02:00 |
Sandro La Bruzzo
|
5d46ec7d5f
|
fixed name of wrong package
|
2020-04-20 14:49:32 +02:00 |
Sandro La Bruzzo
|
82cc3b707d
|
fixed name of wrong package
|
2020-04-20 14:47:06 +02:00 |
Sandro La Bruzzo
|
b2c872cb4d
|
merged master
|
2020-04-20 14:04:40 +02:00 |
Sandro La Bruzzo
|
7029942e06
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-04-20 13:26:41 +02:00 |
Sandro La Bruzzo
|
0e45f4d450
|
continue mapping from crossref to OAF
|
2020-04-20 13:26:29 +02:00 |
Enrico Ottonello
|
a466648b4b
|
renamed output file
|
2020-04-20 12:32:03 +02:00 |
Claudio Atzori
|
d714bfb4d4
|
collectedfrom field moved in common parent class Oaf.java
|
2020-04-20 12:25:19 +02:00 |
Enrico Ottonello
|
4ae55e3891
|
added workflow parameters
|
2020-04-20 12:00:04 +02:00 |
Michele Artini
|
8ff7facfa3
|
fixed collectedFrom ID
|
2020-04-20 11:09:27 +02:00 |
Sandro La Bruzzo
|
eef60bb9f4
|
created structure of oozie wf for ORCID
|
2020-04-20 10:24:57 +02:00 |
Sandro La Bruzzo
|
4d0d9de07e
|
reorganized package and fixed test
|
2020-04-20 10:02:42 +02:00 |
Sandro La Bruzzo
|
618bc1fc72
|
first implementation of crossrefMapping
|
2020-04-20 09:53:34 +02:00 |
Michele Artini
|
25307965d2
|
add a default datainfo if missing
|
2020-04-20 09:43:27 +02:00 |
Michele Artini
|
d2058fdc47
|
tests
|
2020-04-20 09:31:14 +02:00 |
Enrico Ottonello
|
1d44a359ea
|
renamed package folder
|
2020-04-20 09:25:40 +02:00 |
Michele Artini
|
478a958f09
|
tests
|
2020-04-20 09:15:27 +02:00 |
Claudio Atzori
|
5f45f2c77f
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:46:40 +02:00 |
Claudio Atzori
|
ad7a131b18
|
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project
|
2020-04-18 12:42:58 +02:00 |
Claudio Atzori
|
a2938dd059
|
cleanup
|
2020-04-18 12:24:22 +02:00 |
Claudio Atzori
|
9374ff03ea
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:06:58 +02:00 |
Claudio Atzori
|
71813795f6
|
various refactorings on the dnet-dedup-openaire workflow
|
2020-04-18 12:06:23 +02:00 |
Enrico Ottonello
|
7011d4203e
|
parser of orcid summaries from tar gz file on hdfs, that creates a sequence file with author information (oid, name, surname, credit name)
|
2020-04-17 18:52:39 +02:00 |
miconis
|
6450bb0daa
|
test for software dedup added. definition of orp, dataset and sw dedup configurations
|
2020-04-17 17:31:59 +02:00 |
Claudio Atzori
|
038ac7afd7
|
relation consistency workflow separated from dedup scan and creation of CCs
|
2020-04-17 13:12:44 +02:00 |
Claudio Atzori
|
c92bfeeaee
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-17 13:07:52 +02:00 |
Sandro La Bruzzo
|
a329ea5575
|
merged with master branch
|
2020-04-17 12:23:54 +02:00 |
Sandro La Bruzzo
|
01ea7721f3
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-17 12:12:25 +02:00 |
Sandro La Bruzzo
|
5e2fa996aa
|
fixed problem with conversion of long into string
|
2020-04-17 12:11:51 +02:00 |
miconis
|
418cf94642
|
implementation of the deletedbyinference test in propagating relations
|
2020-04-17 10:40:21 +02:00 |
Claudio Atzori
|
cb0952428e
|
Merge branch 'master' into deduptesting
|
2020-04-16 14:42:25 +02:00 |
Claudio Atzori
|
cc21bbfb1a
|
Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting
|
2020-04-16 14:41:37 +02:00 |
Claudio Atzori
|
ec5dfc068d
|
added spark.sql.shuffle.partitions=3840 to dedup scan wf
|
2020-04-16 14:41:28 +02:00 |
Claudio Atzori
|
09f356b047
|
Merge pull request 'Closes #7: subdirs inside graph table dirs' (#8) from przemyslaw.jacewicz/dnet-hadoop:przemyslawjacewicz_7_distcp_configuration_fix into master
Ran the code from this PR in isolation and it worked fine. Thanks!
|
2020-04-16 14:38:46 +02:00 |
Claudio Atzori
|
3437383112
|
Merge branch 'master' into deduptesting
|
2020-04-16 12:46:14 +02:00 |
miconis
|
0eccbc318b
|
Deduper class (utilities for dedup) cleaned. Useless methods removed
|
2020-04-16 12:36:37 +02:00 |
Claudio Atzori
|
76d23895e6
|
Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting
|
2020-04-16 12:18:32 +02:00 |
miconis
|
6a089ec287
|
minor changes
|
2020-04-16 12:15:38 +02:00 |
Claudio Atzori
|
376efd67de
|
removed prepare statement in spark action
|
2020-04-16 12:14:16 +02:00 |
miconis
|
9b36458b6a
|
Merge branch 'deduptesting' of code-repo.d4science.org:D-Net/dnet-hadoop into deduptesting
|
2020-04-16 12:13:58 +02:00 |
miconis
|
cd4d9a148f
|
creating temporary directories in dedup test
|
2020-04-16 12:13:26 +02:00 |
Claudio Atzori
|
b39ff36c16
|
improving the wf definitions
|
2020-04-16 12:11:37 +02:00 |
Claudio Atzori
|
011b342bc9
|
trying to avoid OOM in SparkPropagateRelation
|
2020-04-16 11:13:51 +02:00 |
Claudio Atzori
|
069ef5eaed
|
trying to avoid OOM in SparkPropagateRelation
|
2020-04-15 21:23:21 +02:00 |
Claudio Atzori
|
8eedfefc98
|
try to introduce intermediate serialization on hdfs to avoid OOM
|
2020-04-15 18:35:35 +02:00 |