Michele Artini
|
072eae3803
|
fixed a problem with missing contexts
|
2020-04-23 16:42:49 +02:00 |
Michele Artini
|
b164d96874
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 16:19:16 +02:00 |
Michele Artini
|
d920ce501e
|
fixed a problem with missing instances
|
2020-04-23 16:18:40 +02:00 |
Miriam Baglioni
|
0e447add66
|
removed unuseful classes
|
2020-04-23 12:59:43 +02:00 |
Miriam Baglioni
|
edb00db86a
|
refactoring
|
2020-04-23 12:57:35 +02:00 |
Miriam Baglioni
|
44fab140de
|
-
|
2020-04-23 12:42:07 +02:00 |
Miriam Baglioni
|
769aa8178a
|
refactoring
|
2020-04-23 12:40:44 +02:00 |
Miriam Baglioni
|
d8dc31d4af
|
refactoring
|
2020-04-23 12:35:49 +02:00 |
Miriam Baglioni
|
8c5dac5cc3
|
removed unuseful classes
|
2020-04-23 12:30:58 +02:00 |
Miriam Baglioni
|
15656684b9
|
added properties for the preparation step and actual propagation. Added the new parametrized workflow
|
2020-04-23 12:13:34 +02:00 |
Miriam Baglioni
|
6f35f5ca42
|
added the steps of reset output dir and copy information not changed by the propagation step
|
2020-04-23 12:12:07 +02:00 |
Miriam Baglioni
|
19cd5b85c0
|
changed the classname to execute
|
2020-04-23 12:07:41 +02:00 |
Miriam Baglioni
|
fa2ff5c6f5
|
refactoring
|
2020-04-23 11:58:26 +02:00 |
Miriam Baglioni
|
540f70298b
|
added missing property
|
2020-04-23 11:51:48 +02:00 |
Miriam Baglioni
|
e431fe4f5b
|
added the implements Serializable to each class
|
2020-04-23 11:48:47 +02:00 |
Miriam Baglioni
|
24fa81d7e8
|
implementation parametrized for result type
|
2020-04-23 11:44:19 +02:00 |
Miriam Baglioni
|
ab2a24cc2b
|
changed the dependency to use reflections to find annotated classes
|
2020-04-23 11:08:47 +02:00 |
Miriam Baglioni
|
5153d88bd3
|
definition of workflow and properties for bulktagging
|
2020-04-23 11:04:53 +02:00 |
Miriam Baglioni
|
3b2e4ab670
|
test for bulktag
|
2020-04-23 10:00:10 +02:00 |
Sandro La Bruzzo
|
fdc0523e4c
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-23 09:34:13 +02:00 |
Sandro La Bruzzo
|
4ba386d996
|
improved crossref mapping
|
2020-04-23 09:33:48 +02:00 |
Claudio Atzori
|
8851050814
|
replaced hive_db_name with hiveDbName
|
2020-04-23 08:36:40 +02:00 |
Claudio Atzori
|
91f81107b1
|
applying code formatting
|
2020-04-23 07:52:32 +02:00 |
Claudio Atzori
|
1e7583c5a6
|
filtered invisible records in data provision workflow
|
2020-04-23 07:51:34 +02:00 |
Claudio Atzori
|
9ddafd46ca
|
fixed dedup record id prefix, set the correct dataInfo in the DedupRecordFactory
|
2020-04-23 07:50:18 +02:00 |
Claudio Atzori
|
ade4cb97af
|
fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB
|
2020-04-22 18:24:06 +02:00 |
Sandro La Bruzzo
|
bb6c9785b4
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-22 15:00:57 +02:00 |
Sandro La Bruzzo
|
157915988c
|
improved crossref mapping
|
2020-04-22 15:00:44 +02:00 |
Enrico Ottonello
|
5977f08e92
|
merged
|
2020-04-22 14:50:50 +02:00 |
Enrico Ottonello
|
7d759947ae
|
used vtd for parsing orcid xml record, set 4g heapspace
|
2020-04-22 14:41:19 +02:00 |
Claudio Atzori
|
e81960335c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-22 10:46:37 +02:00 |
Michele Artini
|
9e4d58f505
|
ResultType
|
2020-04-22 10:07:26 +02:00 |
Claudio Atzori
|
c891661822
|
small adjustments in the graph2hive workflow
|
2020-04-21 18:52:23 +02:00 |
Miriam Baglioni
|
259525cb93
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-21 18:33:46 +02:00 |
Miriam Baglioni
|
30e53261d0
|
minor
|
2020-04-21 18:00:53 +02:00 |
Claudio Atzori
|
0b55795d4d
|
small adjustments in the provisioning workflow
|
2020-04-21 16:15:04 +02:00 |
Claudio Atzori
|
88fbb3a353
|
added sparkSqlWarehouseDir to the default extra spark options passed to each workflow
|
2020-04-21 16:13:43 +02:00 |
Claudio Atzori
|
cd320efa96
|
added extra spark options to graph to hive workflow
|
2020-04-21 16:12:20 +02:00 |
Miriam Baglioni
|
90c768dde6
|
added shaded libs module
|
2020-04-21 16:03:51 +02:00 |
Claudio Atzori
|
91e72a6944
|
Dataset based implementation for SparkCreateDedupRecord phase, fixed datasource entity dump supplementing dedup unit tests
|
2020-04-21 12:06:08 +02:00 |
miconis
|
5c9ef08a8e
|
spark dedup test fixed
|
2020-04-21 10:19:04 +02:00 |
Sandro La Bruzzo
|
3624947a7f
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-21 08:34:24 +02:00 |
Claudio Atzori
|
d772d967aa
|
restored changes from master branch
|
2020-04-20 18:53:06 +02:00 |
Claudio Atzori
|
eb8a020859
|
fixed behaviour of DedupRecordFactory
|
2020-04-20 18:44:06 +02:00 |
Sandro La Bruzzo
|
039f9b7871
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-20 18:10:29 +02:00 |
Sandro La Bruzzo
|
e4b105cece
|
improved crossref mapping
|
2020-04-20 18:10:07 +02:00 |
Claudio Atzori
|
ede1af3d85
|
Merge branch 'master' into deduptesting
|
2020-04-20 16:52:14 +02:00 |
miconis
|
1102e32462
|
SparkDedupTest updated and organization dump fixed
|
2020-04-20 16:49:01 +02:00 |
Claudio Atzori
|
667d23c58b
|
finalising Actionset migration workflow
|
2020-04-20 16:45:21 +02:00 |
miconis
|
4da13e4570
|
Revert "Merge branch 'master' into deduptesting"
This reverts commit 772f75d167 , reversing
changes made to 5f45f2c77f .
|
2020-04-20 16:04:49 +02:00 |
Claudio Atzori
|
9147af7fed
|
actionsets migration workflow moved in dhp-workflows/dhp-actionmanager
|
2020-04-20 15:24:33 +02:00 |
miconis
|
772f75d167
|
Merge branch 'master' into deduptesting
|
2020-04-20 14:50:12 +02:00 |
Sandro La Bruzzo
|
5d46ec7d5f
|
fixed name of wrong package
|
2020-04-20 14:49:32 +02:00 |
Sandro La Bruzzo
|
82cc3b707d
|
fixed name of wrong package
|
2020-04-20 14:47:06 +02:00 |
Sandro La Bruzzo
|
b2c872cb4d
|
merged master
|
2020-04-20 14:04:40 +02:00 |
Sandro La Bruzzo
|
7029942e06
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-04-20 13:26:41 +02:00 |
Sandro La Bruzzo
|
0e45f4d450
|
continue mapping from crossref to OAF
|
2020-04-20 13:26:29 +02:00 |
Enrico Ottonello
|
a466648b4b
|
renamed output file
|
2020-04-20 12:32:03 +02:00 |
Claudio Atzori
|
d714bfb4d4
|
collectedfrom field moved in common parent class Oaf.java
|
2020-04-20 12:25:19 +02:00 |
Enrico Ottonello
|
4ae55e3891
|
added workflow parameters
|
2020-04-20 12:00:04 +02:00 |
Michele Artini
|
8ff7facfa3
|
fixed collectedFrom ID
|
2020-04-20 11:09:27 +02:00 |
Sandro La Bruzzo
|
eef60bb9f4
|
created structure of oozie wf for ORCID
|
2020-04-20 10:24:57 +02:00 |
Sandro La Bruzzo
|
4d0d9de07e
|
reorganized package and fixed test
|
2020-04-20 10:02:42 +02:00 |
Sandro La Bruzzo
|
618bc1fc72
|
first implementation of crossrefMapping
|
2020-04-20 09:53:34 +02:00 |
Michele Artini
|
25307965d2
|
add a default datainfo if missing
|
2020-04-20 09:43:27 +02:00 |
Michele Artini
|
d2058fdc47
|
tests
|
2020-04-20 09:31:14 +02:00 |
Enrico Ottonello
|
1d44a359ea
|
renamed package folder
|
2020-04-20 09:25:40 +02:00 |
Michele Artini
|
478a958f09
|
tests
|
2020-04-20 09:15:27 +02:00 |
Miriam Baglioni
|
e1848b7603
|
minor
|
2020-04-18 14:16:42 +02:00 |
Miriam Baglioni
|
0ff9b1ef05
|
added needed parameter
|
2020-04-18 14:16:29 +02:00 |
Miriam Baglioni
|
e2dfe8b656
|
removed not used action
|
2020-04-18 14:16:07 +02:00 |
Miriam Baglioni
|
437ebbad76
|
refactoring
|
2020-04-18 14:15:09 +02:00 |
Miriam Baglioni
|
9a8876ac86
|
added needed parameter
|
2020-04-18 14:14:08 +02:00 |
Miriam Baglioni
|
9854852878
|
refactoring
|
2020-04-18 14:13:16 +02:00 |
Miriam Baglioni
|
454b8a6a29
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-18 14:09:44 +02:00 |
Miriam Baglioni
|
890ec28f0f
|
input parameters for preparation step1
|
2020-04-18 14:09:37 +02:00 |
Miriam Baglioni
|
fbf5c27c27
|
Added preparation classes before actual propagation
|
2020-04-18 14:09:03 +02:00 |
Claudio Atzori
|
5f45f2c77f
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:46:40 +02:00 |
Claudio Atzori
|
ad7a131b18
|
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project
|
2020-04-18 12:42:58 +02:00 |
Claudio Atzori
|
a2938dd059
|
cleanup
|
2020-04-18 12:24:22 +02:00 |
Claudio Atzori
|
9374ff03ea
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:06:58 +02:00 |
Claudio Atzori
|
71813795f6
|
various refactorings on the dnet-dedup-openaire workflow
|
2020-04-18 12:06:23 +02:00 |
Enrico Ottonello
|
7011d4203e
|
parser of orcid summaries from tar gz file on hdfs, that creates a sequence file with author information (oid, name, surname, credit name)
|
2020-04-17 18:52:39 +02:00 |
miconis
|
6450bb0daa
|
test for softwares dedup added. definition of orp, dataset and sw dedup configurations
|
2020-04-17 17:31:59 +02:00 |
Miriam Baglioni
|
72c63a326e
|
removed unuseful class
|
2020-04-17 17:14:51 +02:00 |
Miriam Baglioni
|
00c2ca3ee5
|
-
|
2020-04-17 17:14:25 +02:00 |
Miriam Baglioni
|
5cd092114f
|
use mergeFrom method to add the new community contexts
|
2020-04-17 17:13:18 +02:00 |
Miriam Baglioni
|
264c82f21e
|
minor
|
2020-04-17 16:54:46 +02:00 |
Miriam Baglioni
|
8c079c7a49
|
unit test for orcid to result propagation from semrel
|
2020-04-17 16:53:03 +02:00 |
Miriam Baglioni
|
eacd140a98
|
added missing parameter(s)
|
2020-04-17 16:52:30 +02:00 |
Miriam Baglioni
|
390e250faf
|
use the addPid method of the Author class to add a new pid
|
2020-04-17 16:52:02 +02:00 |
Miriam Baglioni
|
b46b080ddc
|
use mergeFrom method call to add the country(ies) instead of modify the result directly.
|
2020-04-17 16:50:54 +02:00 |
Miriam Baglioni
|
c4987dd12a
|
minor
|
2020-04-17 16:49:08 +02:00 |
Claudio Atzori
|
038ac7afd7
|
relation consistency workflow separated from dedup scan and creation of CCs
|
2020-04-17 13:12:44 +02:00 |
Claudio Atzori
|
c92bfeeaee
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-17 13:07:52 +02:00 |
Miriam Baglioni
|
adc11c97a7
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-17 12:34:31 +02:00 |
Sandro La Bruzzo
|
a329ea5575
|
merged with master branch
|
2020-04-17 12:23:54 +02:00 |
Sandro La Bruzzo
|
01ea7721f3
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-17 12:12:25 +02:00 |
Sandro La Bruzzo
|
5e2fa996aa
|
fixed problem with conversion of long into string
|
2020-04-17 12:11:51 +02:00 |
miconis
|
418cf94642
|
implementation of the deletedbyinference test in propagating relations
|
2020-04-17 10:40:21 +02:00 |
Miriam Baglioni
|
5d772e5263
|
new implementation of propagation of community to result from organization that exploits the prepared info
|
2020-04-16 18:45:22 +02:00 |
Miriam Baglioni
|
fff1e5ec39
|
classes to (de)serialize the data provided in the preparation step
|
2020-04-16 18:44:43 +02:00 |
Miriam Baglioni
|
3fd9d6b02f
|
preparation phase for the propagation of community to result from organization
|
2020-04-16 18:43:55 +02:00 |
Miriam Baglioni
|
a9120164aa
|
added hive parameter and a step of reset of the working dir in the workflow
|
2020-04-16 18:42:04 +02:00 |
Miriam Baglioni
|
6afbd542ca
|
changed the save mode to avoid NegativeArraySize... error. Needed to modify also the preparationstep2
|
2020-04-16 18:40:14 +02:00 |
Miriam Baglioni
|
d60fd36046
|
changed the save method
|
2020-04-16 16:14:15 +02:00 |
Miriam Baglioni
|
951b13ac46
|
input parameters and workflow for new implementation of propagation of orcid to result from semrel and preparation phases
|
2020-04-16 16:13:10 +02:00 |
Miriam Baglioni
|
4d89f3dfed
|
removed unuseful classes
|
2020-04-16 16:11:44 +02:00 |
Miriam Baglioni
|
5e72a51f11
|
-
|
2020-04-16 16:11:20 +02:00 |
Miriam Baglioni
|
c33a593381
|
renamed
|
2020-04-16 16:09:47 +02:00 |
Miriam Baglioni
|
0e5399bf74
|
second phase of data preparation. Groups all the possible updates by id
|
2020-04-16 16:08:51 +02:00 |
Miriam Baglioni
|
548ba915ac
|
first phase of data preparation. For each result type (parallel) it produces the possible updates
|
2020-04-16 15:58:42 +02:00 |
Miriam Baglioni
|
243013cea3
|
to (de)serialize the association from the resultId and the list of authoritative authors with orcid to possibly propagate
|
2020-04-16 15:57:29 +02:00 |
Miriam Baglioni
|
ac3ad25b36
|
to (de)serialize needed information of the author to determine if the orcid can be passed (name, surname, fullname (?), orcid)
|
2020-04-16 15:56:33 +02:00 |
Miriam Baglioni
|
d6cd700a32
|
new implementation that exploits prepared information (the list of possible updates: resultId - possible list of orcid to be added)
|
2020-04-16 15:55:25 +02:00 |
Miriam Baglioni
|
f077f22f73
|
minor
|
2020-04-16 15:54:16 +02:00 |
Miriam Baglioni
|
fd5d792e35
|
refactoring
|
2020-04-16 15:53:34 +02:00 |
Claudio Atzori
|
cb0952428e
|
Merge branch 'master' into deduptesting
|
2020-04-16 14:42:25 +02:00 |
Claudio Atzori
|
cc21bbfb1a
|
Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting
|
2020-04-16 14:41:37 +02:00 |
Claudio Atzori
|
ec5dfc068d
|
added spark.sql.shuffle.partitions=3840 to dedup scan wf
|
2020-04-16 14:41:28 +02:00 |
Claudio Atzori
|
09f356b047
|
Merge pull request 'Closes #7: subdirs inside graph table dirs' (#8) from przemyslaw.jacewicz/dnet-hadoop:przemyslawjacewicz_7_distcp_configuration_fix into master
Run the code from this PR in isolation and it worked fine. Thanks!
|
2020-04-16 14:38:46 +02:00 |
Claudio Atzori
|
3437383112
|
Merge branch 'master' into deduptesting
|
2020-04-16 12:46:14 +02:00 |
miconis
|
0eccbc318b
|
Deduper class (utilities for dedup) cleaned. Useless methods removed
|
2020-04-16 12:36:37 +02:00 |
Claudio Atzori
|
76d23895e6
|
Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting
|
2020-04-16 12:18:32 +02:00 |
miconis
|
6a089ec287
|
minor changes
|
2020-04-16 12:15:38 +02:00 |
Claudio Atzori
|
376efd67de
|
removed prepare statement in spark action
|
2020-04-16 12:14:16 +02:00 |
miconis
|
9b36458b6a
|
Merge branch 'deduptesting' of code-repo.d4science.org:D-Net/dnet-hadoop into deduptesting
|
2020-04-16 12:13:58 +02:00 |
miconis
|
cd4d9a148f
|
creating temporary directories in dedup test
|
2020-04-16 12:13:26 +02:00 |
Claudio Atzori
|
b39ff36c16
|
improving the wf definitions
|
2020-04-16 12:11:37 +02:00 |
Claudio Atzori
|
011b342bc9
|
trying to avoid OOM in SparkPropagateRelation
|
2020-04-16 11:13:51 +02:00 |
Miriam Baglioni
|
08227cfcbd
|
resources needed for running the test on propagation of result to organization from institutional repositories
|
2020-04-16 11:06:10 +02:00 |
Miriam Baglioni
|
a97e915c24
|
test unit for propagation of result to organization from institutional repository
|
2020-04-16 11:05:21 +02:00 |
Miriam Baglioni
|
b078710924
|
modification to the test due to the removal of unused parameters
|
2020-04-16 11:04:39 +02:00 |
Miriam Baglioni
|
a5e5c81a2c
|
input parameters and workflow definition for propagation of result to organization from institutional repositories
|
2020-04-16 11:03:41 +02:00 |
Miriam Baglioni
|
5e1bd67680
|
removed unuseful parameter
|
2020-04-16 11:02:01 +02:00 |
Miriam Baglioni
|
eaf19ce01b
|
removed unuseful class
|
2020-04-16 10:59:33 +02:00 |
Miriam Baglioni
|
7bd49abbef
|
commit to delete
|
2020-04-16 10:59:09 +02:00 |
Miriam Baglioni
|
53f418098b
|
added the isTest checkpoint
|
2020-04-16 10:53:48 +02:00 |
Miriam Baglioni
|
c28333d43f
|
minor
|
2020-04-16 10:52:50 +02:00 |
Miriam Baglioni
|
a8100baed6
|
changed the way to save the results to avoid NegativeArray... error
|
2020-04-16 10:50:09 +02:00 |
Miriam Baglioni
|
79b978ec57
|
refactoring
|
2020-04-16 10:48:41 +02:00 |
Claudio Atzori
|
069ef5eaed
|
trying to avoid OOM in SparkPropagateRelation
|
2020-04-15 21:23:21 +02:00 |
Claudio Atzori
|
8eedfefc98
|
try to introduce intermediate serialization on hdfs to avoid OOM
|
2020-04-15 18:35:35 +02:00 |
Przemysław Jacewicz
|
da019495d7
|
[dhp-actionmanager] target dir removal added for distcp actions
|
2020-04-15 17:56:57 +02:00 |
miconis
|
5689d49689
|
minor changes
|
2020-04-15 16:34:06 +02:00 |
Claudio Atzori
|
c439d0c6bb
|
PromoteActionPayloadForGraphTableJob reads directly the content pointed by the input path, adjusted promote action tests (ISLookup mock)
|
2020-04-15 16:18:33 +02:00 |
Claudio Atzori
|
ff30f99c65
|
using newline delimited json files for the raw graph materialization. Introduced contentPath parameter
|
2020-04-15 16:16:20 +02:00 |
Sandro La Bruzzo
|
3d3ac76dda
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-15 15:24:01 +02:00 |
Sandro La Bruzzo
|
74a7fac774
|
fixed problem with timestamp
|
2020-04-15 15:23:54 +02:00 |
Miriam Baglioni
|
3577219127
|
removed unuseful classes
|
2020-04-15 12:45:49 +02:00 |
Miriam Baglioni
|
964b22d418
|
modified the writing of the new relations. Before: read old rels, add the new ones to them, write all the relations in the new location. Now: the first step of the wf copies the old relations to the new location. If new relations are found, they are saved in the new location in append mode.
|
2020-04-15 12:32:01 +02:00 |
Miriam Baglioni
|
43f0590d4b
|
change in the testing because the business logic is changed.
|
2020-04-15 12:29:50 +02:00 |
Miriam Baglioni
|
473d17767c
|
new business logic for the actual propagation. It exploits previously computed information
|
2020-04-15 12:25:44 +02:00 |
Miriam Baglioni
|
6a377a7582
|
class to compute some information needed for the actual propagation
|
2020-04-15 12:25:11 +02:00 |
Miriam Baglioni
|
5a3487280d
|
classes to serialize/deserialize the prepared data
|
2020-04-15 12:24:36 +02:00 |
Miriam Baglioni
|
62b09be43c
|
added correct description for parameter isSparkSessionManaged
|
2020-04-15 12:23:06 +02:00 |
Miriam Baglioni
|
1859ce8902
|
minor refactoring
|
2020-04-15 12:21:31 +02:00 |
Miriam Baglioni
|
27f1d3ee8f
|
minor refactoring
|
2020-04-15 12:21:05 +02:00 |
Alessia Bardi
|
550a9f82ed
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-14 17:53:01 +02:00 |
Alessia Bardi
|
a68fae9bcb
|
now supporting openaire 4.0 compliance
|
2020-04-14 17:52:48 +02:00 |
Sandro La Bruzzo
|
c36239e693
|
fixed incremental indexing
|
2020-04-14 17:47:36 +02:00 |
Miriam Baglioni
|
3f4b579e7f
|
new workflow. It is composed of four steps. The first removes the directory where to store the results. The second copies the relation to the new location, the third is the preparation phase and the fourth is the actual propagation
|
2020-04-14 16:49:24 +02:00 |
Miriam Baglioni
|
ca2b40952e
|
minor changes
|
2020-04-14 16:48:02 +02:00 |
Miriam Baglioni
|
61d39e659e
|
parameters for the project2result propagation phase
|
2020-04-14 16:47:39 +02:00 |
Miriam Baglioni
|
92f19fa0a0
|
parameters for the project2result preparation phase
|
2020-04-14 16:46:57 +02:00 |
Miriam Baglioni
|
cadab9b81d
|
new implementation for result to project propagation. Use the prepared info in propagation
|
2020-04-14 16:46:07 +02:00 |
Miriam Baglioni
|
ceb1f299bf
|
minor changes
|
2020-04-14 16:45:12 +02:00 |
Claudio Atzori
|
82e8341f50
|
reorganizing parameter names in the provision workflow
|
2020-04-14 15:54:41 +02:00 |
Miriam Baglioni
|
e0038bde5b
|
Support class to serialize/deserialize the association project, set of linked results
|
2020-04-14 15:32:12 +02:00 |
Miriam Baglioni
|
c0bebb7c35
|
code to compute the prepared information used in the actual propagation step. This step will produce two files: one with potential updates (association between projects and a list of results), the other with already linked entities (association between projects and the list of results already linked to them)
|
2020-04-14 15:31:26 +02:00 |
Miriam Baglioni
|
f47ee5b78e
|
directory where to store the prepared info before actual propagation will take place
|
2020-04-14 15:29:21 +02:00 |
Miriam Baglioni
|
36cc9516d8
|
the starting relation set for testing
|
2020-04-14 15:28:34 +02:00 |
Miriam Baglioni
|
4b01dc60e6
|
test unit for result to project propagation
|
2020-04-14 15:28:00 +02:00 |
Miriam Baglioni
|
8f12292daa
|
changed the way to save the results on filesystem
|
2020-04-11 16:47:34 +02:00 |
Miriam Baglioni
|
87f802821e
|
new workflow for country propagation: it is composed of the preparation step and the propagation step. The propagation part runs in parallel on the result types
|
2020-04-11 16:40:22 +02:00 |
Miriam Baglioni
|
a562080b0b
|
parameters to be used in the prepared Job and in the actual country propagation job
|
2020-04-11 16:39:17 +02:00 |
Miriam Baglioni
|
1251ad4455
|
removed unuseful class
|
2020-04-11 16:38:13 +02:00 |
Miriam Baglioni
|
aef9b3aa90
|
new parametric implementation of country propagation. Exploits information computed before and broadcasts it to each executor
|
2020-04-11 16:36:59 +02:00 |
Miriam Baglioni
|
a2d833d5dd
|
step of data preparation before actual country propagation will take place
|
2020-04-11 16:36:03 +02:00 |
Miriam Baglioni
|
6897c920a2
|
classes in support of new implementation of country propagation
|
2020-04-11 16:35:26 +02:00 |
Miriam Baglioni
|
85766a02d8
|
added dependency to use hive on local machine
|
2020-04-11 16:34:22 +02:00 |
Miriam Baglioni
|
79b8ea4fed
|
prepared information to be used in actual country propagation. Subset of info
|
2020-04-11 16:29:41 +02:00 |
Miriam Baglioni
|
1822476613
|
Test for country propagation
|
2020-04-11 16:28:09 +02:00 |
Miriam Baglioni
|
7783b09c5b
|
new implementation for result to project propagation. Prepare some info to be used in propagation
|
2020-04-11 16:26:23 +02:00 |
Claudio Atzori
|
6b5f9ca9cb
|
raw graph creation workflow moved under dhp-graph-mapper, claims integration is included
|
2020-04-10 17:53:07 +02:00 |
Miriam Baglioni
|
90469789b9
|
two new classes for new implementation of project to result propagation
|
2020-04-09 13:29:01 +02:00 |
Miriam Baglioni
|
627ad58a8b
|
new wf definition
|
2020-04-09 11:33:19 +02:00 |
Miriam Baglioni
|
9c63c4840d
|
new workflow and parameters for country propagation
|
2020-04-08 19:13:42 +02:00 |
Miriam Baglioni
|
a2d309545b
|
new parametrized implementation for country propagation
|
2020-04-08 19:12:59 +02:00 |
Miriam Baglioni
|
6dfdba9ef7
|
new parametrized implementation for country propagation
|
2020-04-08 18:14:37 +02:00 |
Miriam Baglioni
|
03f7cb6402
|
new parametrized implementation for country propagation
|
2020-04-08 18:08:41 +02:00 |
Miriam Baglioni
|
df2fc4a6d7
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-08 18:07:26 +02:00 |
Miriam Baglioni
|
fcfef4632f
|
input parameters for country propagation preparation job
|
2020-04-08 18:07:18 +02:00 |
miconis
|
0be2e72be5
|
further implementation of tests for the deduplication of each entity. publication dump added, empty entity files created
|
2020-04-08 18:02:30 +02:00 |
Miriam Baglioni
|
61045e84d9
|
merged conflict in pom
|
2020-04-08 14:23:30 +02:00 |
Claudio Atzori
|
47f3d9b757
|
unit test for GraphHiveImporterJob
|
2020-04-08 13:24:43 +02:00 |
Sandro La Bruzzo
|
ba9f07a6fe
|
fixed wrong test
|
2020-04-08 13:18:20 +02:00 |
Miriam Baglioni
|
540da4ab61
|
new business logic with prepared info before actual job run
|
2020-04-08 13:04:04 +02:00 |
Miriam Baglioni
|
8438702b3d
|
addition in propagation constants
|
2020-04-08 10:54:01 +02:00 |
Miriam Baglioni
|
2afe971816
|
new implementation for country propagation
|
2020-04-08 10:49:09 +02:00 |
Miriam Baglioni
|
beebbcf66b
|
new config for countrypropagation
|
2020-04-08 10:31:29 +02:00 |
Claudio Atzori
|
d74e128aa6
|
Utility classes moved in dhp-common and dhp-schemas
|
2020-04-07 11:56:22 +02:00 |
Claudio Atzori
|
c57cf679ca
|
Merge branch 'provision_dataset'
|
2020-04-07 08:56:58 +02:00 |
Claudio Atzori
|
1a1a026a18
|
we do expect to find field bestaccessright already defined. No need to add it again
|
2020-04-07 08:55:33 +02:00 |
Claudio Atzori
|
fbdd18a96b
|
using dataset based relation preparation procedure
|
2020-04-07 08:54:39 +02:00 |
Claudio Atzori
|
77f59b1b10
|
dataset based provision WIP
|
2020-04-06 19:37:27 +02:00 |
Claudio Atzori
|
6177cf36fb
|
Merge pull request 'Closes #4: New action manager implementation' (#5) from przemyslaw.jacewicz/dnet-hadoop:przemyslawjacewicz_actionmanager_impl_prototype into master
Nothing more to add here. Thanks for your contribution!
|
2020-04-06 17:35:07 +02:00 |
Claudio Atzori
|
e355961997
|
dataset based provision WIP
|
2020-04-06 17:34:25 +02:00 |
miconis
|
56fbe689f0
|
implementation of the tests for each spark action
|
2020-04-06 16:30:31 +02:00 |
Claudio Atzori
|
ca345aaad3
|
dataset based provision WIP
|
2020-04-06 15:33:31 +02:00 |
Claudio Atzori
|
c8f4b95464
|
dataset based provision WIP
|
2020-04-06 08:59:58 +02:00 |
Claudio Atzori
|
eb2f5f3198
|
dataset based provision WIP
|
2020-04-04 17:41:31 +02:00 |
Claudio Atzori
|
3d1b637cab
|
dataset based provision WIP
|
2020-04-04 14:03:43 +02:00 |
miconis
|
53fd624c34
|
implemented test for sparkcreatesimrels
|
2020-04-03 18:32:25 +02:00 |
Claudio Atzori
|
24b2c9012e
|
dataset based provision WIP
|
2020-04-02 18:44:09 +02:00 |
miconis
|
a61763d149
|
structure for sparksimrel changed to be compliant with mockito testing
|
2020-04-02 18:37:53 +02:00 |
Claudio Atzori
|
daa26acc9d
|
dataset based provision WIP, fixed spark2EventLogDir
|
2020-04-02 16:15:50 +02:00 |
Przemysław Jacewicz
|
7b2a7e2417
|
[dhp-actionmanager] missing descriptions added and minor naming and formatting fixes
|
2020-04-02 11:48:40 +02:00 |
Spyros Zoupanos
|
1ab97bbe00
|
Adding the full stats workflow to the dnet-hadoop hierarchy
|
2020-04-01 22:22:05 +03:00 |
Claudio Atzori
|
9c7092416a
|
dataset based provision WIP
|
2020-04-01 19:07:30 +02:00 |
miconis
|
bfa5bc74df
|
minor changes
|
2020-04-01 19:05:48 +02:00 |
Przemysław Jacewicz
|
80cf43b9c8
|
[dhp-actionmanager] promoting workflow added
|
2020-04-01 18:51:25 +02:00 |
Przemysław Jacewicz
|
5b459bcc47
|
[dhp-actionmanager] promoting spark job added
|
2020-04-01 18:49:08 +02:00 |
miconis
|
9802bcb9fe
|
dedup testing
|
2020-04-01 18:48:31 +02:00 |
Przemysław Jacewicz
|
e21bb89dbd
|
[dhp-actionmanager] partitioning spark job added
|
2020-04-01 18:41:29 +02:00 |
Przemysław Jacewicz
|
f9f7350bb9
|
[dhp-actionmanager] common package added with utility classes supporting hadoop and spark envs
|
2020-04-01 18:39:26 +02:00 |
Przemysław Jacewicz
|
ad70c23b2e
|
[dhp-actionmanager] pom updated
|
2020-04-01 18:36:00 +02:00 |
Przemysław Jacewicz
|
4e910a78d4
|
[dhp-workflows] spark 2 connection properties added
|
2020-04-01 18:29:26 +02:00 |
Claudio Atzori
|
1402eb1fe7
|
cleanup
|
2020-04-01 15:38:50 +02:00 |
Claudio Atzori
|
7061d07727
|
ActionSets migration serialize the output as plain text files instead of SequenceFiles
|
2020-04-01 14:58:22 +02:00 |
Claudio Atzori
|
adcdd2d05e
|
WIP: reimplementing the adjacency list construction process using spark Datasets
|
2020-04-01 14:56:57 +02:00 |
Sandro La Bruzzo
|
205e9521c6
|
implemented import crossref job
|
2020-04-01 14:12:33 +02:00 |
Sandro La Bruzzo
|
201d79021e
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-31 14:41:41 +02:00 |
Sandro La Bruzzo
|
cd7416ae4c
|
first implementation of incremental update of scholix index
|
2020-03-31 14:41:35 +02:00 |
przemek
|
9d1d18d4b9
|
Merge branch 'master' into przemyslawjacewicz_actionmanager_impl_prototype
|
2020-03-31 12:04:58 +02:00 |
Claudio Atzori
|
377e1ba840
|
[maven-release-plugin] prepare for next development iteration
|
2020-03-30 20:06:00 +02:00 |
Claudio Atzori
|
76d9315129
|
[maven-release-plugin] prepare release dhp-1.1.6
|
2020-03-30 20:05:56 +02:00 |
Claudio Atzori
|
ef429010ee
|
removed log file and job-override.properties
|
2020-03-30 20:00:58 +02:00 |
Claudio Atzori
|
0fbec69b82
|
use oozie prepare statement to cleanup working directories
|
2020-03-30 19:48:41 +02:00 |
Claudio Atzori
|
3af2b8d700
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-30 13:12:21 +02:00 |
Claudio Atzori
|
f3f9affd49
|
allow dynamic executors to build XML records
|
2020-03-30 13:12:11 +02:00 |
Claudio Atzori
|
2e2d4c4c68
|
adjusted path to template resource
|
2020-03-30 13:11:49 +02:00 |
Miriam Baglioni
|
dd011f4a95
|
to make them visible to Claudio
|
2020-03-30 10:55:47 +02:00 |
Miriam Baglioni
|
b1af90a45f
|
to make it visible to Claudio
|
2020-03-30 10:50:03 +02:00 |
Sandro La Bruzzo
|
62cc257e5c
|
fixed step1 workflow
|
2020-03-27 17:07:34 +01:00 |
Sandro La Bruzzo
|
1a7a866861
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-27 15:11:48 +01:00 |
Sandro La Bruzzo
|
7cef698f36
|
reformat code
|
2020-03-27 15:11:34 +01:00 |
Claudio Atzori
|
1767dfaa3f
|
method can be protected, it is meant to be used only in tests
|
2020-03-27 14:31:26 +01:00 |
Sandro La Bruzzo
|
a4b6a51168
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-27 13:48:56 +01:00 |
Sandro La Bruzzo
|
15d9106b3f
|
Fixed merge of dhp dedup
|
2020-03-27 13:48:44 +01:00 |
Claudio Atzori
|
e196fff212
|
adjusted path for source resource in unit test
|
2020-03-27 13:45:10 +01:00 |
Sandro La Bruzzo
|
8c9a56a0c8
|
refactored package name
|
2020-03-27 13:19:33 +01:00 |
Sandro La Bruzzo
|
2bd2d6f202
|
Merge branch 'master' of code-repo.d3science.org:D-Net/dnet-hadoop
|
2020-03-27 13:16:36 +01:00 |
Sandro La Bruzzo
|
a9935f80d4
|
refactor class name and workflow name for graph mapper, added javadoc
|
2020-03-27 13:16:24 +01:00 |
Michele Artini
|
ae03948eed
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-27 11:47:07 +01:00 |
Michele Artini
|
f6e86b44a6
|
tests
|
2020-03-27 11:46:37 +01:00 |
Michele Artini
|
408be3c632
|
test and fixed a problem with datacite namespaces
|
2020-03-27 11:44:50 +01:00 |
Claudio Atzori
|
673e744649
|
moved openaire specific implementations under dedicated package eu.dnetlib.dhp.oa
|
2020-03-27 10:42:17 +01:00 |
Claudio Atzori
|
098fabab3f
|
reorganizing content under dhp-workflows/dhp-graph-mapper
|
2020-03-26 19:44:19 +01:00 |
Claudio Atzori
|
77c4294924
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-26 18:26:52 +01:00 |
Claudio Atzori
|
43cbcda7ef
|
unit test for SparkGraphImporterJob
|
2020-03-26 18:26:40 +01:00 |
Sandro La Bruzzo
|
e04da6d66a
|
merged all oozie wf in one
|
2020-03-26 14:17:07 +01:00 |
Sandro La Bruzzo
|
e71e001b58
|
commented test that doesn't work
|
2020-03-26 14:15:21 +01:00 |
Sandro La Bruzzo
|
0cd022ad6a
|
merge with master
|
2020-03-26 14:08:29 +01:00 |
Claudio Atzori
|
abcd3f5bf5
|
added sample data for unit tests
|
2020-03-26 11:12:52 +01:00 |
Sandro La Bruzzo
|
d5f11e27be
|
renamed wf
|
2020-03-26 09:49:23 +01:00 |
Sandro La Bruzzo
|
9a37ad0127
|
renamed modules
|
2020-03-26 09:46:46 +01:00 |
Sandro La Bruzzo
|
a768226e52
|
updated generate scholix to generate json
|
2020-03-26 09:40:50 +01:00 |
Claudio Atzori
|
9dff4adbc3
|
dhp-graph-mapper workflow tests upgraded to junit5
|
2020-03-25 18:25:12 +01:00 |
Claudio Atzori
|
cd7dc3e1ae
|
dhp-dedup-openaire workflow tests upgraded to junit5
|
2020-03-25 18:04:23 +01:00 |
Claudio Atzori
|
c0e825e713
|
dhp-aggregation workflow tests upgraded to junit5
|
2020-03-25 17:59:45 +01:00 |
Michele Artini
|
ebe45003d9
|
fixed some junit packages
|
2020-03-25 16:45:03 +01:00 |
Michele Artini
|
d9bfdcd607
|
updated poms
|
2020-03-25 16:31:12 +01:00 |
Michele Artini
|
120e823cd1
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-25 16:00:10 +01:00 |
Claudio Atzori
|
71ae7dd272
|
renamed module dnet-dedup to dnet-dedup-openaire
|
2020-03-25 15:57:09 +01:00 |
Michele Artini
|
fd57722c69
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-25 15:56:49 +01:00 |
Claudio Atzori
|
f441f823dd
|
fixed path referencing a test resource file
|
2020-03-25 15:21:46 +01:00 |
Claudio Atzori
|
51d0c9bdd7
|
integrated changes from branch dedupTest
|
2020-03-25 15:15:41 +01:00 |
Claudio Atzori
|
36f8f2ea66
|
master set to 'yarn' in spark actions, removed path to rawSet from the dedup scan workflow
|
2020-03-25 14:16:06 +01:00 |
Michele Artini
|
2559299da4
|
tests
|
2020-03-25 12:25:00 +01:00 |
Claudio Atzori
|
2180cc4fe7
|
more fields included in result view definition
|
2020-03-25 11:21:46 +01:00 |
Claudio Atzori
|
efb0b7d660
|
master set to 'yarn' in spark actions
|
2020-03-25 11:15:35 +01:00 |
Michele Artini
|
0fda2c3a30
|
some tests on db records
|
2020-03-25 09:43:58 +01:00 |
miconis
|
02320de371
|
minor changes
|
2020-03-24 17:43:51 +01:00 |
miconis
|
8e8b5e8f30
|
roots wf merged in scan wf
|
2020-03-24 17:40:58 +01:00 |
Miriam Baglioni
|
19d7f8b51d
|
uncommented execution for some of the result types for testing purposes
|
2020-03-24 16:49:46 +01:00 |
Miriam Baglioni
|
ad24c8478f
|
added missing parameter
|
2020-03-24 16:19:59 +01:00 |
Miriam Baglioni
|
46094a3eec
|
bug fixing for implementation with dataset
|
2020-03-24 16:19:36 +01:00 |
Claudio Atzori
|
51ff68db66
|
Merge branch 'dedupTest' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dedupTest
|
2020-03-24 11:18:19 +01:00 |
Claudio Atzori
|
1e869e7bed
|
using method available from currently used library
|
2020-03-24 11:17:44 +01:00 |
miconis
|
f0d72b76a8
|
package structure fixed
|
2020-03-24 10:51:40 +01:00 |
Claudio Atzori
|
aaedbb1b8b
|
WIP: dedup workflow, stage 2
|
2020-03-24 09:59:28 +01:00 |
Michele Artini
|
e3760c7f39
|
fix a bug with organization countries
|
2020-03-24 08:43:56 +01:00 |
Claudio Atzori
|
8b0ba3d76a
|
postprocessing script correctly run as hive2 action
|
2020-03-23 17:40:39 +01:00 |
miconis
|
93e2291291
|
minor changes
|
2020-03-23 17:17:56 +01:00 |
miconis
|
f7890a90df
|
implementation of the mechanism that checks the existence of a mergerel file
|
2020-03-23 17:13:30 +01:00 |
Miriam Baglioni
|
ad712f2d79
|
added the needed variables in the config and read the variables in the workflow
|
2020-03-23 17:11:36 +01:00 |
Miriam Baglioni
|
f1e9fe9752
|
changed implementation using dataset and query on hive
|
2020-03-23 17:11:00 +01:00 |
Miriam Baglioni
|
f09cd1e911
|
removed unneeded variable in the configuration
|
2020-03-23 17:10:14 +01:00 |
Miriam Baglioni
|
9418e3d4fa
|
read dataset from files instead of using hive tables
|
2020-03-23 17:09:27 +01:00 |
Miriam Baglioni
|
a7bf037306
|
remove unused class
|
2020-03-23 14:36:43 +01:00 |
Miriam Baglioni
|
8ab8b6b0bf
|
minor
|
2020-03-23 14:35:23 +01:00 |
Miriam Baglioni
|
30d58fd98c
|
change the configuration of the workflow
|
2020-03-23 14:32:49 +01:00 |
Miriam Baglioni
|
a440152b46
|
refactoring
|
2020-03-23 14:30:56 +01:00 |
Miriam Baglioni
|
47561f3597
|
changed the implementation from rdd to dataset got from sql queries (on hive)
|
2020-03-23 11:58:32 +01:00 |
miconis
|
c20e179f5a
|
structure of the workflows updated
|
2020-03-23 11:43:49 +01:00 |
Claudio Atzori
|
658d40ccbe
|
WIP trying to use hive2 actions
|
2020-03-23 11:14:54 +01:00 |
Claudio Atzori
|
ecb64e4998
|
Merge branch 'migration_wfs_regular_all_steps'
|
2020-03-23 08:57:01 +01:00 |
Michele Artini
|
15160032bd
|
fixed a bug setting some organization fields
|
2020-03-23 08:39:14 +01:00 |
Claudio Atzori
|
a4c52661a0
|
WIP: fixing dedup workflows
|
2020-03-20 19:17:24 +01:00 |
Claudio Atzori
|
6cb0a9bff0
|
dedup wf directory structure aligned with project commons
|
2020-03-20 16:48:14 +01:00 |
miconis
|
e16e644faf
|
implementation of the workflow for entity update and for relations update
|
2020-03-20 13:01:56 +01:00 |
przemek
|
638b78f96a
|
Merge remote-tracking branch 'origin/master' into przemyslawjacewicz_actionmanager_impl_prototype
|
2020-03-19 15:12:56 +01:00 |
miconis
|
4e82a24af2
|
minor changes and implementation of the create connected components action
|
2020-03-19 15:01:07 +01:00 |
Claudio Atzori
|
36236dd1c1
|
action migration workflow produces eu.dnetlib.dhp.schema.action.AtomicAction(s)
|
2020-03-19 14:00:38 +01:00 |
Claudio Atzori
|
a0ab15a64c
|
need to stick with guava:11.0.2, as it is the version used by the hadoop components (oozie client for sure). The latest version (28.2-jre) breaks the oozie workflow submission
|
2020-03-19 13:58:58 +01:00 |
Sandro La Bruzzo
|
0594b92a6d
|
implemented relation with dataset
|
2020-03-19 11:11:07 +01:00 |
miconis
|
679b5869e5
|
implementation of the lookup procedure to take dedup conf from the resource profiles
|
2020-03-18 17:41:56 +01:00 |
Claudio Atzori
|
abe8fb69a2
|
added global properties, moved postprocessing script inside the oozie_app directory
|
2020-03-18 15:43:54 +01:00 |
miconis
|
f32eae5ce9
|
implementation of the spark action for the simrel creation
|
2020-03-18 14:27:49 +01:00 |
Claudio Atzori
|
c7e0730720
|
compress the output produced by migration steps 1 and 2
|
2020-03-18 09:34:57 +01:00 |
Claudio Atzori
|
2f11e37602
|
fixed expansion of path variables
|
2020-03-17 19:41:07 +01:00 |
Claudio Atzori
|
2795b0b096
|
no need to mkdir the all_entities file
|
2020-03-17 17:22:14 +01:00 |
Claudio Atzori
|
19746ad308
|
when reuseContent, reset ${workingPath}/all_entities
|
2020-03-17 17:17:06 +01:00 |
Claudio Atzori
|
2f0c85eeb3
|
updated parameters for regular_all_steps workflow, introduced flag 'reuseContent'
|
2020-03-17 17:04:58 +01:00 |
Miriam Baglioni
|
67ea3cf3ed
|
changed the way the file with info on resource or relation is read: from sequenceFile to textFile
|
2020-03-17 16:32:05 +01:00 |
Miriam Baglioni
|
b4652d018c
|
moved the creation of new dir to common class.
|
2020-03-17 16:31:24 +01:00 |
Claudio Atzori
|
b8290b5851
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:45:30 +01:00 |
Claudio Atzori
|
4706f24ec5
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:23:54 +01:00 |
Claudio Atzori
|
aeb01fa353
|
reading from newline delimited json textfiles instead of sequence files
|
2020-03-17 11:57:24 +01:00 |
Miriam Baglioni
|
92f4e0001d
|
Merge branch 'bulktag'
|
2020-03-16 13:33:27 +01:00 |
Miriam Baglioni
|
ab08a37024
|
Merge remote-tracking branch 'upstream/master'
|
2020-03-16 12:45:23 +01:00 |
Claudio Atzori
|
af835f2f98
|
when migrating actionsets from DM cluster, populate the AtomicAction.targetValue when empty (dedup similarities)
|
2020-03-15 18:07:59 +01:00 |
Claudio Atzori
|
9c84e21b87
|
added workflow to migrate latest version of each actionset content from DM to OCEAN cluster, mapping the targetValues from the old protobuf data model to the dhp.OAF datamodel
|
2020-03-13 15:56:52 +01:00 |
Claudio Atzori
|
8fe7ae1482
|
xml formatting
|
2020-03-13 15:53:56 +01:00 |
Przemysław Jacewicz
|
d0c9b0cdd6
|
WIP promote job functions updated
|
2020-03-13 12:36:42 +01:00 |
Przemysław Jacewicz
|
8d9b3c5de2
|
WIP action payload mapping into OAF type moved, (local) graph table name enum created, tests fixed
|
2020-03-13 10:01:39 +01:00 |
Przemysław Jacewicz
|
5cc560c7e5
|
Removed unnecessary dependency on old OAF model
|
2020-03-13 09:57:46 +01:00 |
Sandro La Bruzzo
|
addaaa091f
|
migrate relation from RDD to Dataset
|
2020-03-13 09:13:20 +01:00 |
Przemysław Jacewicz
|
3f24593e51
|
WIP: promote job tests and test resources implementation snapshot
|
2020-03-11 17:06:29 +01:00 |
Przemysław Jacewicz
|
2e996d610f
|
WIP: promote job functions implementation snapshot
|
2020-03-11 17:02:57 +01:00 |
Przemysław Jacewicz
|
cc63cdc9e6
|
WIP: promote job implementation snapshot
|
2020-03-11 17:02:06 +01:00 |
Przemysław Jacewicz
|
69540f6f78
|
Serialization-safe supplier added
|
2020-03-11 16:59:05 +01:00 |
Przemysław Jacewicz
|
e6e214dab5
|
Oaf merge and get strategy added
|
2020-03-11 16:58:17 +01:00 |
Claudio Atzori
|
7b6f0c8756
|
reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki
|
2020-03-10 17:19:17 +01:00 |
Claudio Atzori
|
60aedb1110
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-10 17:09:44 +01:00 |
Claudio Atzori
|
a3f184fd3f
|
added field websiteurl in related organizations
|
2020-03-10 17:08:58 +01:00 |
Claudio Atzori
|
0e95544495
|
fixed serialization for datasource subjects
|
2020-03-10 17:07:44 +01:00 |
Sandro La Bruzzo
|
7b28783fb4
|
updated unpaywall mapping
|
2020-03-08 17:00:19 +01:00 |
Michele Artini
|
b6efa9d6ab
|
Configuration of the SequenceFile Writer
|
2020-03-05 15:49:14 +01:00 |
Claudio Atzori
|
5e342a555c
|
no need to compute the inverse relClass, fixed text() in xpath expressions
|
2020-03-05 12:51:48 +01:00 |
Claudio Atzori
|
6ec04d4e02
|
specified column used to perform the join operation in the javadoc
|
2020-03-05 12:50:38 +01:00 |
Michele Artini
|
7a2a466161
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-04 14:50:59 +01:00 |
Michele Artini
|
755eade2fb
|
fix creation ids
|
2020-03-04 14:49:45 +01:00 |
Claudio Atzori
|
6379f32466
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-04 10:57:06 +01:00 |
Claudio Atzori
|
0233987603
|
introduced post processing step following the hive DB creation/population
|
2020-03-04 10:56:50 +01:00 |
Claudio Atzori
|
1e563bc15e
|
introduced distinct properties driving the resource usage for the XML record creation and the indexing phase
|
2020-03-04 10:55:11 +01:00 |
Claudio Atzori
|
9af3e904be
|
close the SparkSession at the end
|
2020-03-04 10:53:31 +01:00 |
Michele Artini
|
e7167b996a
|
logs and closeable
|
2020-03-04 10:46:36 +01:00 |
Claudio Atzori
|
25ceec29ab
|
code formatting
|
2020-03-04 10:44:24 +01:00 |
Claudio Atzori
|
63c00c5e88
|
fixed typo
|
2020-03-04 10:43:44 +01:00 |
Miriam Baglioni
|
c37f2bd1b5
|
moved some classes to package to make code clearer
|
2020-03-03 16:42:23 +01:00 |
Miriam Baglioni
|
d9d2060561
|
implementation for bulk tagging
|
2020-03-03 16:38:50 +01:00 |
Miriam Baglioni
|
e80f80ca93
|
properties and workflow for new propagation
|
2020-03-02 17:03:31 +01:00 |
Claudio Atzori
|
9cf5ce2e66
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-02 17:03:10 +01:00 |
Claudio Atzori
|
bc7cfd5975
|
indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure
|
2020-03-02 17:03:07 +01:00 |
Miriam Baglioni
|
50080c1b3c
|
changed the implementation of addAll method. Before adding all the items in a collection, we check if the accumulator set is not empty
|
2020-03-02 16:41:37 +01:00 |
Miriam Baglioni
|
02815dd2cf
|
update result for community moved in propagationconstants
|
2020-03-02 16:40:56 +01:00 |
Miriam Baglioni
|
95f8c3092f
|
update for new propagation implementation and moving of updateResult for community business logic since the same can be used for result to community from organization and result to community from semrel
|
2020-03-02 16:40:17 +01:00 |
Miriam Baglioni
|
3d63f35dcb
|
implementation of new propagation. Result to community for results linked to given organization. We exploit the hasAuthorInstitution semantic link to discover which results are related to institutions
|
2020-03-02 16:39:03 +01:00 |
Michele Artini
|
4b29a121b0
|
migration using spark in step2
|
2020-03-02 16:12:14 +01:00 |
Michele Artini
|
5445a57102
|
migration using spark in step2
|
2020-03-02 16:11:59 +01:00 |
Miriam Baglioni
|
3a4ccb26c0
|
New properties for the orcid to result propagation through semantic relation
|
2020-02-28 18:26:04 +01:00 |
Miriam Baglioni
|
b50166b9ad
|
None
|
2020-02-28 18:25:28 +01:00 |
Miriam Baglioni
|
550cb21c23
|
None
|
2020-02-28 18:24:39 +01:00 |
Miriam Baglioni
|
b098ee0bae
|
Changed the structure of typed row to also contain the list of authors with orcid
|
2020-02-28 18:23:51 +01:00 |
Miriam Baglioni
|
841f5523fe
|
Added information and methods for the new propagation of orcid to result through semrel
|
2020-02-28 18:23:16 +01:00 |
Miriam Baglioni
|
2b7b05fb29
|
New propagation of ORCID to result exploiting the semantic relation connecting them. If R has an author with orcid o, and R is bound by a strong semantic relationship with R1 that has the same author without orcid, then o is also associated to the author in R1
|
2020-02-28 18:22:41 +01:00 |
Miriam Baglioni
|
833c83c694
|
Wrong file name
|
2020-02-28 18:21:01 +01:00 |
Miriam Baglioni
|
a86426776a
|
Changed from Oaf to Result the type of the updateResult method parameter, not to be forced to cast each time
|
2020-02-28 18:20:19 +01:00 |
Sandro La Bruzzo
|
b32655e48e
|
changed code to save intermediate result
|
2020-02-27 10:18:46 +01:00 |
Claudio Atzori
|
60bc2b1a20
|
drop the hive DB before populating it from scratch
|
2020-02-27 10:10:55 +01:00 |
Sandro La Bruzzo
|
f09e065865
|
incremented number of repartition
|
2020-02-26 19:26:19 +01:00 |
Sandro La Bruzzo
|
071f5c3e52
|
fixed NPE
|
2020-02-26 15:42:20 +01:00 |
Sandro La Bruzzo
|
a1a6fc8315
|
fixed NPE
|
2020-02-26 15:42:13 +01:00 |
Sandro La Bruzzo
|
1edf02a3ce
|
added log
|
2020-02-26 15:25:03 +01:00 |
Sandro La Bruzzo
|
c3ecabd8e8
|
fixed NPE
|
2020-02-26 14:40:02 +01:00 |
Sandro La Bruzzo
|
5d0f46651b
|
fixed NPE
|
2020-02-26 14:31:34 +01:00 |
Sandro La Bruzzo
|
bc342bf73a
|
fixed wrong generation type in summary
|
2020-02-26 12:49:47 +01:00 |
Sandro La Bruzzo
|
3112e21858
|
fixed typo
|
2020-02-26 12:22:43 +01:00 |
Sandro La Bruzzo
|
119ae6eef5
|
fixed wrong loop in the workflow
|
2020-02-26 12:18:50 +01:00 |
Sandro La Bruzzo
|
7936583a3d
|
added generation of Scholix collection
|
2020-02-26 12:09:06 +01:00 |
Przemysław Jacewicz
|
02db368dc5
|
Merge branch 'master' into przemyslawjacewicz_actionmanager_impl_prototype
|
2020-02-26 11:50:20 +01:00 |
Sandro La Bruzzo
|
2ef3705b2c
|
Added Provision workflow
|
2020-02-26 10:51:35 +01:00 |
Michele Artini
|
689908b2e9
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-25 16:00:51 +01:00 |
Michele Artini
|
93665773ea
|
Fixed a problem with JavaRDD Union
|
2020-02-25 15:59:21 +01:00 |
Sandro La Bruzzo
|
b021b8a2e1
|
Added index wf
|
2020-02-24 10:15:55 +01:00 |
Claudio Atzori
|
6a73fd5da5
|
in order to reuse the same XmlRecordFactory across different tasks, the state of contexts must be one per record built
|
2020-02-21 09:17:19 +01:00 |
Michele Artini
|
d49cd2fdc6
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-02-20 11:21:54 +01:00 |
Miriam Baglioni
|
3f941a2af4
|
Merge branch 'master' into propagationCommunityToResult
|
2020-02-19 18:05:22 +01:00 |
Miriam Baglioni
|
b2bdc9b99b
|
merging project to result propagation logic to master
|
2020-02-19 18:04:59 +01:00 |
Miriam Baglioni
|
a153a07997
|
none
|
2020-02-19 18:03:13 +01:00 |
Miriam Baglioni
|
d0279af630
|
start to implement the business logic
|
2020-02-19 17:59:24 +01:00 |
Miriam Baglioni
|
5f63ab1416
|
to query the information system to get the list of communities up to now. It will have a more general usage when introducing bulk tagging
|
2020-02-19 17:59:02 +01:00 |
Miriam Baglioni
|
5ceb174d24
|
Merge branch 'master' into propagationCommunityToResult
|
2020-02-19 17:13:38 +01:00 |
Miriam Baglioni
|
e8af7a6b64
|
Merge remote-tracking branch 'upstream/master'
|
2020-02-19 17:03:10 +01:00 |
Miriam Baglioni
|
79ff79b0cd
|
propagation of result to community through semantic relation: C -> R and R -> isSupplementedBy R1 => C -> R1
|
2020-02-19 17:02:39 +01:00 |
Claudio Atzori
|
5e5e32cb48
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-02-19 16:56:52 +01:00 |
Claudio Atzori
|
33185fd0b7
|
ISLookupClientFactory moved in dhp-common
|
2020-02-19 16:56:38 +01:00 |
Michele Artini
|
5d3739b5cf
|
migration of claims
|
2020-02-19 15:11:17 +01:00 |
Miriam Baglioni
|
ab84163bb3
|
added set accumulator in TypedRow and used it to accumulate country information in Country Propagation
|
2020-02-19 15:02:50 +01:00 |
Miriam Baglioni
|
bb0fdf5e0a
|
fix wrong source target in new relation
|
2020-02-19 15:00:46 +01:00 |
Miriam Baglioni
|
9e1678ccf8
|
fix error in workflow name
|
2020-02-19 14:59:24 +01:00 |
Miriam Baglioni
|
8aa3b4d7c0
|
adding to propagation constants the ones needed for propagation of project to result and addition of new accumulator Set in typed row to collect values of a type
|
2020-02-19 14:55:54 +01:00 |
Miriam Baglioni
|
7167673a58
|
implementation and configuration for propagation of project to result through semantic relation: P -> R1 and R1 -> supplemented by -> R2 => P -> R2
|
2020-02-19 14:54:18 +01:00 |
Michele Artini
|
173f1df1e5
|
saved a query for openaire production database
|
2020-02-19 10:15:08 +01:00 |
Sandro La Bruzzo
|
9a2d74ac82
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-19 10:13:45 +01:00 |
Sandro La Bruzzo
|
e5d7cdf422
|
fixed sql query
|
2020-02-19 10:13:36 +01:00 |
Sandro La Bruzzo
|
2b8675462f
|
refactoring code
|
2020-02-19 10:07:08 +01:00 |
Miriam Baglioni
|
b81e6af429
|
added config for new propagation
|
2020-02-18 17:30:44 +01:00 |
Miriam Baglioni
|
b736a9581c
|
changed relclass and reltype in relation specification for country propagation and implementation of propagation of result affiliation through institutional repositories
|
2020-02-18 17:27:28 +01:00 |
Miriam Baglioni
|
ed262293a6
|
aligned to new snapshot version 1.1.6
|
2020-02-18 17:25:32 +01:00 |
Miriam Baglioni
|
2688a89c21
|
changed relclass and reltype in relation specification
|
2020-02-18 17:24:40 +01:00 |
Miriam Baglioni
|
c0022fec9f
|
moved on upper package to serve other propagations
|
2020-02-18 17:24:11 +01:00 |
Miriam Baglioni
|
e0a777028a
|
fix problem in parameters
|
2020-02-18 17:23:34 +01:00 |
Claudio Atzori
|
ed76521d9b
|
removed stale test resources, will be re-added later on
|
2020-02-18 11:51:08 +01:00 |
Claudio Atzori
|
0f364605ff
|
removed stale tests, need to reimplement them anyway
|
2020-02-18 11:48:19 +01:00 |
Miriam Baglioni
|
5868ff8a86
|
synch fork with master
|
2020-02-17 18:22:27 +01:00 |
Przemysław Jacewicz
|
958f0693d6
|
WIP: logic for promoting action sets added
|
2020-02-17 18:19:19 +01:00 |
Miriam Baglioni
|
18e4092d5c
|
change name of properties dir
|
2020-02-17 18:07:06 +01:00 |
Miriam Baglioni
|
bd0e504b42
|
changes to the wf configuration
|
2020-02-17 18:04:15 +01:00 |
Miriam Baglioni
|
3a9d723655
|
adding default parameters in code
|
2020-02-17 16:30:52 +01:00 |
Przemysław Jacewicz
|
bea1a94346
|
Merge branch 'master' into przemyslawjacewicz_actionmanager_impl_prototype
# Conflicts:
# dhp-workflows/pom.xml
|
2020-02-17 15:07:23 +01:00 |
Claudio Atzori
|
6a288625e5
|
fixed workflow outgoing node
|
2020-02-17 15:04:33 +01:00 |
Miriam Baglioni
|
a5517eee35
|
adding the mkdirs for creation of propagation folder under provision on tmp
|
2020-02-17 14:20:42 +01:00 |
Miriam Baglioni
|
9abde5cfac
|
removed outputPath from job parameters
|
2020-02-17 14:19:53 +01:00 |
Claudio Atzori
|
1b18fd4d54
|
sync with master branch
|
2020-02-17 13:49:46 +01:00 |
Sandro La Bruzzo
|
4f04759738
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-17 12:31:58 +01:00 |
Sandro La Bruzzo
|
76ee85141a
|
added oozie job for DNET migration and implemented Spark job for extracting entities
|
2020-02-17 12:31:44 +01:00 |
Miriam Baglioni
|
be2421d5d8
|
removed wrongly pushed file
|
2020-02-17 12:07:26 +01:00 |
Claudio Atzori
|
c460e2d281
|
Update 'dhp-workflows/docs/oozie-installer.markdown'
|
2020-02-17 11:54:48 +01:00 |
Miriam Baglioni
|
c7bc73aedf
|
country propagation for results collected from institutional repositories
|
2020-02-17 11:44:48 +01:00 |
Michele Artini
|
176c5606bd
|
aligned with origin/master, aligned model and mapping
|
2020-02-17 10:40:53 +01:00 |
Claudio Atzori
|
56d1810a66
|
working procedure for records indexing using Spark, via lib com.lucidworks.spark:spark-solr
|
2020-02-14 12:28:52 +01:00 |
Claudio Atzori
|
1ee1baa8c0
|
Merge branch 'master' into provision_indexing
|
2020-02-13 18:17:07 +01:00 |
Claudio Atzori
|
a3d0b57b25
|
[maven-release-plugin] prepare for next development iteration
|
2020-02-13 18:11:33 +01:00 |
Claudio Atzori
|
6ed9a15bc8
|
[maven-release-plugin] prepare release dhp-1.1.5
|
2020-02-13 18:11:31 +01:00 |
Claudio Atzori
|
49e648f7c3
|
bumped version
|
2020-02-13 18:09:31 +01:00 |
Claudio Atzori
|
f9fae97e09
|
test json files aligned with the latest model changes
|
2020-02-13 18:05:59 +01:00 |
Claudio Atzori
|
1fee6e2b7e
|
implemented XML records construction and serialization, indexing WIP
|
2020-02-13 16:53:27 +01:00 |