Miriam Baglioni
|
354f0162be
|
changes in the blacklist and workflow definition
|
2020-04-30 10:26:50 +02:00 |
Claudio Atzori
|
439c6255a2
|
cleanup
|
2020-04-29 19:09:07 +02:00 |
Claudio Atzori
|
77ac995770
|
cleaned up poms, added descriptions
|
2020-04-29 18:44:17 +02:00 |
Miriam Baglioni
|
3cffee74b9
|
merge with upstream
|
2020-04-29 18:25:29 +02:00 |
Miriam Baglioni
|
9ab46535e7
|
pom with the new blacklist module added
|
2020-04-29 18:17:15 +02:00 |
Miriam Baglioni
|
6a47e6191d
|
read from blacklist and write the result as relations on hdfs
|
2020-04-29 18:16:01 +02:00 |
Miriam Baglioni
|
869f576273
|
added hash map for relationship entityType id prefix, and relation inverse
|
2020-04-29 18:14:52 +02:00 |
Miriam Baglioni
|
b85ad7012a
|
reads the blacklist from the blacklist db and writes it as a set of relations on hdfs
|
2020-04-29 17:29:49 +02:00 |
Claudio Atzori
|
8fd81e863d
|
added default value for the external_stats_db_name
|
2020-04-29 15:36:24 +02:00 |
Claudio Atzori
|
c6f3ff4462
|
stats workflow content relocated into common package; added <global> property definitions in stats workflow.xml
|
2020-04-29 14:29:27 +02:00 |
miconis
|
e0d14fe4f8
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-29 13:02:53 +02:00 |
miconis
|
0352d3b0ba
|
entity dumps in dedup compressed
|
2020-04-29 13:02:34 +02:00 |
Michele Artini
|
c43b4c8962
|
formatting
|
2020-04-29 12:56:58 +02:00 |
Michele Artini
|
a5d7007005
|
Fix relations in migration
Fix pom.xml in dhp-stats-update
|
2020-04-29 12:05:41 +02:00 |
Miriam Baglioni
|
f7695e833c
|
resolved conflicts
|
2020-04-29 11:41:31 +02:00 |
Claudio Atzori
|
3616d0f88d
|
Merge pull request 'Adding the stats workflow to the dnet-hadoop hierarchy' (#6) from spyros/dnet-hadoop:master into master
Integrating stats update workflow.
|
2020-04-29 10:35:02 +02:00 |
Claudio Atzori
|
964972d29a
|
added data provision workflow definition WIP
|
2020-04-29 09:25:50 +02:00 |
miconis
|
62e467eb0c
|
assertion numbers updated to fit the new implementation of the pace-core
|
2020-04-28 11:46:23 +02:00 |
Claudio Atzori
|
6f5b899038
|
reformatted code according to the updated style descriptor
|
2020-04-28 11:23:29 +02:00 |
Claudio Atzori
|
ac25f2d8d1
|
integrated changes from master
|
2020-04-28 08:55:28 +02:00 |
Miriam Baglioni
|
2980e50edf
|
merge upstream
|
2020-04-27 15:06:48 +02:00 |
Claudio Atzori
|
a0bdbacdae
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:52:31 +02:00 |
Claudio Atzori
|
7a3f8085f7
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:45:40 +02:00 |
Michele Artini
|
1260d03eba
|
skip empty projects
|
2020-04-27 13:51:13 +02:00 |
Miriam Baglioni
|
df34a4ebcc
|
changed the configuration to add ignorecase option to each verb related to covid-19 community
|
2020-04-27 12:32:56 +02:00 |
Miriam Baglioni
|
7a59324ccf
|
changed the test to check for the new ignorecase option
|
2020-04-27 12:31:46 +02:00 |
Miriam Baglioni
|
986c97348d
|
added the ignorecase option to each selection verb
|
2020-04-27 12:31:05 +02:00 |
Miriam Baglioni
|
a303fc9f73
|
resources for testing propagation of result to community from organization and from semrel
|
2020-04-27 11:14:16 +02:00 |
Miriam Baglioni
|
c093d764a3
|
-
|
2020-04-27 11:12:38 +02:00 |
Miriam Baglioni
|
c925e2be16
|
test for propagation of result to community from organization and result to community from semrel
|
2020-04-27 10:59:53 +02:00 |
Miriam Baglioni
|
ec7f166690
|
changed the bl because of changes to the examples for the re-implementation of the propagation step
|
2020-04-27 10:58:41 +02:00 |
Miriam Baglioni
|
6135096ef1
|
refactoring
|
2020-04-27 10:57:50 +02:00 |
Miriam Baglioni
|
d30e710165
|
fixed duplicates action name in the workflow
|
2020-04-27 10:52:30 +02:00 |
Miriam Baglioni
|
f9ee343fc0
|
new parametrized workflow with preparation steps and new parameter input files
|
2020-04-27 10:48:31 +02:00 |
Miriam Baglioni
|
e2093644dc
|
changed in the workflow the directory where to store the preparedInfo and the graph generated at this step
|
2020-04-27 10:46:44 +02:00 |
Miriam Baglioni
|
8a58bf2744
|
removed the writeUpdate option
|
2020-04-27 10:45:06 +02:00 |
Miriam Baglioni
|
5dccbe13db
|
merge with upstream
|
2020-04-27 10:43:59 +02:00 |
Miriam Baglioni
|
7b6505ec69
|
new resources for testing propagation of project to result after the re-implementation
|
2020-04-27 10:42:16 +02:00 |
Miriam Baglioni
|
1b0e0bd1b5
|
refactoring
|
2020-04-27 10:40:26 +02:00 |
Miriam Baglioni
|
e5a177f0a7
|
refactoring
|
2020-04-27 10:36:21 +02:00 |
Miriam Baglioni
|
e000754c92
|
refactoring
|
2020-04-27 10:34:03 +02:00 |
Miriam Baglioni
|
95a54d5460
|
removed the writeUpdate option. The update is available in the preparedInfo path
|
2020-04-27 10:30:32 +02:00 |
Miriam Baglioni
|
8802e4126b
|
re-implemented inverting the couple: from (projectId, relatedResultList) to (resultId, relatedProjectList)
|
2020-04-27 10:26:55 +02:00 |
Claudio Atzori
|
268462623a
|
refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity
|
2020-04-24 14:42:01 +02:00 |
Claudio Atzori
|
a3e480d1c9
|
implemented DispatchEntitiesApplication using spark2 datasets
|
2020-04-24 14:36:53 +02:00 |
Claudio Atzori
|
48157e0fc4
|
GraphHiveImporterJob moved to a dedicated package
|
2020-04-24 14:32:28 +02:00 |
Miriam Baglioni
|
adcbf0e29a
|
refactoring
|
2020-04-24 10:47:43 +02:00 |
Claudio Atzori
|
278fc9d276
|
code formatting
|
2020-04-23 18:51:38 +02:00 |
miconis
|
5414236644
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 18:17:23 +02:00 |
miconis
|
8d258c85ff
|
spark dedup test fixed, sample for dataset and orp added, test implemented
|
2020-04-23 18:16:20 +02:00 |
Michele Artini
|
072eae3803
|
fixed a problem with missing contexts
|
2020-04-23 16:42:49 +02:00 |
Michele Artini
|
b164d96874
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 16:19:16 +02:00 |
Michele Artini
|
d920ce501e
|
fixed a problem with missing instances
|
2020-04-23 16:18:40 +02:00 |
Miriam Baglioni
|
0e447add66
|
removed unuseful classes
|
2020-04-23 12:59:43 +02:00 |
Miriam Baglioni
|
edb00db86a
|
refactoring
|
2020-04-23 12:57:35 +02:00 |
Miriam Baglioni
|
44fab140de
|
-
|
2020-04-23 12:42:07 +02:00 |
Miriam Baglioni
|
769aa8178a
|
refactoring
|
2020-04-23 12:40:44 +02:00 |
Miriam Baglioni
|
d8dc31d4af
|
refactoring
|
2020-04-23 12:35:49 +02:00 |
Miriam Baglioni
|
8c5dac5cc3
|
removed unuseful classes
|
2020-04-23 12:30:58 +02:00 |
Miriam Baglioni
|
15656684b9
|
added properties for the preparation step and actual propagation. Added the new parametrized workflow
|
2020-04-23 12:13:34 +02:00 |
Miriam Baglioni
|
6f35f5ca42
|
added the steps of reset output dir and copy information not changed by the propagation step
|
2020-04-23 12:12:07 +02:00 |
Miriam Baglioni
|
19cd5b85c0
|
changed the classname to execute
|
2020-04-23 12:07:41 +02:00 |
Miriam Baglioni
|
fa2ff5c6f5
|
refactoring
|
2020-04-23 11:58:26 +02:00 |
Miriam Baglioni
|
540f70298b
|
added missing property
|
2020-04-23 11:51:48 +02:00 |
Miriam Baglioni
|
e431fe4f5b
|
added the implements Serializable to each class
|
2020-04-23 11:48:47 +02:00 |
Miriam Baglioni
|
24fa81d7e8
|
implementation parametrized for result type
|
2020-04-23 11:44:19 +02:00 |
Miriam Baglioni
|
ab2a24cc2b
|
changed the dependency to use reflections to find annotated classes
|
2020-04-23 11:08:47 +02:00 |
Miriam Baglioni
|
5153d88bd3
|
definition of workflow and properties for bulktagging
|
2020-04-23 11:04:53 +02:00 |
Miriam Baglioni
|
3b2e4ab670
|
test for bulktag
|
2020-04-23 10:00:10 +02:00 |
Claudio Atzori
|
8851050814
|
replaced hive_db_name with hiveDbName
|
2020-04-23 08:36:40 +02:00 |
Claudio Atzori
|
91f81107b1
|
applying code formatting
|
2020-04-23 07:52:32 +02:00 |
Claudio Atzori
|
1e7583c5a6
|
filtered invisible records in data provision workflow
|
2020-04-23 07:51:34 +02:00 |
Claudio Atzori
|
9ddafd46ca
|
fixed dedup record id prefix, set the correct dataInfo in the DedupRecordFactory
|
2020-04-23 07:50:18 +02:00 |
Claudio Atzori
|
ade4cb97af
|
fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB
|
2020-04-22 18:24:06 +02:00 |
Claudio Atzori
|
e81960335c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-22 10:46:37 +02:00 |
Michele Artini
|
9e4d58f505
|
ResultType
|
2020-04-22 10:07:26 +02:00 |
Claudio Atzori
|
c891661822
|
small adjustments in the graph2hive workflow
|
2020-04-21 18:52:23 +02:00 |
Miriam Baglioni
|
259525cb93
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-21 18:33:46 +02:00 |
Miriam Baglioni
|
30e53261d0
|
minor
|
2020-04-21 18:00:53 +02:00 |
Claudio Atzori
|
0b55795d4d
|
small adjustments in the provisioning workflow
|
2020-04-21 16:15:04 +02:00 |
Claudio Atzori
|
88fbb3a353
|
added sparkSqlWarehouseDir to the default extra spark options passed to each workflow
|
2020-04-21 16:13:43 +02:00 |
Claudio Atzori
|
cd320efa96
|
added extra spark options to graph to hive workflow
|
2020-04-21 16:12:20 +02:00 |
Miriam Baglioni
|
90c768dde6
|
added shaded libs module
|
2020-04-21 16:03:51 +02:00 |
Claudio Atzori
|
91e72a6944
|
Dataset based implementation for SparkCreateDedupRecord phase, fixed datasource entity dump supplementing dedup unit tests
|
2020-04-21 12:06:08 +02:00 |
miconis
|
5c9ef08a8e
|
spark dedup test fixed
|
2020-04-21 10:19:04 +02:00 |
Claudio Atzori
|
d772d967aa
|
restored changes from master branch
|
2020-04-20 18:53:06 +02:00 |
Claudio Atzori
|
eb8a020859
|
fixed behaviour of DedupRecordFactory
|
2020-04-20 18:44:06 +02:00 |
Claudio Atzori
|
ede1af3d85
|
Merge branch 'master' into deduptesting
|
2020-04-20 16:52:14 +02:00 |
miconis
|
1102e32462
|
SparkDedupTest updated and organization dump fixed
|
2020-04-20 16:49:01 +02:00 |
Claudio Atzori
|
667d23c58b
|
finalising Actionset migration workflow
|
2020-04-20 16:45:21 +02:00 |
miconis
|
4da13e4570
|
Revert "Merge branch 'master' into deduptesting"
This reverts commit 772f75d167, reversing
changes made to 5f45f2c77f.
|
2020-04-20 16:04:49 +02:00 |
Claudio Atzori
|
9147af7fed
|
actionsets migration workflow moved in dhp-workflows/dhp-actionmanager
|
2020-04-20 15:24:33 +02:00 |
miconis
|
772f75d167
|
Merge branch 'master' into deduptesting
|
2020-04-20 14:50:12 +02:00 |
Claudio Atzori
|
d714bfb4d4
|
collectedfrom field moved in common parent class Oaf.java
|
2020-04-20 12:25:19 +02:00 |
Michele Artini
|
8ff7facfa3
|
fixed collectedFrom ID
|
2020-04-20 11:09:27 +02:00 |
Michele Artini
|
25307965d2
|
add a default datainfo if missing
|
2020-04-20 09:43:27 +02:00 |
Michele Artini
|
d2058fdc47
|
tests
|
2020-04-20 09:31:14 +02:00 |
Michele Artini
|
478a958f09
|
tests
|
2020-04-20 09:15:27 +02:00 |
Miriam Baglioni
|
e1848b7603
|
minor
|
2020-04-18 14:16:42 +02:00 |
Miriam Baglioni
|
0ff9b1ef05
|
added needed parameter
|
2020-04-18 14:16:29 +02:00 |
Miriam Baglioni
|
e2dfe8b656
|
removed not used action
|
2020-04-18 14:16:07 +02:00 |
Miriam Baglioni
|
437ebbad76
|
refactoring
|
2020-04-18 14:15:09 +02:00 |
Miriam Baglioni
|
9a8876ac86
|
added needed parameter
|
2020-04-18 14:14:08 +02:00 |
Miriam Baglioni
|
9854852878
|
refactoring
|
2020-04-18 14:13:16 +02:00 |
Miriam Baglioni
|
454b8a6a29
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-18 14:09:44 +02:00 |
Miriam Baglioni
|
890ec28f0f
|
input parameters for preparation step1
|
2020-04-18 14:09:37 +02:00 |
Miriam Baglioni
|
fbf5c27c27
|
Added preparation classes before actual propagation
|
2020-04-18 14:09:03 +02:00 |
Claudio Atzori
|
5f45f2c77f
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:46:40 +02:00 |
Claudio Atzori
|
ad7a131b18
|
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project
|
2020-04-18 12:42:58 +02:00 |
Claudio Atzori
|
a2938dd059
|
cleanup
|
2020-04-18 12:24:22 +02:00 |
Claudio Atzori
|
9374ff03ea
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:06:58 +02:00 |
Claudio Atzori
|
71813795f6
|
various refactorings on the dnet-dedup-openaire workflow
|
2020-04-18 12:06:23 +02:00 |
miconis
|
6450bb0daa
|
test for softwares dedup added. definition of orp, dataset and sw dedup configurations
|
2020-04-17 17:31:59 +02:00 |
Miriam Baglioni
|
72c63a326e
|
removed unuseful class
|
2020-04-17 17:14:51 +02:00 |
Miriam Baglioni
|
00c2ca3ee5
|
-
|
2020-04-17 17:14:25 +02:00 |
Miriam Baglioni
|
5cd092114f
|
use mergeFrom method to add the new community contexts
|
2020-04-17 17:13:18 +02:00 |
Miriam Baglioni
|
264c82f21e
|
minor
|
2020-04-17 16:54:46 +02:00 |
Miriam Baglioni
|
8c079c7a49
|
unit test for orcid to result propagation from semrel
|
2020-04-17 16:53:03 +02:00 |
Miriam Baglioni
|
eacd140a98
|
added missing parameter(s)
|
2020-04-17 16:52:30 +02:00 |
Miriam Baglioni
|
390e250faf
|
use the addPid method of the Author class to add a new pid
|
2020-04-17 16:52:02 +02:00 |
Miriam Baglioni
|
b46b080ddc
|
use mergeFrom method call to add the country(ies) instead of modifying the result directly.
|
2020-04-17 16:50:54 +02:00 |
Miriam Baglioni
|
c4987dd12a
|
minor
|
2020-04-17 16:49:08 +02:00 |
Claudio Atzori
|
038ac7afd7
|
relation consistency workflow separated from dedup scan and creation of CCs
|
2020-04-17 13:12:44 +02:00 |
Claudio Atzori
|
c92bfeeaee
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-17 13:07:52 +02:00 |
Miriam Baglioni
|
adc11c97a7
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-17 12:34:31 +02:00 |
Sandro La Bruzzo
|
01ea7721f3
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-17 12:12:25 +02:00 |
Sandro La Bruzzo
|
5e2fa996aa
|
fixed problem with conversion of long into string
|
2020-04-17 12:11:51 +02:00 |
miconis
|
418cf94642
|
implementation of the deletedbyinference test in propagating relations
|
2020-04-17 10:40:21 +02:00 |
Miriam Baglioni
|
5d772e5263
|
new implementation of propagation of community to result from organization that exploits the prepared info
|
2020-04-16 18:45:22 +02:00 |
Miriam Baglioni
|
fff1e5ec39
|
classes to (de)serialize the data provided in the preparation step
|
2020-04-16 18:44:43 +02:00 |
Miriam Baglioni
|
3fd9d6b02f
|
preparation phase for the propagation of community to result from organization
|
2020-04-16 18:43:55 +02:00 |
Miriam Baglioni
|
a9120164aa
|
added hive parameter and a step of reset of the working dir in the workflow
|
2020-04-16 18:42:04 +02:00 |
Miriam Baglioni
|
6afbd542ca
|
changed the save mode to avoid NegativeArraySize... error. Needed to modify also the preparationstep2
|
2020-04-16 18:40:14 +02:00 |
Miriam Baglioni
|
d60fd36046
|
changed the save method
|
2020-04-16 16:14:15 +02:00 |
Miriam Baglioni
|
951b13ac46
|
input parameters and workflow for new implementation of propagation of orcid to result from semrel and preparation phases
|
2020-04-16 16:13:10 +02:00 |
Miriam Baglioni
|
4d89f3dfed
|
removed unuseful classes
|
2020-04-16 16:11:44 +02:00 |
Miriam Baglioni
|
5e72a51f11
|
-
|
2020-04-16 16:11:20 +02:00 |
Miriam Baglioni
|
c33a593381
|
renamed
|
2020-04-16 16:09:47 +02:00 |
Miriam Baglioni
|
0e5399bf74
|
second phase of data preparation. Groups all the possible updates by id
|
2020-04-16 16:08:51 +02:00 |
Miriam Baglioni
|
548ba915ac
|
first phase of data preparation. For each result type (parallel) it produces the possible updates
|
2020-04-16 15:58:42 +02:00 |
Miriam Baglioni
|
243013cea3
|
to (de)serialize the association from the resultId and the list of authoritative authors with orcid to possibly propagate
|
2020-04-16 15:57:29 +02:00 |
Miriam Baglioni
|
ac3ad25b36
|
to (de)serialize needed information of the author to determine if the orcid can be passed (name, surname, fullname (?), orcid)
|
2020-04-16 15:56:33 +02:00 |
Miriam Baglioni
|
d6cd700a32
|
new implementation that exploits prepared information (the list of possible updates: resultId - possible list of orcid to be added)
|
2020-04-16 15:55:25 +02:00 |
Miriam Baglioni
|
f077f22f73
|
minor
|
2020-04-16 15:54:16 +02:00 |
Miriam Baglioni
|
fd5d792e35
|
refactoring
|
2020-04-16 15:53:34 +02:00 |
Claudio Atzori
|
cb0952428e
|
Merge branch 'master' into deduptesting
|
2020-04-16 14:42:25 +02:00 |
Claudio Atzori
|
cc21bbfb1a
|
Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting
|
2020-04-16 14:41:37 +02:00 |
Claudio Atzori
|
ec5dfc068d
|
added spark.sql.shuffle.partitions=3840 to dedup scan wf
|
2020-04-16 14:41:28 +02:00 |
Claudio Atzori
|
09f356b047
|
Merge pull request 'Closes #7: subdirs inside graph table dirs' (#8) from przemyslaw.jacewicz/dnet-hadoop:przemyslawjacewicz_7_distcp_configuration_fix into master
Run the code from this PR in isolation and it worked fine. Thanks!
|
2020-04-16 14:38:46 +02:00 |
Claudio Atzori
|
3437383112
|
Merge branch 'master' into deduptesting
|
2020-04-16 12:46:14 +02:00 |