Claudio Atzori
|
f044d09315
|
revised mapping: more accurate mapping for name/surname from datacite format; improved mapping of null values
|
2020-05-14 15:07:24 +02:00 |
Miriam Baglioni
|
e7eb4f377e
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-14 10:34:17 +02:00 |
Miriam Baglioni
|
8828458acf
|
minor changes
|
2020-05-14 10:34:12 +02:00 |
Claudio Atzori
|
ab37953332
|
added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action
|
2020-05-14 10:25:41 +02:00 |
Claudio Atzori
|
12bfa6702e
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-13 17:01:17 +02:00 |
Claudio Atzori
|
5ecacad70a
|
fixed default resource typing in Oaf/Odf mapping
|
2020-05-13 17:01:11 +02:00 |
Enrico Ottonello
|
12756f9d41
|
multithread (4 threads) test to feed elastic search
|
2020-05-13 16:11:40 +02:00 |
Michele Artini
|
c0265213a0
|
partial implementation
|
2020-05-13 12:00:27 +02:00 |
Sandro La Bruzzo
|
a92ee0f41e
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-13 10:38:13 +02:00 |
Sandro La Bruzzo
|
d876f47d06
|
next step of MAG conversion implemented
|
2020-05-13 10:38:04 +02:00 |
Claudio Atzori
|
1ddd33de41
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-13 09:04:41 +02:00 |
Claudio Atzori
|
85f3c55992
|
fixed node names in blacklist workflow
|
2020-05-13 09:04:33 +02:00 |
Miriam Baglioni
|
43f127448d
|
changed the package name from dhp-propagation to dhp-enrichment for the preparation phase of funding propagation
|
2020-05-12 18:24:26 +02:00 |
Enrico Ottonello
|
08040cef80
|
spark action to analyze orcid lambda file
|
2020-05-12 16:57:43 +02:00 |
Claudio Atzori
|
ec0782e582
|
renamed jar containing the bulktagging and propagation workflows from dhp-[bulktagging|propagation] to dhp-enrichment; adjusted xml formatting
|
2020-05-12 15:49:28 +02:00 |
Miriam Baglioni
|
1547ca7e15
|
added blacklist step to the end of the provision wf
|
2020-05-12 12:17:27 +02:00 |
Miriam Baglioni
|
14979f299e
|
changed the configuration factory
|
2020-05-12 11:28:38 +02:00 |
Miriam Baglioni
|
f8aef6161a
|
minor modification
|
2020-05-12 11:28:07 +02:00 |
Miriam Baglioni
|
7387f3449a
|
changed the route to find the verb resolver classes
|
2020-05-12 11:27:38 +02:00 |
Miriam Baglioni
|
7687519f00
|
merged conflicts with upstream branch
|
2020-05-12 10:03:44 +02:00 |
Miriam Baglioni
|
8ffc050b8a
|
fixed problem in communityconfigurationfactory test
|
2020-05-12 10:01:09 +02:00 |
Claudio Atzori
|
527e8169a8
|
adjusted paths pointing to test configurations, cleanup
|
2020-05-11 18:17:05 +02:00 |
Claudio Atzori
|
f9a62ba63b
|
added wf nodes to copy entities to the output path
|
2020-05-11 18:16:39 +02:00 |
Miriam Baglioni
|
ad63effb4e
|
removed deletion of working dir
|
2020-05-11 17:48:22 +02:00 |
Claudio Atzori
|
c6b028f2af
|
code formatting
|
2020-05-11 17:38:08 +02:00 |
Claudio Atzori
|
6d0b11252e
|
bulktagging wfs moved into common dhp-enrichment module
|
2020-05-11 17:32:06 +02:00 |
Miriam Baglioni
|
50659011eb
|
refactoring
|
2020-05-11 16:14:26 +02:00 |
Miriam Baglioni
|
e883daf87e
|
added the outputPath parameter and the reset path to remove the outputPath directory
|
2020-05-11 16:10:24 +02:00 |
Miriam Baglioni
|
5ab3424c77
|
removed unused dependencies
|
2020-05-11 16:09:37 +02:00 |
Miriam Baglioni
|
6a3b081263
|
added the last step of blacklisting
|
2020-05-11 16:09:20 +02:00 |
Enrico Ottonello
|
3b1a68cbf5
|
elastic search feed test
|
2020-05-11 14:53:52 +02:00 |
Enrico Ottonello
|
f53e42bda7
|
merged
|
2020-05-11 14:49:28 +02:00 |
Enrico Ottonello
|
7990894454
|
different date format in lambda file parsing
|
2020-05-11 14:41:11 +02:00 |
Sandro La Bruzzo
|
0c6774e4da
|
updated pom version
|
2020-05-11 14:35:14 +02:00 |
Miriam Baglioni
|
bbc9b4f329
|
removed unused imports
|
2020-05-11 14:28:55 +02:00 |
Miriam Baglioni
|
757bae53ea
|
removed useless serialization points
|
2020-05-11 14:28:37 +02:00 |
Miriam Baglioni
|
b35d57a1ac
|
added resources for test
|
2020-05-11 14:15:30 +02:00 |
Miriam Baglioni
|
e563e65335
|
moved check from join to method
|
2020-05-11 14:11:44 +02:00 |
Sandro La Bruzzo
|
b90609848b
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-11 14:08:31 +02:00 |
Sandro La Bruzzo
|
4062eafbdb
|
merged from branch
|
2020-05-11 14:08:16 +02:00 |
Miriam Baglioni
|
f5d785e096
|
used the DbClient moved in dhp-common
|
2020-05-11 13:59:42 +02:00 |
Miriam Baglioni
|
112b2cb3c3
|
added the test class
|
2020-05-11 13:58:58 +02:00 |
Miriam Baglioni
|
9a7ae523c9
|
update to version 1.2.1-SNAPSHOT
|
2020-05-11 13:57:47 +02:00 |
Miriam Baglioni
|
2abb84877d
|
Merge branch 'master' into blacklist
|
2020-05-11 10:37:49 +02:00 |
Miriam Baglioni
|
b0f0b24263
|
update to version 1.2.1-SNAPSHOT
|
2020-05-11 10:37:31 +02:00 |
Miriam Baglioni
|
a7e91e23ba
|
update to version 1.2.1-SNAPSHOT
|
2020-05-11 10:34:30 +02:00 |
Miriam Baglioni
|
bb59bdd60f
|
merge upstream
|
2020-05-11 10:33:17 +02:00 |
Miriam Baglioni
|
5e3548add6
|
-
|
2020-05-11 10:33:08 +02:00 |
Miriam Baglioni
|
dc8c8fa480
|
changed the version
|
2020-05-11 10:20:48 +02:00 |
Miriam Baglioni
|
871e079b45
|
merged with master
|
2020-05-11 10:20:00 +02:00 |
Claudio Atzori
|
60c40618d3
|
[maven-release-plugin] prepare for next development iteration
|
2020-05-11 10:17:14 +02:00 |
Claudio Atzori
|
c267d958d5
|
[maven-release-plugin] prepare release dhp-1.2.0
|
2020-05-11 10:17:10 +02:00 |
Miriam Baglioni
|
622ba87ec2
|
changed the version
|
2020-05-11 10:10:36 +02:00 |
Miriam Baglioni
|
391b2399cc
|
merge upstream
|
2020-05-11 10:08:51 +02:00 |
Claudio Atzori
|
42f1a2bf94
|
bumped project version to 1.2.0-SNAPSHOT
|
2020-05-11 10:05:57 +02:00 |
Sandro La Bruzzo
|
1412158a6f
|
merged from branch
|
2020-05-11 09:45:50 +02:00 |
Miriam Baglioni
|
32301451ec
|
merge upstream
|
2020-05-11 09:42:23 +02:00 |
Miriam Baglioni
|
7e66bc2527
|
fix a typo in the compression keyword and added some logging info in the spark job
|
2020-05-11 09:40:58 +02:00 |
Sandro La Bruzzo
|
1662f221f5
|
added test class
|
2020-05-11 09:39:11 +02:00 |
Sandro La Bruzzo
|
2b48a2c32c
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-05-11 09:38:36 +02:00 |
Sandro La Bruzzo
|
4cebca09d2
|
start implementing MAG mapping
|
2020-05-11 09:38:27 +02:00 |
Spyros Zoupanos
|
ae0f535c73
|
Fixing hardcoded reference to main openAIRE graph db
|
2020-05-09 22:34:48 +03:00 |
Claudio Atzori
|
fd519df616
|
new rels produced by dedup workflow must be unique
|
2020-05-08 19:00:38 +02:00 |
Claudio Atzori
|
0ccc864ad9
|
[maven-release-plugin] prepare for next development iteration
|
2020-05-08 17:01:31 +02:00 |
Claudio Atzori
|
6e47c724c6
|
[maven-release-plugin] prepare release dhp-1.1.7
|
2020-05-08 17:01:27 +02:00 |
Claudio Atzori
|
5b28bb4131
|
code formatting
|
2020-05-08 16:49:47 +02:00 |
Claudio Atzori
|
8fd1952f16
|
code formatting
|
2020-05-08 16:01:09 +02:00 |
miconis
|
3420998bb4
|
reltype set in mergerels
|
2020-05-08 15:43:30 +02:00 |
Enrico Ottonello
|
b9d126dd1f
|
formatting modified after commit
|
2020-05-08 14:54:37 +02:00 |
Enrico Ottonello
|
7e1c987370
|
Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost
|
2020-05-08 14:49:50 +02:00 |
Enrico Ottonello
|
9d812788e4
|
added job to download from orcid the records modified after a fixed date, the info are taken from last_modified.csv on hdfs
|
2020-05-08 14:49:39 +02:00 |
Miriam Baglioni
|
9a29ab7508
|
went back to the readPath we had before
|
2020-05-08 13:08:56 +02:00 |
Miriam Baglioni
|
28556507e7
|
-
|
2020-05-08 12:54:52 +02:00 |
Claudio Atzori
|
b2192fdcdc
|
simplified reset_outputpath nodes across the workflows, applied common xml formatting
|
2020-05-08 12:33:31 +02:00 |
Miriam Baglioni
|
4c94231cad
|
merge with master fork
|
2020-05-08 12:25:57 +02:00 |
Miriam Baglioni
|
9b4c0d4b3a
|
-
|
2020-05-08 11:51:45 +02:00 |
Miriam Baglioni
|
53952707b6
|
modified test because of new step of data preparation. It now expects to find ResultCountrySet serialization instead of DatasourceCountry
|
2020-05-08 11:49:19 +02:00 |
Claudio Atzori
|
62ea19f1d3
|
introduced mapping for ExternalReferences, made urls defined within an instance unique
|
2020-05-08 09:43:26 +02:00 |
Claudio Atzori
|
8c67073a07
|
force speculative execution to false
|
2020-05-08 09:42:21 +02:00 |
Miriam Baglioni
|
d6b9de9f46
|
Merge branch 'master' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop
|
2020-05-07 18:22:59 +02:00 |
Miriam Baglioni
|
f95d288681
|
fixed switch of parameters
|
2020-05-07 18:22:32 +02:00 |
Claudio Atzori
|
166aafd936
|
heavy cleanup
|
2020-05-07 18:22:26 +02:00 |
Michele Artini
|
ac0da5a7ee
|
Partial implementation of broker events
|
2020-05-07 12:31:26 +02:00 |
Miriam Baglioni
|
fb405275f7
|
merged with master
|
2020-05-07 11:48:21 +02:00 |
Miriam Baglioni
|
e124278934
|
-
|
2020-05-07 11:47:11 +02:00 |
Claudio Atzori
|
5111671e62
|
cleanup
|
2020-05-07 11:47:00 +02:00 |
Miriam Baglioni
|
9f8855991c
|
changed Encoders.bean to Encoders.kryo
|
2020-05-07 11:44:35 +02:00 |
Miriam Baglioni
|
207b899d6d
|
merged with upstream
|
2020-05-07 11:43:53 +02:00 |
Claudio Atzori
|
5b3f8a0e90
|
using Encoders.bean instead of kryo
|
2020-05-07 11:41:41 +02:00 |
Miriam Baglioni
|
182225becb
|
Merge branch 'master' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop
|
2020-05-07 11:38:17 +02:00 |
Miriam Baglioni
|
5efae3acb9
|
new workflow for job3
|
2020-05-07 11:38:10 +02:00 |
Claudio Atzori
|
73243793b2
|
Dataset based implementation for SparkCountryPropagationJob3
|
2020-05-07 11:15:24 +02:00 |
Claudio Atzori
|
128c3bf1c8
|
restored Author bean with simple getter/setter, author pid addition moved into dedicated implementation SparkOrcidToResultFromSemRelJob3
|
2020-05-07 11:14:56 +02:00 |
Miriam Baglioni
|
b2fec32c87
|
new workflow for job3
|
2020-05-07 10:01:57 +02:00 |
Miriam Baglioni
|
29bc8c44b1
|
changes in the construction of new country set
|
2020-05-07 10:01:34 +02:00 |
Miriam Baglioni
|
55e825acd4
|
changed the test according to changes in SparkCountryPropagationJob2
|
2020-05-07 10:01:00 +02:00 |
Miriam Baglioni
|
16193cf0ba
|
new workflow and parameter for country propagation
|
2020-05-07 09:59:58 +02:00 |
Miriam Baglioni
|
5a476c7a13
|
changed the xquery for the cfhb table
|
2020-05-07 09:58:17 +02:00 |
Miriam Baglioni
|
42ad51577a
|
new implementation with one more serialization step
|
2020-05-07 09:57:49 +02:00 |
Claudio Atzori
|
17860d3ab6
|
general changes in the RAW graph mapping: missing collectedfrom/hostedby causes records to be skipped; factored out most of the constants in ModelConstants class (dhp-schemas)
|
2020-05-06 13:20:02 +02:00 |
Claudio Atzori
|
fdfecc9578
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-06 11:28:01 +02:00 |
Claudio Atzori
|
c79e2f5977
|
drop workingPath before starting the dedup workflow
|
2020-05-06 11:27:44 +02:00 |
Michele Artini
|
8f30a09d84
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-05 17:12:22 +02:00 |
Michele Artini
|
ccc609f909
|
new module for the production of broker events
|
2020-05-05 17:09:00 +02:00 |
Miriam Baglioni
|
dd2e698a72
|
added a sequentialization step on the spark job. Added a new parameter
|
2020-05-05 17:03:43 +02:00 |
Claudio Atzori
|
0825321d0b
|
improved unit tests in dhp-aggregation
|
2020-05-05 12:39:04 +02:00 |
Miriam Baglioni
|
252b219dd5
|
changed the name of some properties
|
2020-05-05 10:03:32 +02:00 |
Claudio Atzori
|
4a8487165c
|
using long param names in wf definition
|
2020-05-04 19:19:29 +02:00 |
Claudio Atzori
|
a2fc37df5f
|
adjusted parameters
|
2020-05-04 19:18:59 +02:00 |
Claudio Atzori
|
f1b7e14036
|
code formatting
|
2020-05-04 19:18:34 +02:00 |
Miriam Baglioni
|
78578c3ccf
|
fixed wrong transition name in workflow
|
2020-05-04 15:46:24 +02:00 |
Miriam Baglioni
|
cc7d9b6b19
|
merge upstream
|
2020-05-04 13:59:09 +02:00 |
Miriam Baglioni
|
3957c815b9
|
changed the name of some parameters
|
2020-05-04 13:58:52 +02:00 |
Miriam Baglioni
|
e218360f8a
|
changed code for the mode of DbClient and also removed the dependency to graph-mapper
|
2020-05-04 12:26:17 +02:00 |
Miriam Baglioni
|
31ea05297d
|
moved the DbClient to common and added needed dependency to pom
|
2020-05-04 12:22:28 +02:00 |
miconis
|
085cf173d7
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-04 12:08:20 +02:00 |
miconis
|
3df703f67d
|
mergerels added to propagate relations
|
2020-05-04 12:08:12 +02:00 |
Claudio Atzori
|
bac37b3973
|
fixed children expansion in XML records
|
2020-05-04 11:51:17 +02:00 |
Claudio Atzori
|
077ccd8743
|
stats wf properties cleanup
|
2020-05-04 11:41:46 +02:00 |
Miriam Baglioni
|
b7dd400e51
|
added check if author.pid exists or is null
|
2020-05-01 15:09:02 +02:00 |
Miriam Baglioni
|
dbf3ba051a
|
minor
|
2020-04-30 20:22:07 +02:00 |
Miriam Baglioni
|
43053a286d
|
workflow pom with added blacklist module
|
2020-04-30 18:30:21 +02:00 |
Miriam Baglioni
|
0631fe548a
|
pom.xml
|
2020-04-30 18:29:46 +02:00 |
Miriam Baglioni
|
38ecfd5785
|
the wf with all the three steps for blacklisting relations
|
2020-04-30 18:28:46 +02:00 |
Miriam Baglioni
|
95433e1087
|
parameters for the preparation phase and blacklist phase
|
2020-04-30 18:28:13 +02:00 |
Miriam Baglioni
|
1070790c19
|
minor
|
2020-04-30 18:26:58 +02:00 |
Miriam Baglioni
|
b9d56b3ced
|
applies the actual removal of the relations
|
2020-04-30 18:26:25 +02:00 |
Miriam Baglioni
|
d6d6ebeae5
|
preparation step: creates the subset of the merges relations
|
2020-04-30 18:25:33 +02:00 |
Miriam Baglioni
|
13f30664ea
|
minor
|
2020-04-30 15:23:49 +02:00 |
Miriam Baglioni
|
276b95b7b3
|
add create file instruction
|
2020-04-30 15:05:17 +02:00 |
Miriam Baglioni
|
65a5d67b8b
|
minor modifications
|
2020-04-30 14:45:27 +02:00 |
Miriam Baglioni
|
418595fec2
|
removed the saveGraph parameter
|
2020-04-30 14:45:00 +02:00 |
Miriam Baglioni
|
ce8b1d0bc3
|
new workflow definition to be inserted in the provision pipeline
|
2020-04-30 14:38:54 +02:00 |
Miriam Baglioni
|
4b0bd91012
|
-
|
2020-04-30 12:45:28 +02:00 |
Miriam Baglioni
|
2349bfd8b8
|
changed the job test to remove the writeUpdate option
|
2020-04-30 11:43:33 +02:00 |
Sandro La Bruzzo
|
1e06bbaee8
|
fixed test
|
2020-04-30 11:38:58 +02:00 |
Miriam Baglioni
|
951517f9ec
|
new input parameters and workflow definition to be used in the provision pipeline
|
2020-04-30 11:32:50 +02:00 |
Miriam Baglioni
|
026f297e49
|
removed the writeUpdate option
|
2020-04-30 11:31:59 +02:00 |
Sandro La Bruzzo
|
b8e95295e2
|
merged from master
|
2020-04-30 11:27:59 +02:00 |
Miriam Baglioni
|
c89fe762b1
|
modified relation datasource organization
|
2020-04-30 11:17:03 +02:00 |
Miriam Baglioni
|
3abb76ff7a
|
merge with upstream
|
2020-04-30 11:15:54 +02:00 |
Michele Artini
|
eb9bd42970
|
fixed a problem with journals
|
2020-04-30 11:06:05 +02:00 |
Miriam Baglioni
|
638a3c465b
|
-
|
2020-04-30 11:05:17 +02:00 |
Michele Artini
|
a0a6109bbc
|
fixed a problem with journals
|
2020-04-30 11:03:46 +02:00 |
Miriam Baglioni
|
354f0162be
|
changes in the blacklist and workflow definition
|
2020-04-30 10:26:50 +02:00 |
Claudio Atzori
|
439c6255a2
|
cleanup
|
2020-04-29 19:09:07 +02:00 |
Claudio Atzori
|
77ac995770
|
cleaned up poms, added descriptions
|
2020-04-29 18:44:17 +02:00 |
Miriam Baglioni
|
3cffee74b9
|
merge with upstream
|
2020-04-29 18:25:29 +02:00 |
Miriam Baglioni
|
9ab46535e7
|
pom with the new blacklist module added
|
2020-04-29 18:17:15 +02:00 |
Miriam Baglioni
|
6a47e6191d
|
read from blacklist and write the result as relations on hdfs
|
2020-04-29 18:16:01 +02:00 |
Miriam Baglioni
|
869f576273
|
added hash map for relationship entityType id prefix, and relation inverse
|
2020-04-29 18:14:52 +02:00 |
Miriam Baglioni
|
b85ad7012a
|
reads the blacklist from the blacklist db and writes it as a set of relations on hdfs
|
2020-04-29 17:29:49 +02:00 |
Claudio Atzori
|
8fd81e863d
|
added default value for the external_stats_db_name
|
2020-04-29 15:36:24 +02:00 |
Claudio Atzori
|
c6f3ff4462
|
stats workflow content relocated into common package; added <global> property definitions in stats workflow.xml
|
2020-04-29 14:29:27 +02:00 |
Sandro La Bruzzo
|
4a89465740
|
reformatted code
|
2020-04-29 13:24:29 +02:00 |
Sandro La Bruzzo
|
a6b1a59d0a
|
merged with master
|
2020-04-29 13:20:57 +02:00 |
Sandro La Bruzzo
|
920c0f19c3
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-04-29 13:13:16 +02:00 |
Sandro La Bruzzo
|
09f161f1f4
|
implemented unit test
|
2020-04-29 13:13:02 +02:00 |
miconis
|
e0d14fe4f8
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-29 13:02:53 +02:00 |
miconis
|
0352d3b0ba
|
entity dumps in dedup compressed
|
2020-04-29 13:02:34 +02:00 |
Michele Artini
|
c43b4c8962
|
formatting
|
2020-04-29 12:56:58 +02:00 |
Michele Artini
|
a5d7007005
|
Fix relations in migration
Fix pom.xml in dhp-stats-update
|
2020-04-29 12:05:41 +02:00 |
Miriam Baglioni
|
f7695e833c
|
resolved conflicts
|
2020-04-29 11:41:31 +02:00 |
Claudio Atzori
|
3616d0f88d
|
Merge pull request 'Adding the stats workflow to the dnet-hadoop hierarchy' (#6) from spyros/dnet-hadoop:master into master
Integrating stats update workflow.
|
2020-04-29 10:35:02 +02:00 |
Claudio Atzori
|
964972d29a
|
added data provision workflow definition WIP
|
2020-04-29 09:25:50 +02:00 |
Enrico Ottonello
|
1edcd53581
|
added shell actions to download all 11 activities files from ORCID
|
2020-04-28 20:25:09 +02:00 |
miconis
|
62e467eb0c
|
assertion numbers updated to fit the new implementation of the pace-core
|
2020-04-28 11:46:23 +02:00 |
Claudio Atzori
|
6f5b899038
|
reformatted code according to the updated style descriptor
|
2020-04-28 11:23:29 +02:00 |
Claudio Atzori
|
ac25f2d8d1
|
integrated changes from master
|
2020-04-28 08:55:28 +02:00 |
Miriam Baglioni
|
2980e50edf
|
merge upstream
|
2020-04-27 15:06:48 +02:00 |
Claudio Atzori
|
a0bdbacdae
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:52:31 +02:00 |
Claudio Atzori
|
7a3f8085f7
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:45:40 +02:00 |
Michele Artini
|
1260d03eba
|
skip empty projects
|
2020-04-27 13:51:13 +02:00 |
Miriam Baglioni
|
df34a4ebcc
|
changed the configuration to add ignorecase option to each verb related to covid-19 community
|
2020-04-27 12:32:56 +02:00 |
Miriam Baglioni
|
7a59324ccf
|
changed the test to check for the new ignorecase option
|
2020-04-27 12:31:46 +02:00 |
Miriam Baglioni
|
986c97348d
|
added the ignorecase option to each selection verb
|
2020-04-27 12:31:05 +02:00 |
Miriam Baglioni
|
a303fc9f73
|
resources for testing propagation of result to community from organization and from semrel
|
2020-04-27 11:14:16 +02:00 |
Miriam Baglioni
|
c093d764a3
|
-
|
2020-04-27 11:12:38 +02:00 |
Miriam Baglioni
|
c925e2be16
|
test for propagation of result to community from organization and result to community from semrel
|
2020-04-27 10:59:53 +02:00 |
Miriam Baglioni
|
ec7f166690
|
changed the bl because of changes to the examples for the re-implementation of the propagation step
|
2020-04-27 10:58:41 +02:00 |
Miriam Baglioni
|
6135096ef1
|
refactoring
|
2020-04-27 10:57:50 +02:00 |
Miriam Baglioni
|
d30e710165
|
fixed duplicates action name in the workflow
|
2020-04-27 10:52:30 +02:00 |
Miriam Baglioni
|
f9ee343fc0
|
new parametrized workflow with preparation steps and new parameter input files
|
2020-04-27 10:48:31 +02:00 |
Miriam Baglioni
|
e2093644dc
|
changed in the workflow the directory where to store the preparedInfo and the graph generated at this step
|
2020-04-27 10:46:44 +02:00 |
Miriam Baglioni
|
8a58bf2744
|
removed the writeUpdate option
|
2020-04-27 10:45:06 +02:00 |
Miriam Baglioni
|
5dccbe13db
|
merge with upstream
|
2020-04-27 10:43:59 +02:00 |
Miriam Baglioni
|
7b6505ec69
|
new resources for testing propagation of project to result after the re-implementation
|
2020-04-27 10:42:16 +02:00 |
Miriam Baglioni
|
1b0e0bd1b5
|
refactoring
|
2020-04-27 10:40:26 +02:00 |
Miriam Baglioni
|
e5a177f0a7
|
refactoring
|
2020-04-27 10:36:21 +02:00 |
Miriam Baglioni
|
e000754c92
|
refactoring
|
2020-04-27 10:34:03 +02:00 |
Miriam Baglioni
|
95a54d5460
|
removed the writeUpdate option. The update is available in the preparedInfo path
|
2020-04-27 10:30:32 +02:00 |
Miriam Baglioni
|
8802e4126b
|
re-implemented inverting the couple: from (projectId, relatedResultList) to (resultId, relatedProjectList)
|
2020-04-27 10:26:55 +02:00 |
Enrico Ottonello
|
a1861b9eaa
|
workflow works in parallel on 2 activity files
|
2020-04-24 18:33:37 +02:00 |
Enrico Ottonello
|
941e94af06
|
added workflow for generating authors with dois data sequence file
|
2020-04-24 15:50:40 +02:00 |
Claudio Atzori
|
268462623a
|
refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity
|
2020-04-24 14:42:01 +02:00 |
Claudio Atzori
|
a3e480d1c9
|
implemented DispatchEntitiesApplication using spark2 datasets
|
2020-04-24 14:36:53 +02:00 |
Claudio Atzori
|
48157e0fc4
|
GraphHiveImporterJob moved to a dedicated package
|
2020-04-24 14:32:28 +02:00 |
Miriam Baglioni
|
adcbf0e29a
|
refactoring
|
2020-04-24 10:47:43 +02:00 |
Claudio Atzori
|
278fc9d276
|
code formatting
|
2020-04-23 18:51:38 +02:00 |
miconis
|
5414236644
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 18:17:23 +02:00 |
miconis
|
8d258c85ff
|
spark dedup test fixed, sample for dataset and orp added, test implemented
|
2020-04-23 18:16:20 +02:00 |
Michele Artini
|
072eae3803
|
fixed a problem with missing contexts
|
2020-04-23 16:42:49 +02:00 |
Michele Artini
|
b164d96874
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 16:19:16 +02:00 |
Michele Artini
|
d920ce501e
|
fixed a problem with missing instances
|
2020-04-23 16:18:40 +02:00 |
Miriam Baglioni
|
0e447add66
|
removed useless classes
|
2020-04-23 12:59:43 +02:00 |
Miriam Baglioni
|
edb00db86a
|
refactoring
|
2020-04-23 12:57:35 +02:00 |
Miriam Baglioni
|
44fab140de
|
-
|
2020-04-23 12:42:07 +02:00 |
Miriam Baglioni
|
769aa8178a
|
refactoring
|
2020-04-23 12:40:44 +02:00 |
Miriam Baglioni
|
d8dc31d4af
|
refactoring
|
2020-04-23 12:35:49 +02:00 |
Miriam Baglioni
|
8c5dac5cc3
|
removed useless classes
|
2020-04-23 12:30:58 +02:00 |
Miriam Baglioni
|
15656684b9
|
added properties for the preparation step and actual propagation. Added the new parametrized workflow
|
2020-04-23 12:13:34 +02:00 |
Miriam Baglioni
|
6f35f5ca42
|
added the steps of reset output dir and copy information not changed by the propagation step
|
2020-04-23 12:12:07 +02:00 |
Miriam Baglioni
|
19cd5b85c0
|
changed the classname to execute
|
2020-04-23 12:07:41 +02:00 |
Miriam Baglioni
|
fa2ff5c6f5
|
refactoring
|
2020-04-23 11:58:26 +02:00 |
Miriam Baglioni
|
540f70298b
|
added missing property
|
2020-04-23 11:51:48 +02:00 |
Miriam Baglioni
|
e431fe4f5b
|
added the implements Serializable to each class
|
2020-04-23 11:48:47 +02:00 |
Miriam Baglioni
|
24fa81d7e8
|
implementation parametrized for result type
|
2020-04-23 11:44:19 +02:00 |
Miriam Baglioni
|
ab2a24cc2b
|
changed the dependency to use reflections to find annotated classes
|
2020-04-23 11:08:47 +02:00 |
Miriam Baglioni
|
5153d88bd3
|
definition of workflow and properties for bulktagging
|
2020-04-23 11:04:53 +02:00 |
Miriam Baglioni
|
3b2e4ab670
|
test for bulktag
|
2020-04-23 10:00:10 +02:00 |
Sandro La Bruzzo
|
fdc0523e4c
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-23 09:34:13 +02:00 |
Sandro La Bruzzo
|
4ba386d996
|
improved crossref mapping
|
2020-04-23 09:33:48 +02:00 |
Claudio Atzori
|
8851050814
|
replaced hive_db_name with hiveDbName
|
2020-04-23 08:36:40 +02:00 |
Claudio Atzori
|
91f81107b1
|
applying code formatting
|
2020-04-23 07:52:32 +02:00 |
Claudio Atzori
|
1e7583c5a6
|
filtered invisible records in data provision workflow
|
2020-04-23 07:51:34 +02:00 |
Claudio Atzori
|
9ddafd46ca
|
fixed dedup record id prefix, set the correct dataInfo in the DedupRecordFactory
|
2020-04-23 07:50:18 +02:00 |
Claudio Atzori
|
ade4cb97af
|
fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB
|
2020-04-22 18:24:06 +02:00 |
Sandro La Bruzzo
|
bb6c9785b4
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-22 15:00:57 +02:00 |
Sandro La Bruzzo
|
157915988c
|
improved crossref mapping
|
2020-04-22 15:00:44 +02:00 |
Enrico Ottonello
|
5977f08e92
|
merged
|
2020-04-22 14:50:50 +02:00 |
Enrico Ottonello
|
7d759947ae
|
used vtd for parsing orcid xml record, set 4g heapspace
|
2020-04-22 14:41:19 +02:00 |
Claudio Atzori
|
e81960335c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-22 10:46:37 +02:00 |
Michele Artini
|
9e4d58f505
|
ResultType
|
2020-04-22 10:07:26 +02:00 |
Claudio Atzori
|
c891661822
|
small adjustments in the graph2hive workflow
|
2020-04-21 18:52:23 +02:00 |
Miriam Baglioni
|
259525cb93
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-21 18:33:46 +02:00 |
Miriam Baglioni
|
30e53261d0
|
minor
|
2020-04-21 18:00:53 +02:00 |
Claudio Atzori
|
0b55795d4d
|
small adjustments in the provisioning workflow
|
2020-04-21 16:15:04 +02:00 |
Claudio Atzori
|
88fbb3a353
|
added sparkSqlWarehouseDir to the default extra spark options passed to each workflow
|
2020-04-21 16:13:43 +02:00 |
Claudio Atzori
|
cd320efa96
|
added extra spark options to graph to hive workflow
|
2020-04-21 16:12:20 +02:00 |
Miriam Baglioni
|
90c768dde6
|
added shaded libs module
|
2020-04-21 16:03:51 +02:00 |
Claudio Atzori
|
91e72a6944
|
Dataset based implementation for SparkCreateDedupRecord phase, fixed datasource entity dump supplementing dedup unit tests
|
2020-04-21 12:06:08 +02:00 |
miconis
|
5c9ef08a8e
|
spark dedup test fixed
|
2020-04-21 10:19:04 +02:00 |
Sandro La Bruzzo
|
3624947a7f
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-21 08:34:24 +02:00 |
Claudio Atzori
|
d772d967aa
|
restored changes from master branch
|
2020-04-20 18:53:06 +02:00 |
Claudio Atzori
|
eb8a020859
|
fixed behaviour of DedupRecordFactory
|
2020-04-20 18:44:06 +02:00 |
Sandro La Bruzzo
|
039f9b7871
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-20 18:10:29 +02:00 |
Sandro La Bruzzo
|
e4b105cece
|
improved crossref mapping
|
2020-04-20 18:10:07 +02:00 |
Claudio Atzori
|
ede1af3d85
|
Merge branch 'master' into deduptesting
|
2020-04-20 16:52:14 +02:00 |
miconis
|
1102e32462
|
SparkDedupTest updated and organization dump fixed
|
2020-04-20 16:49:01 +02:00 |
Claudio Atzori
|
667d23c58b
|
finalising Actionset migration workflow
|
2020-04-20 16:45:21 +02:00 |