Alessia Bardi
|
b7a39731a6
|
assert, not print
|
2020-07-12 19:28:56 +02:00 |
Miriam Baglioni
|
f9ad6f3255
|
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
|
2020-07-10 19:42:53 +02:00 |
Miriam Baglioni
|
c27f12d6e8
|
avoid to consider _SUCCESS file
|
2020-07-10 19:42:23 +02:00 |
Claudio Atzori
|
770adc26e9
|
WIP aggregator to make relationships unique
|
2020-07-10 19:35:10 +02:00 |
Claudio Atzori
|
ecf119f37a
|
Merge branch 'master' into deduptesting
|
2020-07-10 19:04:16 +02:00 |
Claudio Atzori
|
31071e363f
|
Merge branch 'provision_indexing'
|
2020-07-10 19:03:57 +02:00 |
Claudio Atzori
|
06c1913062
|
added different limits for grouping by source and by target, incremented spark.sql.shuffle.partitions for the join operations
|
2020-07-10 19:03:33 +02:00 |
Claudio Atzori
|
cc77446dc4
|
added dbSchema parameter to the raw_db workflow
|
2020-07-10 19:01:50 +02:00 |
Claudio Atzori
|
4c3836f62e
|
materialize the related entities before joining them
|
2020-07-10 19:00:44 +02:00 |
Michele Artini
|
e1ae964bc4
|
stats
|
2020-07-10 16:12:08 +02:00 |
Claudio Atzori
|
752d28f8eb
|
make the relations produced by the dedup SparkPropagateRelation job unique
|
2020-07-10 15:09:50 +02:00 |
Sandro La Bruzzo
|
c01efed79b
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-10 14:44:57 +02:00 |
Sandro La Bruzzo
|
a7d3977481
|
added generation of EBI Dataset
|
2020-07-10 14:44:50 +02:00 |
Claudio Atzori
|
b21866a2da
|
allow setting different relation cut points by source and by target; adjusted weight assigned to relationship types
|
2020-07-10 13:59:48 +02:00 |
Claudio Atzori
|
ff4d6214f1
|
experimenting with pruning of relations
|
2020-07-10 10:06:41 +02:00 |
Miriam Baglioni
|
faea30cda0
|
-
|
2020-07-09 14:05:21 +02:00 |
Michele Artini
|
2d742a84ae
|
DedupConfig as json file
|
2020-07-09 12:53:46 +02:00 |
Miriam Baglioni
|
a634794242
|
merge upstream
|
2020-07-09 11:46:51 +02:00 |
Michele Artini
|
a44b9b36b9
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-09 11:02:31 +02:00 |
Michele Artini
|
1c6a171633
|
updated pom
|
2020-07-09 11:02:09 +02:00 |
Claudio Atzori
|
3c728aaa0c
|
trying to overcome OOM errors during duplicate scan phase
|
2020-07-08 22:39:51 +02:00 |
Claudio Atzori
|
18c555cd79
|
Merge branch 'master' into deduptesting
|
2020-07-08 22:32:01 +02:00 |
Claudio Atzori
|
4365cf41d7
|
trying to overcome OOM errors during duplicate scan phase
|
2020-07-08 22:31:46 +02:00 |
Claudio Atzori
|
67e1d222b6
|
bulk cleaning when found null or empty, sets bestaccessrights evaluating the result instances
|
2020-07-08 17:53:35 +02:00 |
Alessia Bardi
|
853e8d7987
|
test for software merge
|
2020-07-08 17:03:53 +02:00 |
Claudio Atzori
|
610d377d57
|
first implementation of the BETA & PROD graphs merge procedure
|
2020-07-08 16:54:26 +02:00 |
Alessia Bardi
|
9a898c0e4c
|
Json schema generator
|
2020-07-08 12:52:00 +02:00 |
Alessia Bardi
|
636f9ce7d6
|
json schema generator lib
|
2020-07-08 12:50:57 +02:00 |
Alessia Bardi
|
8f83b726fa
|
Dump json schema compliant to json schema Draft 7
|
2020-07-08 12:48:46 +02:00 |
Claudio Atzori
|
e2ea30f89d
|
updated graph construction workflow definition: cleaning wf moved to the bottom to include cleaning of the information produced by the enrichment workflows
|
2020-07-08 12:16:24 +02:00 |
Miriam Baglioni
|
1b0b968548
|
fixed issue on substring
|
2020-07-08 12:11:51 +02:00 |
Miriam Baglioni
|
7fe00cb4fb
|
-
|
2020-07-08 10:29:37 +02:00 |
Miriam Baglioni
|
375ef07d7b
|
changed the description for the upload
|
2020-07-07 18:41:27 +02:00 |
Miriam Baglioni
|
35c8265793
|
added the json extension to filename
|
2020-07-07 18:29:49 +02:00 |
Miriam Baglioni
|
81434f8e5e
|
added method newInstance
|
2020-07-07 18:26:10 +02:00 |
Miriam Baglioni
|
817cddfc52
|
-
|
2020-07-07 18:25:12 +02:00 |
Miriam Baglioni
|
a66aa9bd83
|
removed unnecessary tests
|
2020-07-07 18:25:00 +02:00 |
Miriam Baglioni
|
9b20a21b24
|
removed unnecessary tests
|
2020-07-07 18:23:37 +02:00 |
Miriam Baglioni
|
8a1b42ff21
|
added check to verify that dump contains at least one product
|
2020-07-07 18:21:35 +02:00 |
Miriam Baglioni
|
d86adb82a7
|
-
|
2020-07-07 18:20:51 +02:00 |
Miriam Baglioni
|
b2782025f6
|
enabled the whole workflow to run. Added property to give priority to dependencies in the classpath, to solve conflicts
|
2020-07-07 18:10:47 +02:00 |
Miriam Baglioni
|
83d2c84b77
|
added constraints to the xquery so as to get only profiles with status manager or all
|
2020-07-07 18:09:48 +02:00 |
Miriam Baglioni
|
4c8d86493c
|
-
|
2020-07-07 18:09:06 +02:00 |
Miriam Baglioni
|
0208bc18f3
|
added new resource for testing
|
2020-07-07 17:47:24 +02:00 |
Miriam Baglioni
|
f5bb65c9ef
|
the json schema for the dump of the results
|
2020-07-07 17:34:40 +02:00 |
Michele Artini
|
dffa0b01a2
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-07 15:37:29 +02:00 |
Michele Artini
|
efadbdb2bc
|
fixed a bug with duplicated events
|
2020-07-07 15:37:13 +02:00 |
Claudio Atzori
|
8af8e7481a
|
code formatting
|
2020-07-07 14:23:34 +02:00 |
Claudio Atzori
|
b383ed42fa
|
pass optional parameter relationFilter to the PrepareRelationJob implementation
|
2020-07-07 14:21:28 +02:00 |
Claudio Atzori
|
911894a987
|
Merge branch 'deduptesting'
|
2020-07-07 14:20:43 +02:00 |
Miriam Baglioni
|
c19818a3f8
|
merge branch with fork master
|
2020-07-06 13:58:23 +02:00 |
Miriam Baglioni
|
d22240c0ba
|
merge upstream
|
2020-07-06 13:58:02 +02:00 |
Michele Artini
|
edf6c6c4dc
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-03 11:48:24 +02:00 |
Michele Artini
|
04bebb708c
|
some fixes
|
2020-07-03 11:48:12 +02:00 |
Claudio Atzori
|
c3d67f709a
|
adjusted dedup configuration for result entities: using new wordssuffixprefix clustering function, removed ngrampairs, adjusted queueMaxSize (800) and slidingWindowSize (80)
|
2020-07-02 17:35:22 +02:00 |
Miriam Baglioni
|
f8bf4acd76
|
-
|
2020-07-02 16:03:11 +02:00 |
Miriam Baglioni
|
e6c79d44e6
|
-
|
2020-07-02 16:02:02 +02:00 |
Miriam Baglioni
|
d7f6f0c216
|
changed code to use other lib
|
2020-07-02 16:01:34 +02:00 |
Miriam Baglioni
|
8fdc9e070c
|
added dependency to OkHttp
|
2020-07-02 16:01:08 +02:00 |
Miriam Baglioni
|
94500a581b
|
merge branch with fork master
|
2020-07-02 14:25:39 +02:00 |
Miriam Baglioni
|
c133a23cf0
|
merge upstream
|
2020-07-02 14:24:57 +02:00 |
Claudio Atzori
|
1d39f7901c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-07-02 12:45:01 +02:00 |
Claudio Atzori
|
0f77cac4b5
|
fix: deduper must use queueMaxSize instead of groupMaxSize for the block definition
|
2020-07-02 12:43:51 +02:00 |
Sandro La Bruzzo
|
18b9330312
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-02 12:43:19 +02:00 |
Michele Artini
|
b413db0bff
|
white/blacklists
|
2020-07-02 12:43:03 +02:00 |
Claudio Atzori
|
d380b85246
|
unit test for the preparation of the relations
|
2020-07-02 12:42:13 +02:00 |
Claudio Atzori
|
ed1c7e5d75
|
fixed workflow for the import of the claims alone
|
2020-07-02 12:40:21 +02:00 |
Sandro La Bruzzo
|
07f0723fa7
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-02 12:37:49 +02:00 |
Sandro La Bruzzo
|
1d420eedb4
|
added generation of EBI Dataset
|
2020-07-02 12:37:43 +02:00 |
Claudio Atzori
|
e4a29a4513
|
fixed workflow for the import of the claims alone
|
2020-07-02 12:36:33 +02:00 |
Michele Artini
|
3bcdfbabe9
|
list with limits
|
2020-07-01 08:42:39 +02:00 |
Michele Artini
|
59a5421c24
|
indexing, accumulators, limited lists
|
2020-06-30 16:17:09 +02:00 |
Michele Artini
|
6f13673464
|
accumulators
|
2020-06-29 16:33:32 +02:00 |
Sandro La Bruzzo
|
dab783b173
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-29 09:05:00 +02:00 |
Michele Artini
|
a6ea432435
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-29 08:44:20 +02:00 |
Michele Artini
|
35ae381d28
|
all events matchers
|
2020-06-29 08:43:56 +02:00 |
Claudio Atzori
|
7817338e05
|
added test to verify the relation pre-processing
|
2020-06-26 17:58:33 +02:00 |
Claudio Atzori
|
8d59fdf34e
|
WIP: dataset based PrepareRelationsJob
|
2020-06-26 14:32:58 +02:00 |
Michele Artini
|
2393d9da2f
|
limits
|
2020-06-26 11:20:45 +02:00 |
Sandro La Bruzzo
|
96ce124b59
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-25 17:00:43 +02:00 |
Miriam Baglioni
|
4a7de07ea2
|
refactoring
|
2020-06-25 16:32:40 +02:00 |
Miriam Baglioni
|
54a12978d3
|
fixed issue in xquery
|
2020-06-25 16:30:20 +02:00 |
Michele Artini
|
408165a756
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-25 15:53:35 +02:00 |
Michele Artini
|
e8fb305f18
|
compilation of event map
|
2020-06-25 15:53:20 +02:00 |
Michele Artini
|
4eb3e109d7
|
compilation of event map
|
2020-06-25 15:45:50 +02:00 |
Claudio Atzori
|
d839e88783
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-25 14:06:30 +02:00 |
Claudio Atzori
|
6f5771c1c9
|
sets author.rank when null
|
2020-06-25 14:06:21 +02:00 |
Michele Artini
|
e28033c6d8
|
some fixes
|
2020-06-25 13:01:09 +02:00 |
Claudio Atzori
|
216975c4ec
|
restored complete provision workflow
|
2020-06-25 12:55:52 +02:00 |
Claudio Atzori
|
2d77d3a388
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-25 12:54:30 +02:00 |
Claudio Atzori
|
93f627ea51
|
code formatting
|
2020-06-25 12:54:21 +02:00 |
Miriam Baglioni
|
05a99cfb61
|
change the position of value and description elements in the workflow definition
|
2020-06-25 12:36:08 +02:00 |
Claudio Atzori
|
7df2712824
|
Merge branch 'provision_indexing'
|
2020-06-25 12:22:41 +02:00 |
Claudio Atzori
|
e62333192c
|
WIP: prepare relation job
|
2020-06-25 12:22:18 +02:00 |
Claudio Atzori
|
6933ec11fb
|
WIP: prepare relation job
|
2020-06-25 11:04:12 +02:00 |
Sandro La Bruzzo
|
a6c0faac70
|
added test to verify secondary sorting
|
2020-06-25 10:48:15 +02:00 |
Claudio Atzori
|
69b0391708
|
WIP: prepare relation job
|
2020-06-25 10:19:56 +02:00 |
Michele Artini
|
abcbebcbb4
|
fixed generation of ids
|
2020-06-25 09:50:46 +02:00 |
Michele Artini
|
77d2a1b1c4
|
params to choose sql queries for beta or production
|
2020-06-25 09:28:13 +02:00 |
Claudio Atzori
|
46e76affeb
|
WIP: prepare relation job
|
2020-06-24 19:01:15 +02:00 |
Claudio Atzori
|
0e723d378b
|
added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job
|
2020-06-24 18:34:42 +02:00 |
Michele Artini
|
202f6e62ff
|
Split join wf
|
2020-06-24 15:47:06 +02:00 |
Sandro La Bruzzo
|
96689a8994
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-24 14:06:50 +02:00 |
Sandro La Bruzzo
|
46631a4421
|
updated mapping scholexplorer to OAF
|
2020-06-24 14:06:38 +02:00 |
Michele Artini
|
e53dd62e87
|
minor changes
|
2020-06-24 09:24:45 +02:00 |
Michele Artini
|
8b9933b934
|
refactoring aggregators
|
2020-06-24 08:57:13 +02:00 |
Miriam Baglioni
|
3e5570de7a
|
-
|
2020-06-23 15:44:54 +02:00 |
Michele Artini
|
d13e3d3f68
|
fixed paths
|
2020-06-23 11:01:42 +02:00 |
Michele Artini
|
8386c6f90d
|
filter of valid resultResult relations
|
2020-06-23 10:24:15 +02:00 |
Michele Artini
|
38bb45d0b6
|
test osf:refereed
|
2020-06-23 10:14:39 +02:00 |
Michele Artini
|
c3286f4c37
|
fixed relType
|
2020-06-23 09:32:32 +02:00 |
Miriam Baglioni
|
507f7a94a8
|
added one of the main zenodo communities to the tagging conf for testing purposes
|
2020-06-23 08:45:27 +02:00 |
Michele Artini
|
af2f7705fc
|
partial refactoring of some joins
|
2020-06-23 08:37:35 +02:00 |
Miriam Baglioni
|
af1d40351b
|
changed XQuery to add also the main Zenodo community among the communities associated to the openaire community
|
2020-06-22 19:20:54 +02:00 |
Miriam Baglioni
|
e4b21be004
|
-
|
2020-06-22 17:31:50 +02:00 |
Miriam Baglioni
|
afa19b0c84
|
changed the way to PUT the files to the rest API
|
2020-06-22 17:20:07 +02:00 |
Miriam Baglioni
|
250fd1c854
|
merge branch with fork master
|
2020-06-22 16:25:48 +02:00 |
Claudio Atzori
|
8a3bc7c183
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-22 14:12:33 +02:00 |
Claudio Atzori
|
e162ba5075
|
added dnet workflows to orchestrate the execution of graph2hive, updateSolr and updateStats oozie wfs
|
2020-06-22 14:12:28 +02:00 |
Michele Artini
|
3ce20c198e
|
reformatting
|
2020-06-22 12:14:25 +02:00 |
Michele Artini
|
ed787398b3
|
refactoring wf
|
2020-06-22 11:45:14 +02:00 |
Claudio Atzori
|
9cd27183b6
|
[maven-release-plugin] prepare for next development iteration
|
2020-06-22 11:27:44 +02:00 |
Claudio Atzori
|
1e3dab0631
|
[maven-release-plugin] prepare release dhp-1.2.3
|
2020-06-22 11:27:39 +02:00 |
Miriam Baglioni
|
df80ae5c1b
|
merge branch with fork master
|
2020-06-22 10:51:23 +02:00 |
Miriam Baglioni
|
e8f914f8b3
|
-
|
2020-06-22 10:50:41 +02:00 |
Miriam Baglioni
|
edeb862476
|
excluded dependency in module that generates conflict
|
2020-06-22 10:49:56 +02:00 |
Miriam Baglioni
|
185facb8e5
|
replaced the deprecated DefaultHttpClient with CloseableHttpClient
|
2020-06-22 10:49:10 +02:00 |
Claudio Atzori
|
961a0d0b49
|
[actionset promotion] log debugging info in case of error in the action payload extraction or parsing the data
|
2020-06-22 10:20:45 +02:00 |
Claudio Atzori
|
5e8b922962
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-22 09:50:47 +02:00 |
Claudio Atzori
|
7d416f08d8
|
graph cleaning workflow: set hostedby to unknown repository when defined as NULL
|
2020-06-22 09:50:43 +02:00 |
Michele Artini
|
16c7a18435
|
refactoring
|
2020-06-22 08:51:31 +02:00 |
Miriam Baglioni
|
669a509430
|
-
|
2020-06-19 17:39:46 +02:00 |
Michele Artini
|
f9fc64ffaf
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-19 15:24:43 +02:00 |
Michele Artini
|
d88fe0ac84
|
join methods
|
2020-06-19 15:24:30 +02:00 |
Sandro La Bruzzo
|
464eeeec87
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-19 15:11:53 +02:00 |
Sandro La Bruzzo
|
1681de672d
|
updated mapping scholexplorer to OAF
|
2020-06-19 15:11:46 +02:00 |
Michele Artini
|
4822747313
|
some fixes
|
2020-06-19 13:53:56 +02:00 |
Michele Artini
|
834f139e6e
|
fixed some NPE
|
2020-06-19 12:33:29 +02:00 |
Claudio Atzori
|
d0ac7514b2
|
cleaning workflow to include cleaning of default values
|
2020-06-18 19:37:25 +02:00 |
Miriam Baglioni
|
44a12d244f
|
-
|
2020-06-18 18:38:54 +02:00 |
Michele Artini
|
52f62d5d8c
|
events
|
2020-06-18 14:49:13 +02:00 |
Miriam Baglioni
|
fb80353018
|
-
|
2020-06-18 14:21:36 +02:00 |
Michele Artini
|
61634fbfe0
|
removed kryo encoding
|
2020-06-18 14:09:58 +02:00 |
Michele Artini
|
8d2b199dd2
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-18 13:15:34 +02:00 |
Michele Artini
|
e659b02e6b
|
some wf fixing
|
2020-06-18 13:15:13 +02:00 |
Michele Artini
|
9a847b4557
|
some wf fixing
|
2020-06-18 13:14:10 +02:00 |
Miriam Baglioni
|
65bf312360
|
merge branch with fork master
|
2020-06-18 11:35:27 +02:00 |
Miriam Baglioni
|
3953f56bd3
|
added dependency to pom
|
2020-06-18 11:34:47 +02:00 |
Miriam Baglioni
|
a118b66858
|
-
|
2020-06-18 11:34:30 +02:00 |
Miriam Baglioni
|
f9578312b5
|
-
|
2020-06-18 11:34:15 +02:00 |
Miriam Baglioni
|
8b145e6aba
|
-
|
2020-06-18 11:25:28 +02:00 |
Miriam Baglioni
|
e8b3e972f2
|
changed the input params and the workflow definition to tackle the Result as all result product produced
|
2020-06-18 11:25:05 +02:00 |
Miriam Baglioni
|
3233b01089
|
changes due to adding all the result type under Result
|
2020-06-18 11:22:58 +02:00 |
Miriam Baglioni
|
5c8533d1a1
|
changes in the testing classes
|
2020-06-18 11:20:08 +02:00 |
Miriam Baglioni
|
bc8611a95a
|
added new resources for testing
|
2020-06-18 11:19:20 +02:00 |
Sandro La Bruzzo
|
9bf67f5de1
|
resolved conflicts
|
2020-06-17 09:15:43 +02:00 |
Sandro La Bruzzo
|
1d4275acc4
|
implemented first version of exportation of Scholexplorer into ActionSet
|
2020-06-17 09:10:38 +02:00 |
miconis
|
5233b15265
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-16 18:31:19 +02:00 |
miconis
|
11b77b9f4e
|
json dumps for entity merge test modified to fit the new model. title merge adjusted to fix the error
|
2020-06-16 18:31:11 +02:00 |
Claudio Atzori
|
64f02de5d3
|
updated workflow definition to include the cleaning step
|
2020-06-16 17:48:51 +02:00 |
Claudio Atzori
|
306669209f
|
code formatting
|
2020-06-16 16:54:44 +02:00 |
Claudio Atzori
|
1bc1d15eaf
|
stubbing for mock datasource.identities must be typed as array
|
2020-06-16 16:54:28 +02:00 |
Claudio Atzori
|
631fef12a7
|
Merge branch 'master' into dhp_oaf_model
|
2020-06-16 16:11:19 +02:00 |
Michele Artini
|
9e2c23e391
|
partial refactoring
|
2020-06-16 15:55:42 +02:00 |
Michele Artini
|
113c9b1de0
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-16 15:53:39 +02:00 |
Michele Artini
|
76ea7607f7
|
partial refactoring
|
2020-06-16 15:53:13 +02:00 |
Claudio Atzori
|
603b1bd0bb
|
Merge branch 'master' into dhp_oaf_model
|
2020-06-16 15:43:59 +02:00 |
Claudio Atzori
|
5441f01586
|
Merge pull request 'missing landingPage urls in instances' (#22) from instances-with-landing-page into master
Looks good, thanks!
|
2020-06-16 15:32:44 +02:00 |
Claudio Atzori
|
89859111ee
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-16 15:28:29 +02:00 |
Claudio Atzori
|
4ec262db53
|
included externalreference(s) in the result view on the Hive graph DB
|
2020-06-16 15:28:20 +02:00 |
Michele Artini
|
8a4f84f8c0
|
refactoring
|
2020-06-16 12:34:13 +02:00 |
Claudio Atzori
|
2a4f65795f
|
WIP: graph cleaner implementation
|
2020-06-15 18:32:24 +02:00 |
Claudio Atzori
|
c15c8c0ad0
|
map datasource identities (including piwik ids) as original IDs
|
2020-06-15 16:07:30 +02:00 |
Miriam Baglioni
|
9dd3ef22c5
|
merge branch with fork master
|
2020-06-15 11:23:26 +02:00 |
Miriam Baglioni
|
68cf0fd03f
|
test input
|
2020-06-15 11:14:42 +02:00 |
Miriam Baglioni
|
0467145ae3
|
test for graph dump
|
2020-06-15 11:13:51 +02:00 |
Miriam Baglioni
|
e43eedb5b0
|
added resources and workflow for dump of community products
|
2020-06-15 11:13:21 +02:00 |
Miriam Baglioni
|
f96ca900e1
|
fixed issues while running on cluster
|
2020-06-15 11:12:14 +02:00 |
Miriam Baglioni
|
20b9e67728
|
added new class funder
|
2020-06-15 11:06:18 +02:00 |
Claudio Atzori
|
0d52816244
|
WIP: graph cleaner implementation
|
2020-06-13 13:06:04 +02:00 |
Claudio Atzori
|
bed65a1be6
|
WIP: graph cleaner implementation
|
2020-06-12 18:25:47 +02:00 |
Claudio Atzori
|
c4d9f1837f
|
[maven-release-plugin] prepare for next development iteration
|
2020-06-12 12:21:08 +02:00 |
Claudio Atzori
|
f0746a7605
|
[maven-release-plugin] prepare release dhp-1.2.2
|
2020-06-12 12:21:03 +02:00 |
Claudio Atzori
|
463489f59f
|
code formatting
|
2020-06-12 12:03:25 +02:00 |
Claudio Atzori
|
4bcad1c9c3
|
Merge branch 'graph_cleaning'
|
2020-06-12 11:40:25 +02:00 |
Claudio Atzori
|
cdb1956fe9
|
WIP: graph cleaner implementation
|
2020-06-12 11:36:59 +02:00 |
Alessia Bardi
|
b347499745
|
do not use deprecated subreltype
|
2020-06-12 10:58:02 +02:00 |
Claudio Atzori
|
97b1c4057c
|
WIP: graph cleaner implementation
|
2020-06-12 10:45:18 +02:00 |
Claudio Atzori
|
ba8a024af9
|
avoid NPEs merging titles
|
2020-06-12 10:45:11 +02:00 |
Michele Artini
|
30ea1bda88
|
oozie workflow
|
2020-06-12 10:42:35 +02:00 |
Michele Artini
|
c22cb5a3c6
|
refactoring
|
2020-06-12 09:47:55 +02:00 |
Michele Artini
|
472cf77639
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-11 14:30:47 +02:00 |
Michele Artini
|
c6b5bb3f17
|
orcid events
|
2020-06-11 14:30:24 +02:00 |
Michele Artini
|
c2e1b66e83
|
Revert "orcid events"
This reverts commit 48959e9a17.
|
2020-06-11 14:28:03 +02:00 |
Michele Artini
|
48959e9a17
|
orcid events
|
2020-06-11 14:24:02 +02:00 |
Miriam Baglioni
|
e145972962
|
-
|
2020-06-11 13:08:39 +02:00 |
Miriam Baglioni
|
a01800224c
|
-
|
2020-06-11 13:02:04 +02:00 |
Miriam Baglioni
|
356dd582a3
|
map construction moved in class
|
2020-06-11 12:59:22 +02:00 |
Alessia Bardi
|
e79943965b
|
Fixes #5604: field oamandatepublications in XML
|
2020-06-11 12:49:31 +02:00 |
Michele Artini
|
a41e0cb648
|
missing landingPage urls in instances
|
2020-06-11 12:28:34 +02:00 |
Michele Artini
|
04fdcacd83
|
results with all joined entities
|
2020-06-11 11:25:18 +02:00 |
Michele Artini
|
99f88e1cb8
|
fixed generation entities from claims
|
2020-06-11 10:51:57 +02:00 |
Miriam Baglioni
|
db27663750
|
-
|
2020-06-11 10:49:01 +02:00 |
Miriam Baglioni
|
bb9f21d0e7
|
job test for class producing first step of results dump
|
2020-06-11 10:20:05 +02:00 |
Claudio Atzori
|
d1d92c4d8c
|
fixed integration of claims in the graph
|
2020-06-11 10:12:00 +02:00 |
Claudio Atzori
|
953da4a427
|
Merge branch 'master' into graph_cleaning
|
2020-06-10 21:36:56 +02:00 |
Claudio Atzori
|
f1bce64391
|
WIP: graph cleaner implementation
|
2020-06-10 21:36:31 +02:00 |
Claudio Atzori
|
67c7b31ba6
|
Merge branch 'master' into graph_cleaning
|
2020-06-10 15:00:35 +02:00 |
Claudio Atzori
|
3ebf81d2b0
|
Merge pull request 'oaf-store-interpretation' (#21) from oaf-store-interpretation into master
Looks good, thanks Michele!
|
2020-06-10 14:58:09 +02:00 |
Michele Artini
|
5869cb76b3
|
reformatting
|
2020-06-10 12:11:16 +02:00 |
Michele Artini
|
c08e66e01e
|
fixed a workflow parameter
|
2020-06-10 10:11:56 +02:00 |
Michele Artini
|
7177a32d75
|
import of invisible stores
|
2020-06-10 10:04:00 +02:00 |
Claudio Atzori
|
ce12f236bb
|
disabled test, need to update the joined_entity.json file
|
2020-06-09 20:07:36 +02:00 |
Claudio Atzori
|
a2fdf85ba1
|
WIP: graph cleaner implementation
|
2020-06-09 19:52:53 +02:00 |
Alessia Bardi
|
4551c1082f
|
mapping csv for orcid
|
2020-06-09 18:08:47 +02:00 |
Alessia Bardi
|
2d3f7d1eb4
|
fixed log classes to make the ORCID test run
|
2020-06-09 18:07:14 +02:00 |
Alessia Bardi
|
a3a6755d58
|
mapping csv for Unpaywall
|
2020-06-09 17:45:44 +02:00 |
Claudio Atzori
|
d9f33582c5
|
WIP: graph cleaner implementation
|
2020-06-09 17:20:40 +02:00 |
Alessia Bardi
|
f3b033cf09
|
added csv line for funders from Crossref
|
2020-06-09 17:08:26 +02:00 |
Alessia Bardi
|
79969d78b9
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-09 17:05:39 +02:00 |
Alessia Bardi
|
fc4d220964
|
updated function name for SNSF
|
2020-06-09 17:05:31 +02:00 |
Michele Artini
|
baaa55f4a3
|
use of pace to calculate trusts
|
2020-06-09 16:01:31 +02:00 |
Alessia Bardi
|
33b130ec43
|
Mapping instructions for MAG
|
2020-06-09 15:57:15 +02:00 |
Miriam Baglioni
|
206abba48c
|
merge branch with fork master
|
2020-06-09 15:41:14 +02:00 |
Miriam Baglioni
|
a089db18f1
|
workflow and parameters to execute the dump
|
2020-06-09 15:39:38 +02:00 |
Miriam Baglioni
|
6bbe27587f
|
new classes to execute the dump for products associated to community, enrich each result with project information and assign the result to each community it belongs to
|
2020-06-09 15:39:03 +02:00 |
Miriam Baglioni
|
5121cbaf6a
|
new classes for external dump. Only classes functional to dump products
|
2020-06-09 15:37:46 +02:00 |
Alessia Bardi
|
d6de406e11
|
fixed classid for subjects
|
2020-06-09 14:43:34 +02:00 |
Alessia Bardi
|
f072125152
|
map volume and issue in journal information from MAG
|
2020-06-09 14:32:10 +02:00 |
Alessia Bardi
|
b7cb1163ea
|
identifiers always start with 50
|
2020-06-09 10:39:11 +02:00 |
Alessia Bardi
|
181f52b9bc
|
Added mapping table for Crossref
|
2020-06-08 19:33:47 +02:00 |
Alessia Bardi
|
9fd25887f7
|
Result identifiers all start with 50|
|
2020-06-08 19:32:24 +02:00 |
Alessia Bardi
|
16cb073b15
|
set the instance dateofacceptance with the Crossref createdDate in case the issuedDate is blank
|
2020-06-08 19:06:03 +02:00 |
Michele Artini
|
bb659d870c
|
join simrels
|
2020-06-08 16:29:01 +02:00 |
Michele Artini
|
81e85465d8
|
join simrels
|
2020-06-08 16:26:16 +02:00 |
Claudio Atzori
|
3d871c6651
|
Merge branch 'master' into graph_cleaning
|
2020-06-08 15:23:24 +02:00 |
Claudio Atzori
|
25a093b1a4
|
integrated changes from master
|
2020-06-08 15:04:00 +02:00 |
Sandro La Bruzzo
|
e34e7d6728
|
merge DOIBoost
|
2020-06-08 08:32:22 +02:00 |
Sandro La Bruzzo
|
e46e2a4776
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-06-08 08:17:14 +02:00 |
Spyros Zoupanos
|
3576dd186b
|
Adding hive timeout as workflow parameter
|
2020-06-05 22:29:54 +03:00 |
Claudio Atzori
|
b2349659cf
|
WIP: graph property fixing implementation
|
2020-06-05 18:37:38 +02:00 |
Michele Artini
|
a73973a74b
|
partial implementation of broker events generation
|
2020-06-05 11:43:00 +02:00 |
Michele Artini
|
7e82996e7c
|
partial implementation of broker events generation
|
2020-06-04 17:10:43 +02:00 |
Sandro La Bruzzo
|
b57e8ba374
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-06-04 14:39:41 +02:00 |
Sandro La Bruzzo
|
7ac1ba2e35
|
improvement DOIBoost
|
2020-06-04 14:39:20 +02:00 |
Michele Artini
|
97177d7f7b
|
partial refactoring
|
2020-06-04 10:26:34 +02:00 |
Sandro La Bruzzo
|
13815d5d13
|
improvement DOIBoost
|
2020-06-01 17:52:12 +02:00 |
Claudio Atzori
|
05f269a1c0
|
kryo based parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-06-01 00:32:42 +02:00 |
Claudio Atzori
|
5e23fb3a74
|
code formatting
|
2020-05-30 10:52:56 +02:00 |
Claudio Atzori
|
54ca8ed6c3
|
uniformed param name (isLookupUrl), Vocab model classes defined as Serializable
|
2020-05-29 18:17:30 +02:00 |
Claudio Atzori
|
1577bd5b8b
|
added IsLookupUrl to the raw_db workflow parameters
|
2020-05-29 16:18:16 +02:00 |
Claudio Atzori
|
91d78b825b
|
Merge pull request 'import from db using is vocabularies' (#17) from result_pids into master
Looks good, thanks Michele!
|
2020-05-29 16:02:40 +02:00 |
Michele Artini
|
adb798faa5
|
import from db using is vocabularies
|
2020-05-29 12:03:51 +02:00 |
Claudio Atzori
|
6f5f498c78
|
restored common properties driving executor-cores and executor-memory in join_organization_relations wf node
|
2020-05-29 11:22:00 +02:00 |
Claudio Atzori
|
b2f9564f13
|
WIP: fixed PrepareRelationsJob; parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-05-29 10:58:15 +02:00 |
Miriam Baglioni
|
dfa4997a4f
|
removed commented code
|
2020-05-29 10:45:18 +02:00 |
Miriam Baglioni
|
6f1eea28b6
|
changed message in log
|
2020-05-29 10:41:39 +02:00 |
Sandro La Bruzzo
|
b87b3ddb6b
|
changed mapping ORCIDToOAF
|
2020-05-29 09:32:04 +02:00 |
Miriam Baglioni
|
8b6e886fb6
|
added new resource for testing
|
2020-05-28 23:54:31 +02:00 |
Miriam Baglioni
|
6989fb9c8a
|
changed the project test according to the newly introduced join with the db project codes
|
2020-05-28 23:53:24 +02:00 |
Miriam Baglioni
|
782984d8e5
|
added needed parameter
|
2020-05-28 23:52:41 +02:00 |
Miriam Baglioni
|
01f7876595
|
fix issue with flatMap - the return type must not be null
|
2020-05-28 23:50:32 +02:00 |
Claudio Atzori
|
a57965a3ea
|
limiting the dimensions of outliers
|
2020-05-28 17:36:37 +02:00 |
Miriam Baglioni
|
773735f870
|
added the path to the file containing the projects code from the db
|
2020-05-28 17:30:45 +02:00 |
Miriam Baglioni
|
6a15067a64
|
added one step in the workflow
|
2020-05-28 17:30:09 +02:00 |
Miriam Baglioni
|
5309a99a70
|
modified the PrepareProjects to consider those in the db
|
2020-05-28 17:29:53 +02:00 |
Miriam Baglioni
|
b737ed8236
|
added part to read projects from the openaire db to filter out those in the csv file that are not in the db
|
2020-05-28 17:29:21 +02:00 |
Claudio Atzori
|
821be1f8b6
|
experimental implementation of custom aggregation using kryo encoders
|
2020-05-28 13:53:13 +02:00 |
Claudio Atzori
|
83504ecace
|
limiting the maximum number of authors allowed in XML records to MAX_AUTHORS = 200; authors with ORCID can exceed that limit
|
2020-05-28 13:52:30 +02:00 |
Claudio Atzori
|
ef11593068
|
JoinedEntity.links defined as empty list by default
|
2020-05-28 13:50:44 +02:00 |
Claudio Atzori
|
5dea155a87
|
increased number of partitions produced by the join_all_entities phase as well as spark.sql.shuffle.partitions in adjancency_lists phase
|
2020-05-28 13:49:59 +02:00 |
Miriam Baglioni
|
35b7279147
|
changed test because data are now saved as SequenceFile, and because of the group by, the number of produced updates decreases
|
2020-05-28 10:26:12 +02:00 |
Miriam Baglioni
|
37c155b86a
|
merge branch with fork master
|
2020-05-28 10:09:51 +02:00 |
Miriam Baglioni
|
df44db686a
|
refactoring
|
2020-05-28 10:07:00 +02:00 |
Miriam Baglioni
|
87b07f4af8
|
removed unused variables
|
2020-05-28 10:05:43 +02:00 |
Miriam Baglioni
|
1060977272
|
added fs actions to remove and then create the workingDir
|
2020-05-28 10:04:36 +02:00 |
Miriam Baglioni
|
96d1a3c431
|
deleted the file where the csv files were stored
|
2020-05-28 10:04:10 +02:00 |
Miriam Baglioni
|
669c05c771
|
added groupBy before creating Actions
|
2020-05-28 10:00:45 +02:00 |
Sandro La Bruzzo
|
02f90eeb07
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-28 09:58:32 +02:00 |
Sandro La Bruzzo
|
7d29b61c62
|
code refactor
|
2020-05-28 09:57:46 +02:00 |
Claudio Atzori
|
fdd54bad1c
|
code formatting
|
2020-05-27 19:31:54 +02:00 |
Miriam Baglioni
|
1855453434
|
changed the outputdir of the last step
|
2020-05-27 17:59:36 +02:00 |
Claudio Atzori
|
b9b1bc9967
|
Merge branch 'master' into provision_indexing
|
2020-05-27 12:55:20 +02:00 |
Claudio Atzori
|
aac1515b58
|
Merge pull request 'result_pids without conflicts ???' (#16) from result_pids into master
Looks good, thanks Michele
|
2020-05-27 12:54:52 +02:00 |
Michele Artini
|
f5ce7d76e1
|
resolve conflicts
|
2020-05-27 12:49:17 +02:00 |
Claudio Atzori
|
cfd753217c
|
repartition the join_entities in 24k files
|
2020-05-27 12:44:01 +02:00 |
Claudio Atzori
|
2f1a623d09
|
sync from master branch
|
2020-05-27 12:39:58 +02:00 |
Claudio Atzori
|
9e4ec1543b
|
updated test
|
2020-05-27 12:38:42 +02:00 |
Claudio Atzori
|
8047d16dd9
|
added RDD based adjacency list creation procedure
|
2020-05-27 12:38:12 +02:00 |
Claudio Atzori
|
f057dcdf65
|
limit the max number of externalreferences to MAX_EXTERNAL_ENTITIES
|
2020-05-27 12:37:33 +02:00 |
Michele Artini
|
b81f2741d2
|
xquery
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
a25598140a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
7a7272d9ec
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
3ceb2d2853
|
match terms with vocabularies
|
2020-05-27 11:34:13 +02:00 |
Claudio Atzori
|
4e36d689dd
|
fixed XML serialization for children sub-elements (duplicates & externalreferences)
|
2020-05-26 18:30:40 +02:00 |
Miriam Baglioni
|
92e3a52e91
|
merge branch with fork master
|
2020-05-26 15:57:51 +02:00 |
Michele Artini
|
c15d997925
|
xquery
|
2020-05-26 13:13:17 +02:00 |
Michele Artini
|
c6af36496a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:11:09 +02:00 |
Michele Artini
|
093f1aff03
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:06:55 +02:00 |
Claudio Atzori
|
b8e541a454
|
fixing repeated organization.websiteurl in organization entities (#5645) as well as project.ecinternationalorganizationeurinterests
|
2020-05-26 10:30:09 +02:00 |
Claudio Atzori
|
55595d7235
|
HACK: patch NULL values with defaults found in result.datainfo.deletedbyinference and result.context
|
2020-05-26 10:28:35 +02:00 |
Claudio Atzori
|
7b288a94cb
|
code formatting
|
2020-05-26 09:54:13 +02:00 |
Miriam Baglioni
|
54d869e618
|
merge upstream
|
2020-05-26 09:22:04 +02:00 |
Miriam Baglioni
|
eea07f4c42
|
refactoring
|
2020-05-26 09:21:49 +02:00 |
Sandro La Bruzzo
|
79c26382da
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-26 09:15:50 +02:00 |
Sandro La Bruzzo
|
25f52e19a4
|
implemented generation of ActionSet
|
2020-05-26 09:15:33 +02:00 |
Michele Artini
|
d6aada4957
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-26 08:44:31 +02:00 |
Michele Artini
|
b1546605e3
|
updated version of a dependency
|
2020-05-26 08:44:15 +02:00 |
Claudio Atzori
|
7582532e73
|
[maven-release-plugin] prepare for next development iteration
|
2020-05-25 19:48:18 +02:00 |
Claudio Atzori
|
01c2e93395
|
[maven-release-plugin] prepare release dhp-1.2.1
|
2020-05-25 19:48:14 +02:00 |
miconis
|
da1e5cf557
|
implementation of the result title merge. main title with higher trust, distinct between the others
|
2020-05-25 18:02:57 +02:00 |
Miriam Baglioni
|
d3d36647d2
|
merge upstream
|
2020-05-25 10:38:22 +02:00 |
Miriam Baglioni
|
74215f6d9f
|
refactoring
|
2020-05-25 10:38:16 +02:00 |
Miriam Baglioni
|
dbde2d243a
|
changed due to move of PacePerson from dhp-graph-mapper to dhp-common
|
2020-05-25 10:35:39 +02:00 |
Miriam Baglioni
|
f754c424bd
|
changed logic to compute PacePerson only once for each Author to be enriched
|
2020-05-25 10:35:02 +02:00 |
Miriam Baglioni
|
8f51af4e9b
|
added PacePerson to get name surname for authors having only fullname set
|
2020-05-25 10:34:30 +02:00 |
Miriam Baglioni
|
b258f99ece
|
fix for issue that duplicated result
|
2020-05-25 10:26:48 +02:00 |
Miriam Baglioni
|
8f6ce970f9
|
moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper
|
2020-05-25 10:25:55 +02:00 |
Claudio Atzori
|
de108f54d6
|
code formatting
|
2020-05-23 10:21:19 +02:00 |
Claudio Atzori
|
6b56cae57d
|
added mapping for bestaccessrights
|
2020-05-23 09:57:39 +02:00 |
Claudio Atzori
|
7181807e64
|
code formatting
|
2020-05-23 09:51:48 +02:00 |
Sandro La Bruzzo
|
2408083566
|
implemented filtering step
|
2020-05-23 08:46:49 +02:00 |
Sandro La Bruzzo
|
244f6e50cf
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-22 20:52:15 +02:00 |
Sandro La Bruzzo
|
147dd389bf
|
minor fix
|
2020-05-22 20:51:42 +02:00 |
Miriam Baglioni
|
0d1ec1913f
|
added fix to avoid duplication of results
|
2020-05-22 18:42:25 +02:00 |
miconis
|
5d7ac78c41
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-22 17:25:08 +02:00 |
miconis
|
0fd0c7d725
|
reimplementation of the sim between two authors. now it takes into account both name and surname. threshold incremented to 1.0 if the name is too short
|
2020-05-22 17:24:57 +02:00 |
Michele Artini
|
eb606dc1e2
|
partial implementation of events with rels
|
2020-05-22 17:17:41 +02:00 |
Miriam Baglioni
|
29066a6b46
|
applied code cleanup
|
2020-05-22 15:38:50 +02:00 |
Miriam Baglioni
|
8610ad5142
|
added groupby id to fix multiple result with same id at join step
|
2020-05-22 15:32:55 +02:00 |
Miriam Baglioni
|
1e44703e3e
|
merge upstream
|
2020-05-22 15:30:07 +02:00 |
Miriam Baglioni
|
ac8025f469
|
-
|
2020-05-22 15:29:41 +02:00 |
Miriam Baglioni
|
50ad83b97f
|
-
|
2020-05-22 15:27:19 +02:00 |
Miriam Baglioni
|
473c6d3a23
|
produces AtomicActions instead of Projects
|
2020-05-22 15:26:57 +02:00 |
Sandro La Bruzzo
|
72278b9375
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-22 15:17:13 +02:00 |
Sandro La Bruzzo
|
22936d0877
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-05-22 15:15:17 +02:00 |
Sandro La Bruzzo
|
9fbb221457
|
completed mapping of UnpayWall and ORCID
|
2020-05-22 15:15:09 +02:00 |
Miriam Baglioni
|
70389b0a30
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-22 13:53:23 +02:00 |
Miriam Baglioni
|
4308f31165
|
added fix to make test run
|
2020-05-22 13:13:01 +02:00 |
Claudio Atzori
|
946598cfba
|
Merge branch 'master' into provision_indexing
|
2020-05-22 12:35:41 +02:00 |
Claudio Atzori
|
3cf2796ac6
|
code formatting
|
2020-05-22 12:34:00 +02:00 |
Michele Artini
|
dc4621b3cb
|
filter ORCID and MAG identifiers
|
2020-05-22 12:25:01 +02:00 |
Michele Artini
|
9f2d0f1b08
|
filter ORCID and MAG identifiers
|
2020-05-22 11:00:27 +02:00 |
Michele Artini
|
9de71e54a8
|
filter ORCID and MAG identifiers
|
2020-05-22 10:47:39 +02:00 |
Michele Artini
|
c5f7e17348
|
author fullnames
|
2020-05-22 10:08:02 +02:00 |
Claudio Atzori
|
ad40470040
|
Merge branch 'master' into provision_indexing
|
2020-05-22 08:51:22 +02:00 |
Claudio Atzori
|
925d933204
|
making XmlRecordFactory immune to graph encoding changes (mostly to avoid NPEs)
|
2020-05-22 08:50:44 +02:00 |
Claudio Atzori
|
b33dd58be4
|
replaced parameter 'reuseRecords' with 'resumeFrom', allowing to restart the provision workflow execution from any step, useful for manual submissions or debugging
|
2020-05-22 08:50:06 +02:00 |
Michele Artini
|
c7ca3cf35b
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-21 16:48:20 +02:00 |
Michele Artini
|
3e34517479
|
partial implementation of events with rels
|
2020-05-21 16:47:53 +02:00 |
Miriam Baglioni
|
eae12a6586
|
Merge branch 'master' into dhp_oaf_model
|
2020-05-21 16:31:22 +02:00 |
Miriam Baglioni
|
6750075fbd
|
merge upstream
|
2020-05-21 16:31:09 +02:00 |
Miriam Baglioni
|
4589c428b1
|
generate action sets and saves them in the hdfs path for the actions sets
|
2020-05-21 16:30:39 +02:00 |
miconis
|
8b35e0e7f0
|
reimplementation of the author merging in deduprecord creation. implementation of the test class. minor changes
|
2020-05-21 12:02:44 +02:00 |
miconis
|
8bbd1d0501
|
reimplementation of the author merging in deduprecord creation. implementation of the test class.
|
2020-05-21 11:52:14 +02:00 |
Michele Artini
|
e43d4d7778
|
added a coalesce in sql query
|
2020-05-21 11:08:07 +02:00 |
Claudio Atzori
|
dbfb9c19fe
|
minor changes
|
2020-05-21 10:00:14 +02:00 |
Michele Artini
|
b3bcbb3129
|
resolve name of organization countries
|
2020-05-21 08:41:32 +02:00 |
Enrico Ottonello
|
1109d3b3fc
|
Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost
|
2020-05-21 00:41:27 +02:00 |
Enrico Ottonello
|
869a53040e
|
save to text file format
|
2020-05-21 00:41:21 +02:00 |
Sandro La Bruzzo
|
5818abaab4
|
fixed Crossref Mapping
|
2020-05-20 17:05:46 +02:00 |
Claudio Atzori
|
da4267d0fe
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-20 14:58:22 +02:00 |
Claudio Atzori
|
d7d2a0637f
|
added extra parameters to the provision indexing workflow
|
2020-05-20 14:55:38 +02:00 |
Miriam Baglioni
|
055eec5a77
|
added resource for prepare project test
|
2020-05-20 13:54:10 +02:00 |
Miriam Baglioni
|
9079bc1f61
|
-
|
2020-05-20 13:53:32 +02:00 |
Miriam Baglioni
|
67ba4fde57
|
added test for prepare projects step
|
2020-05-20 13:53:08 +02:00 |
Miriam Baglioni
|
5e0e554000
|
Merge branch 'master' into dhp_oaf_model
|
2020-05-20 10:57:30 +02:00 |
Miriam Baglioni
|
76f3f73caa
|
merge upstream
|
2020-05-20 10:31:40 +02:00 |
Miriam Baglioni
|
3c0eb12d3e
|
removed the not zipped files
|
2020-05-20 10:31:05 +02:00 |
Miriam Baglioni
|
c0d9e02340
|
zipped test resources that are too big
|
2020-05-20 10:30:25 +02:00 |
Miriam Baglioni
|
5e9c9fa87c
|
tests
|
2020-05-20 10:29:57 +02:00 |
Miriam Baglioni
|
faed7521bf
|
added resources for testing
|
2020-05-20 10:29:29 +02:00 |
Miriam Baglioni
|
75491482de
|
added a new preparation step to replicate each project for the programme it is associated to
|
2020-05-20 10:28:56 +02:00 |
Miriam Baglioni
|
eb0e47ba53
|
parameters for h2020 programme
|
2020-05-20 10:26:44 +02:00 |
Sandro La Bruzzo
|
b771d67e9d
|
next step of MAG conversion implemented
|
2020-05-20 08:14:03 +02:00 |
Miriam Baglioni
|
08218d2f3f
|
new workflow with added steps
|
2020-05-19 18:44:25 +02:00 |
Miriam Baglioni
|
457293ccc0
|
test for the various steps of project update with programme
|
2020-05-19 18:43:42 +02:00 |
Miriam Baglioni
|
9447d78ef3
|
added preparation classes
|
2020-05-19 18:42:50 +02:00 |
Michele Artini
|
85ca5622d4
|
partial implementation of generation of simple events
|
2020-05-19 16:17:35 +02:00 |
Claudio Atzori
|
0bdfbb0a57
|
reintroduced RDD based relation cut off procedure
|
2020-05-19 15:02:21 +02:00 |
Enrico Ottonello
|
934ad570e0
|
joined summaries and activities dataset
|
2020-05-19 12:57:21 +02:00 |
Enrico Ottonello
|
ca722d4d18
|
merged
|
2020-05-19 09:43:12 +02:00 |
Enrico Ottonello
|
7362bc3e9d
|
workflow to generate seq(doi,AuthorList)
|
2020-05-19 09:34:44 +02:00 |
Sandro La Bruzzo
|
8c95b50f26
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-19 09:25:04 +02:00 |
Sandro La Bruzzo
|
486e850bcc
|
next step of MAG conversion implemented
|
2020-05-19 09:24:45 +02:00 |
Enrico Ottonello
|
d4e9075f22
|
Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost
|
2020-05-18 19:51:36 +02:00 |
Enrico Ottonello
|
fc80e8c7de
|
added accumulator; last modified date of the record is added to saved data; lambda file is partitioned into 20 parts before starting downloading
|
2020-05-18 19:51:29 +02:00 |
Claudio Atzori
|
f3bc8aed31
|
lifted memory requirements for country propagation wf
|
2020-05-18 15:29:10 +02:00 |
Miriam Baglioni
|
b71fbb68b1
|
removed the removeOutputDir command from code. Relations are written in append mode; erasing the output dir meant removing all the relations computed in the previous steps
|
2020-05-18 13:57:20 +02:00 |
Miriam Baglioni
|
629af7cb79
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-18 13:07:36 +02:00 |
Miriam Baglioni
|
f0f14caf99
|
removed script files for shell actions not performed
|
2020-05-18 13:06:16 +02:00 |
Miriam Baglioni
|
23bbac7d7c
|
-
|
2020-05-18 13:05:03 +02:00 |
Miriam Baglioni
|
4f1ff7ba73
|
added dependency to org.apache.commons common-csv
|
2020-05-18 13:04:39 +02:00 |
Miriam Baglioni
|
abc45f2708
|
added dnet-45 HttpConnector and related Classes, produced the POJO for projects and programme
|
2020-05-18 13:04:06 +02:00 |
Claudio Atzori
|
ef9a9a9f1a
|
remove the output path when starting
|
2020-05-15 22:34:19 +02:00 |
Enrico Ottonello
|
0b29bb7e3b
|
spark job to download orcid record modified after a fixed date
|
2020-05-15 19:49:26 +02:00 |
Miriam Baglioni
|
5a648016ef
|
parameters from the GetFile class
|
2020-05-15 18:18:50 +02:00 |
Miriam Baglioni
|
83c262a483
|
workflow to download the files
|
2020-05-15 18:18:31 +02:00 |
Miriam Baglioni
|
22cb9e0da7
|
simple code to get file from URL
|
2020-05-15 18:18:01 +02:00 |
Claudio Atzori
|
7838f2c63f
|
init the empty list for author pids mapped from OAF
|
2020-05-15 17:06:01 +02:00 |
Claudio Atzori
|
82b615ab33
|
NPE check
|
2020-05-15 16:04:46 +02:00 |
Miriam Baglioni
|
e26a67c3eb
|
merge with upstream
|
2020-05-15 15:53:05 +02:00 |
Claudio Atzori
|
7a89507ab1
|
code formatting
|
2020-05-15 15:16:54 +02:00 |
Miriam Baglioni
|
5ec8c49ad5
|
removed serialization points
|
2020-05-15 12:49:58 +02:00 |
Claudio Atzori
|
1d35836a58
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-15 12:26:31 +02:00 |
Claudio Atzori
|
cfc8948717
|
fixed mapping OdfToGraph: pick the correct element to map author pids and author affiliations; extended mapping Oaf2Graph: added support for author pids
|
2020-05-15 12:26:16 +02:00 |
Michele Artini
|
2a4e68a292
|
events recognition
|
2020-05-15 12:25:37 +02:00 |
Claudio Atzori
|
a832658296
|
code formatting
|
2020-05-15 10:21:09 +02:00 |
Claudio Atzori
|
50d6a2ad3c
|
added output directory removal in the blacklist spark actions; included common global properties in blacklist's workflow.xml
|
2020-05-15 09:53:37 +02:00 |
Claudio Atzori
|
18f46e47b9
|
added relations to the graph2hive import workflow
|
2020-05-15 09:34:48 +02:00 |
Claudio Atzori
|
9d028ffe1c
|
cleanup
|
2020-05-15 09:28:55 +02:00 |
Claudio Atzori
|
fd62359538
|
cleanup
|
2020-05-15 09:28:15 +02:00 |
Claudio Atzori
|
eb64335a54
|
parallel implementation for graph Hive importer
|
2020-05-15 09:05:26 +02:00 |
Miriam Baglioni
|
94571c9a51
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-14 18:29:55 +02:00 |
Miriam Baglioni
|
f25db01664
|
changed in the constant from propagationconstants to modelconstants
|
2020-05-14 18:29:24 +02:00 |
Miriam Baglioni
|
d05630d979
|
removed the constants added in ModelConstants
|
2020-05-14 18:22:50 +02:00 |
Claudio Atzori
|
f044d09315
|
revised mapping: more accurate mapping for name/surname from datacite format; improved mapping of null values
|
2020-05-14 15:07:24 +02:00 |
Miriam Baglioni
|
e7eb4f377e
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-14 10:34:17 +02:00 |
Miriam Baglioni
|
8828458acf
|
minor changes
|
2020-05-14 10:34:12 +02:00 |
Claudio Atzori
|
ab37953332
|
added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action
|
2020-05-14 10:25:41 +02:00 |
Claudio Atzori
|
12bfa6702e
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-13 17:01:17 +02:00 |
Claudio Atzori
|
5ecacad70a
|
fixed default resource typing in Oaf/Odf mapping
|
2020-05-13 17:01:11 +02:00 |
Enrico Ottonello
|
12756f9d41
|
multithread (4 threads) test to feed elastic search
|
2020-05-13 16:11:40 +02:00 |
Michele Artini
|
c0265213a0
|
partial implementation
|
2020-05-13 12:00:27 +02:00 |
Sandro La Bruzzo
|
a92ee0f41e
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-13 10:38:13 +02:00 |
Sandro La Bruzzo
|
d876f47d06
|
next step of MAG conversion implemented
|
2020-05-13 10:38:04 +02:00 |
Claudio Atzori
|
1ddd33de41
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-13 09:04:41 +02:00 |
Claudio Atzori
|
85f3c55992
|
fixed node names in blacklist workflow
|
2020-05-13 09:04:33 +02:00 |
Miriam Baglioni
|
43f127448d
|
changed the package name from dhp-propagation to dhp-enrichment for the preparation phase of funding propagation
|
2020-05-12 18:24:26 +02:00 |
Enrico Ottonello
|
08040cef80
|
spark action to analyze orcid lambda file
|
2020-05-12 16:57:43 +02:00 |
Claudio Atzori
|
ec0782e582
|
renamed jar containing the bulktagging and propagation workflows from dhp-[bulktagging|propagation] to dhp-enrichment; adjusted xml formatting
|
2020-05-12 15:49:28 +02:00 |
Miriam Baglioni
|
1547ca7e15
|
added blacklist step to the end of the provision wf
|
2020-05-12 12:17:27 +02:00 |
Miriam Baglioni
|
14979f299e
|
changed the configuration factory
|
2020-05-12 11:28:38 +02:00 |
Miriam Baglioni
|
f8aef6161a
|
minor modification
|
2020-05-12 11:28:07 +02:00 |
Miriam Baglioni
|
7387f3449a
|
changed the route to find the verb resolver classes
|
2020-05-12 11:27:38 +02:00 |
Miriam Baglioni
|
7687519f00
|
merged conflicts with upstream branch
|
2020-05-12 10:03:44 +02:00 |
Miriam Baglioni
|
8ffc050b8a
|
fixed problem in communityconfigurationfactory test
|
2020-05-12 10:01:09 +02:00 |
Claudio Atzori
|
527e8169a8
|
adjusted paths pointing to test configurations, cleanup
|
2020-05-11 18:17:05 +02:00 |
Claudio Atzori
|
f9a62ba63b
|
added wf nodes to copy entities to the output path
|
2020-05-11 18:16:39 +02:00 |
Miriam Baglioni
|
ad63effb4e
|
removed deletion of working dir
|
2020-05-11 17:48:22 +02:00 |
Claudio Atzori
|
c6b028f2af
|
code formatting
|
2020-05-11 17:38:08 +02:00 |
Claudio Atzori
|
6d0b11252e
|
bulktagging wfs moved into common dhp-enrichment module
|
2020-05-11 17:32:06 +02:00 |
Miriam Baglioni
|
50659011eb
|
refactoring
|
2020-05-11 16:14:26 +02:00 |
Miriam Baglioni
|
e883daf87e
|
added the outputPath parameter and the reset path to remove the outputPath directory
|
2020-05-11 16:10:24 +02:00 |
Miriam Baglioni
|
5ab3424c77
|
removed unused dependencies
|
2020-05-11 16:09:37 +02:00 |
Miriam Baglioni
|
6a3b081263
|
added the last step of blacklisting
|
2020-05-11 16:09:20 +02:00 |
Enrico Ottonello
|
3b1a68cbf5
|
elastic search feed test
|
2020-05-11 14:53:52 +02:00 |
Enrico Ottonello
|
f53e42bda7
|
merged
|
2020-05-11 14:49:28 +02:00 |
Enrico Ottonello
|
7990894454
|
different date format in lambda file parsing
|
2020-05-11 14:41:11 +02:00 |
Sandro La Bruzzo
|
0c6774e4da
|
updated pom version
|
2020-05-11 14:35:14 +02:00 |