Claudio Atzori
|
7b6f0c8756
|
reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki
|
2020-03-10 17:19:17 +01:00 |
Claudio Atzori
|
60aedb1110
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-10 17:09:44 +01:00 |
Claudio Atzori
|
a3f184fd3f
|
added field websiteurl in related organizations
|
2020-03-10 17:08:58 +01:00 |
Claudio Atzori
|
0e95544495
|
fixed serialization for datasource subjects
|
2020-03-10 17:07:44 +01:00 |
Michele Artini
|
b6efa9d6ab
|
Configuration of the SequenceFile Writer
|
2020-03-05 15:49:14 +01:00 |
Claudio Atzori
|
ccb153de78
|
updated image
|
2020-03-05 15:11:42 +01:00 |
Claudio Atzori
|
5e342a555c
|
no need to compute the inverse relClass, fixed text() in xpath expressions
|
2020-03-05 12:51:48 +01:00 |
Claudio Atzori
|
6ec04d4e02
|
specified column used to perform the join operation in the javadoc
|
2020-03-05 12:50:38 +01:00 |
Claudio Atzori
|
960619de98
|
updated image
|
2020-03-04 16:51:55 +01:00 |
Claudio Atzori
|
e89aa52e58
|
updated image
|
2020-03-04 16:18:49 +01:00 |
Claudio Atzori
|
5474e8ac9f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-04 14:54:46 +01:00 |
Claudio Atzori
|
d7137e566e
|
added dhp-doc-resources, aimed to include all the documentation resources used in the wiki pages
|
2020-03-04 14:54:41 +01:00 |
Michele Artini
|
7a2a466161
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-04 14:50:59 +01:00 |
Michele Artini
|
755eade2fb
|
fix creation ids
|
2020-03-04 14:49:45 +01:00 |
Claudio Atzori
|
6379f32466
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-04 10:57:06 +01:00 |
Claudio Atzori
|
0233987603
|
introduced post processing step following the hive DB creation/population
|
2020-03-04 10:56:50 +01:00 |
Claudio Atzori
|
1e563bc15e
|
introduced distinct properties driving the resource usage for the XML record creation and the indexing phase
|
2020-03-04 10:55:11 +01:00 |
Claudio Atzori
|
9af3e904be
|
close the SparkSession at the end
|
2020-03-04 10:53:31 +01:00 |
Michele Artini
|
086af63158
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-04 10:46:40 +01:00 |
Michele Artini
|
e7167b996a
|
logs and closeable
|
2020-03-04 10:46:36 +01:00 |
Claudio Atzori
|
25ceec29ab
|
code formatting
|
2020-03-04 10:44:24 +01:00 |
Claudio Atzori
|
63c00c5e88
|
fixed typo
|
2020-03-04 10:43:44 +01:00 |
Claudio Atzori
|
9cf5ce2e66
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-02 17:03:10 +01:00 |
Claudio Atzori
|
bc7cfd5975
|
indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure
|
2020-03-02 17:03:07 +01:00 |
Michele Artini
|
4b29a121b0
|
migration using spark in step2
|
2020-03-02 16:12:14 +01:00 |
Michele Artini
|
5445a57102
|
migration using spark in step2
|
2020-03-02 16:11:59 +01:00 |
Claudio Atzori
|
60bc2b1a20
|
drop the hive DB before populating it from scratch
|
2020-02-27 10:10:55 +01:00 |
Michele Artini
|
689908b2e9
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-25 16:00:51 +01:00 |
Michele Artini
|
93665773ea
|
Fixed a problem with JavaRDD Union
|
2020-02-25 15:59:21 +01:00 |
Claudio Atzori
|
6a73fd5da5
|
in order to reuse the same XmlRecordFactory across different tasks, the state of contexts must be one per record built
|
2020-02-21 09:17:19 +01:00 |
Michele Artini
|
4c94e74a84
|
Added a missing dependency
|
2020-02-20 11:43:32 +01:00 |
Michele Artini
|
d49cd2fdc6
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-02-20 11:21:54 +01:00 |
Claudio Atzori
|
d42dde52ba
|
implemented method to merge relations
|
2020-02-19 17:29:05 +01:00 |
Claudio Atzori
|
5e5e32cb48
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-02-19 16:56:52 +01:00 |
Claudio Atzori
|
33185fd0b7
|
ISLookupClientFactory moved to dhp-common
|
2020-02-19 16:56:38 +01:00 |
Michele Artini
|
5d3739b5cf
|
migration of claims
|
2020-02-19 15:11:17 +01:00 |
Michele Artini
|
173f1df1e5
|
saved a query for openaire production database
|
2020-02-19 10:15:08 +01:00 |
Sandro La Bruzzo
|
9a2d74ac82
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-19 10:13:45 +01:00 |
Sandro La Bruzzo
|
e5d7cdf422
|
fixed sql query
|
2020-02-19 10:13:36 +01:00 |
Claudio Atzori
|
ed76521d9b
|
removed stale test resources, will be re-added later on
|
2020-02-18 11:51:08 +01:00 |
Claudio Atzori
|
0f364605ff
|
removed stale tests, need to reimplement them anyway
|
2020-02-18 11:48:19 +01:00 |
Claudio Atzori
|
6a288625e5
|
fixed workflow outgoing node
|
2020-02-17 15:04:33 +01:00 |
Claudio Atzori
|
1b18fd4d54
|
sync with master branch
|
2020-02-17 13:49:46 +01:00 |
Claudio Atzori
|
5bae30f399
|
adding readme for dhp-schema
|
2020-02-17 13:38:33 +01:00 |
Sandro La Bruzzo
|
4f04759738
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-17 12:31:58 +01:00 |
Sandro La Bruzzo
|
76ee85141a
|
added oozie job for DNET migration and implemented Spark job for extracting entities
|
2020-02-17 12:31:44 +01:00 |
Claudio Atzori
|
c460e2d281
|
Update 'dhp-workflows/docs/oozie-installer.markdown'
|
2020-02-17 11:54:48 +01:00 |
Sandro La Bruzzo
|
fe93c709f1
|
Merge branch 'master' of michele.artini/dnet-hadoop into master
|
2020-02-17 10:43:08 +01:00 |
Michele Artini
|
176c5606bd
|
aligned with origin/master, aligned model and mapping
|
2020-02-17 10:40:53 +01:00 |
Claudio Atzori
|
56d1810a66
|
working procedure for records indexing using Spark, via lib com.lucidworks.spark:spark-solr
|
2020-02-14 12:28:52 +01:00 |