Claudio Atzori
|
25a093b1a4
|
integrated changes from master
|
2020-06-08 15:04:00 +02:00 |
Claudio Atzori
|
45973b5743
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-08 15:01:34 +02:00 |
Claudio Atzori
|
94533b71bc
|
added comments for model fields removal
|
2020-06-08 15:01:21 +02:00 |
Sandro La Bruzzo
|
e34e7d6728
|
merge DOIBoost
|
2020-06-08 08:32:22 +02:00 |
Sandro La Bruzzo
|
e46e2a4776
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-06-08 08:17:14 +02:00 |
Claudio Atzori
|
94ea8f82a7
|
Merge pull request 'Adding hive timeout as workflow parameter' (#18) from spyros/dnet-hadoop:master into master
|
2020-06-07 21:32:15 +02:00 |
Spyros Zoupanos
|
3576dd186b
|
Adding hive timeout as workflow parameter
|
2020-06-05 22:29:54 +03:00 |
Michele Artini
|
a73973a74b
|
partial implementation of broker events generation
|
2020-06-05 11:43:00 +02:00 |
Michele Artini
|
7e82996e7c
|
partial implementation of broker events generation
|
2020-06-04 17:10:43 +02:00 |
Sandro La Bruzzo
|
b57e8ba374
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-06-04 14:39:41 +02:00 |
Sandro La Bruzzo
|
7ac1ba2e35
|
improvement DOIBoost
|
2020-06-04 14:39:20 +02:00 |
Michele Artini
|
97177d7f7b
|
partial refactoring
|
2020-06-04 10:26:34 +02:00 |
Sandro La Bruzzo
|
13815d5d13
|
improvement DOIBoost
|
2020-06-01 17:52:12 +02:00 |
Claudio Atzori
|
05f269a1c0
|
kryo based parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-06-01 00:32:42 +02:00 |
Claudio Atzori
|
5e23fb3a74
|
code formatting
|
2020-05-30 10:52:56 +02:00 |
Claudio Atzori
|
54ca8ed6c3
|
unified param name (isLookupUrl), Vocab model classes defined as Serializable
|
2020-05-29 18:17:30 +02:00 |
Claudio Atzori
|
1577bd5b8b
|
added IsLookupUrl to the raw_db workflow parameters
|
2020-05-29 16:18:16 +02:00 |
Claudio Atzori
|
91d78b825b
|
Merge pull request 'import from db using is vocabularies' (#17) from result_pids into master
Looks good, thanks Michele!
|
2020-05-29 16:02:40 +02:00 |
Michele Artini
|
adb798faa5
|
import from db using IS vocabularies
|
2020-05-29 12:03:51 +02:00 |
Claudio Atzori
|
6f5f498c78
|
restored common properties driving executor-cores and executor-memory in join_organization_relations wf node
|
2020-05-29 11:22:00 +02:00 |
Claudio Atzori
|
b2f9564f13
|
WIP: fixed PrepareRelationsJob; parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-05-29 10:58:15 +02:00 |
Sandro La Bruzzo
|
b87b3ddb6b
|
changed mapping ORCIDToOAF
|
2020-05-29 09:32:04 +02:00 |
Claudio Atzori
|
a57965a3ea
|
limiting the dimensions of outliers
|
2020-05-28 17:36:37 +02:00 |
Claudio Atzori
|
821be1f8b6
|
experimental implementation of custom aggregation using kryo encoders
|
2020-05-28 13:53:13 +02:00 |
Claudio Atzori
|
83504ecace
|
limiting the maximum number of authors allowed in XML records to MAX_AUTHORS = 200; authors with ORCID can exceed that limit
|
2020-05-28 13:52:30 +02:00 |
Claudio Atzori
|
ef11593068
|
JoinedEntity.links defined as empty list by default
|
2020-05-28 13:50:44 +02:00 |
Claudio Atzori
|
5dea155a87
|
increased number of partitions produced by the join_all_entities phase as well as spark.sql.shuffle.partitions in adjancency_lists phase
|
2020-05-28 13:49:59 +02:00 |
Sandro La Bruzzo
|
02f90eeb07
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-28 09:58:32 +02:00 |
Sandro La Bruzzo
|
7d29b61c62
|
code refactor
|
2020-05-28 09:57:46 +02:00 |
Claudio Atzori
|
fdd54bad1c
|
code formatting
|
2020-05-27 19:31:54 +02:00 |
Claudio Atzori
|
b9b1bc9967
|
Merge branch 'master' into provision_indexing
|
2020-05-27 12:55:20 +02:00 |
Claudio Atzori
|
aac1515b58
|
Merge pull request 'result_pids without conflicts ???' (#16) from result_pids into master
Looks good, thanks Michele
|
2020-05-27 12:54:52 +02:00 |
Michele Artini
|
f5ce7d76e1
|
resolve conflicts
|
2020-05-27 12:49:17 +02:00 |
Claudio Atzori
|
cfd753217c
|
repartition the join_entities in 24k files
|
2020-05-27 12:44:01 +02:00 |
Claudio Atzori
|
2f1a623d09
|
sync from master branch
|
2020-05-27 12:39:58 +02:00 |
Claudio Atzori
|
9e4ec1543b
|
updated test
|
2020-05-27 12:38:42 +02:00 |
Claudio Atzori
|
8047d16dd9
|
added RDD based adjacency list creation procedure
|
2020-05-27 12:38:12 +02:00 |
Claudio Atzori
|
f057dcdf65
|
limit the max number of externalreferences to MAX_EXTERNAL_ENTITIES
|
2020-05-27 12:37:33 +02:00 |
Michele Artini
|
b81f2741d2
|
xquery
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
a25598140a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
7a7272d9ec
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
3ceb2d2853
|
match terms with vocabularies
|
2020-05-27 11:34:13 +02:00 |
Claudio Atzori
|
4e36d689dd
|
fixed XML serialization for children sub-elements (duplicates & externalreferences)
|
2020-05-26 18:30:40 +02:00 |
Michele Artini
|
c15d997925
|
xquery
|
2020-05-26 13:13:17 +02:00 |
Michele Artini
|
c6af36496a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:11:09 +02:00 |
Michele Artini
|
093f1aff03
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:06:55 +02:00 |
Claudio Atzori
|
b8e541a454
|
fixing repeated organization.websiteurl in organization entities (#5645) as well as project.ecinternationalorganizationeurinterests
|
2020-05-26 10:30:09 +02:00 |
Claudio Atzori
|
55595d7235
|
HACK: patch NULL values with defaults found in result.datainfo.deletedbyinference and result.context
|
2020-05-26 10:28:35 +02:00 |
Claudio Atzori
|
7b288a94cb
|
code formatting
|
2020-05-26 09:54:13 +02:00 |
Claudio Atzori
|
e87eca9300
|
Merge pull request 'master' (#13) from miriam.baglioni/dnet-hadoop:master into enrichment_wfs
|
2020-05-26 09:34:23 +02:00 |