Claudio Atzori
|
83504ecace
|
limiting the maximum number of authors allowed in XML records to MAX_AUTHORS = 200; authors with ORCID can exceed that limit
|
2020-05-28 13:52:30 +02:00 |
Claudio Atzori
|
ef11593068
|
JoinedEntity.links defined as empty list by default
|
2020-05-28 13:50:44 +02:00 |
Claudio Atzori
|
5dea155a87
|
increased number of partitions produced by the join_all_entities phase as well as spark.sql.shuffle.partitions in adjancency_lists phase
|
2020-05-28 13:49:59 +02:00 |
Claudio Atzori
|
fdd54bad1c
|
code formatting
|
2020-05-27 19:31:54 +02:00 |
Claudio Atzori
|
b9b1bc9967
|
Merge branch 'master' into provision_indexing
|
2020-05-27 12:55:20 +02:00 |
Claudio Atzori
|
aac1515b58
|
Merge pull request 'result_pids without conflicts ???' (#16) from result_pids into master
Looks good, thanks Michele
|
2020-05-27 12:54:52 +02:00 |
Michele Artini
|
f5ce7d76e1
|
resolve conflicts
|
2020-05-27 12:49:17 +02:00 |
Claudio Atzori
|
cfd753217c
|
repartition the join_entities in 24k files
|
2020-05-27 12:44:01 +02:00 |
Claudio Atzori
|
2f1a623d09
|
sync from master branch
|
2020-05-27 12:39:58 +02:00 |
Claudio Atzori
|
9e4ec1543b
|
updated test
|
2020-05-27 12:38:42 +02:00 |
Claudio Atzori
|
8047d16dd9
|
added RDD based adjacency list creation procedure
|
2020-05-27 12:38:12 +02:00 |
Claudio Atzori
|
f057dcdf65
|
limit the max number of externalreferences to MAX_EXTERNAL_ENTITIES
|
2020-05-27 12:37:33 +02:00 |
Michele Artini
|
b81f2741d2
|
xquery
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
a25598140a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
7a7272d9ec
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
3ceb2d2853
|
match terms with vocabularies
|
2020-05-27 11:34:13 +02:00 |
Claudio Atzori
|
4e36d689dd
|
fixed XML serialization for children sub-elements (duplicates & externalreferences)
|
2020-05-26 18:30:40 +02:00 |
Michele Artini
|
c15d997925
|
xquery
|
2020-05-26 13:13:17 +02:00 |
Michele Artini
|
c6af36496a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:11:09 +02:00 |
Michele Artini
|
093f1aff03
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:06:55 +02:00 |
Claudio Atzori
|
b8e541a454
|
fixing repeated organization.websiteurl in organization entities (#5645) as well as project.ecinternationalorganizationeurinterests
|
2020-05-26 10:30:09 +02:00 |
Claudio Atzori
|
55595d7235
|
HACK: patch NULL values with defaults found in result.datainfo.deletedbyinference and result.context
|
2020-05-26 10:28:35 +02:00 |
Claudio Atzori
|
7b288a94cb
|
code formatting
|
2020-05-26 09:54:13 +02:00 |
Claudio Atzori
|
e87eca9300
|
Merge pull request 'master' (#13) from miriam.baglioni/dnet-hadoop:master into enrichment_wfs
|
2020-05-26 09:34:23 +02:00 |
Miriam Baglioni
|
54d869e618
|
merge upstream
|
2020-05-26 09:22:04 +02:00 |
Miriam Baglioni
|
eea07f4c42
|
refactoring
|
2020-05-26 09:21:49 +02:00 |
Michele Artini
|
d6aada4957
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-26 08:44:31 +02:00 |
Michele Artini
|
b1546605e3
|
updated version of a dependency
|
2020-05-26 08:44:15 +02:00 |
Claudio Atzori
|
7582532e73
|
[maven-release-plugin] prepare for next development iteration
|
2020-05-25 19:48:18 +02:00 |
Claudio Atzori
|
01c2e93395
|
[maven-release-plugin] prepare release dhp-1.2.1
|
2020-05-25 19:48:14 +02:00 |
Claudio Atzori
|
ae04234472
|
DataInfo.deletedbyinference is false by default
|
2020-05-25 19:32:48 +02:00 |
miconis
|
da1e5cf557
|
implementation of the result title merge. main title with higher trust, distinct between the others
|
2020-05-25 18:02:57 +02:00 |
Miriam Baglioni
|
d3d36647d2
|
merge upstream
|
2020-05-25 10:38:22 +02:00 |
Miriam Baglioni
|
74215f6d9f
|
refactoring
|
2020-05-25 10:38:16 +02:00 |
Miriam Baglioni
|
dbde2d243a
|
changed due to move of PacePerson from dhp-graph-mapper to dhp-common
|
2020-05-25 10:35:39 +02:00 |
Miriam Baglioni
|
f754c424bd
|
changed logic to compute PacePerson only once for each Author to be enriched
|
2020-05-25 10:35:02 +02:00 |
Miriam Baglioni
|
8f51af4e9b
|
added PacePerson to get name surname for authors having only fullname set
|
2020-05-25 10:34:30 +02:00 |
Miriam Baglioni
|
b258f99ece
|
fix for issue that duplicated result
|
2020-05-25 10:26:48 +02:00 |
Miriam Baglioni
|
8f6ce970f9
|
moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper
|
2020-05-25 10:25:55 +02:00 |
Claudio Atzori
|
4b34872b44
|
using Objects.equals to check Field<T> equivalence
|
2020-05-25 10:14:15 +02:00 |
Claudio Atzori
|
0ab0206b4d
|
removed null objects from flattened Field<T> in mergeLists
|
2020-05-25 10:11:41 +02:00 |
Claudio Atzori
|
de108f54d6
|
code formatting
|
2020-05-23 10:21:19 +02:00 |
Claudio Atzori
|
6b56cae57d
|
added mapping for bestaccessrights
|
2020-05-23 09:57:39 +02:00 |
Claudio Atzori
|
7181807e64
|
code formatting
|
2020-05-23 09:51:48 +02:00 |
Miriam Baglioni
|
0d1ec1913f
|
added fix to avoid duplication of results
|
2020-05-22 18:42:25 +02:00 |
miconis
|
5d7ac78c41
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-22 17:25:08 +02:00 |
miconis
|
0fd0c7d725
|
reimplementation of the sim between two authors. now it takes into account both name and surname. threshold incremented to 1.0 if the name is too short
|
2020-05-22 17:24:57 +02:00 |
Michele Artini
|
eb606dc1e2
|
partial implementation of events with rels
|
2020-05-22 17:17:41 +02:00 |
Miriam Baglioni
|
29066a6b46
|
applied code cleanup
|
2020-05-22 15:38:50 +02:00 |
Miriam Baglioni
|
8610ad5142
|
added groupby id to fix multiple results with same id at join step
|
2020-05-22 15:32:55 +02:00 |