Claudio Atzori
|
83504ecace
|
limiting the maximum number of authors allowed in XML records to MAX_AUTHORS = 200; authors with ORCID can exceed that limit
|
2020-05-28 13:52:30 +02:00 |
Claudio Atzori
|
ef11593068
|
JoinedEntity.links defined as empty list by default
|
2020-05-28 13:50:44 +02:00 |
Claudio Atzori
|
5dea155a87
|
increased number of partitions produced by the join_all_entities phase as well as spark.sql.shuffle.partitions in adjancency_lists phase
|
2020-05-28 13:49:59 +02:00 |
Sandro La Bruzzo
|
02f90eeb07
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-28 09:58:32 +02:00 |
Sandro La Bruzzo
|
7d29b61c62
|
code refactor
|
2020-05-28 09:57:46 +02:00 |
Claudio Atzori
|
fdd54bad1c
|
code formatting
|
2020-05-27 19:31:54 +02:00 |
Claudio Atzori
|
b9b1bc9967
|
Merge branch 'master' into provision_indexing
|
2020-05-27 12:55:20 +02:00 |
Claudio Atzori
|
aac1515b58
|
Merge pull request 'result_pids without conflicts ???' (#16) from result_pids into master
Looks good, thanks Michele
|
2020-05-27 12:54:52 +02:00 |
Michele Artini
|
f5ce7d76e1
|
resolve conflicts
|
2020-05-27 12:49:17 +02:00 |
Claudio Atzori
|
cfd753217c
|
repartition the join_entities in 24k files
|
2020-05-27 12:44:01 +02:00 |
Claudio Atzori
|
2f1a623d09
|
sync from master branch
|
2020-05-27 12:39:58 +02:00 |
Claudio Atzori
|
9e4ec1543b
|
updated test
|
2020-05-27 12:38:42 +02:00 |
Claudio Atzori
|
8047d16dd9
|
added RDD based adjacency list creation procedure
|
2020-05-27 12:38:12 +02:00 |
Claudio Atzori
|
f057dcdf65
|
limit the max number of externalreferences to MAX_EXTERNAL_ENTITIES
|
2020-05-27 12:37:33 +02:00 |
Michele Artini
|
b81f2741d2
|
xquery
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
a25598140a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
7a7272d9ec
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
3ceb2d2853
|
match terms with vocabularies
|
2020-05-27 11:34:13 +02:00 |
Claudio Atzori
|
4e36d689dd
|
fixed XML serialization for children sub-elements (duplicates & externalreferences)
|
2020-05-26 18:30:40 +02:00 |
Michele Artini
|
c15d997925
|
xquery
|
2020-05-26 13:13:17 +02:00 |
Michele Artini
|
c6af36496a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:11:09 +02:00 |
Michele Artini
|
093f1aff03
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:06:55 +02:00 |
Claudio Atzori
|
b8e541a454
|
fixing repeated organization.websiteurl in organization entities (#5645) as well as project.ecinternationalorganizationeurinterests
|
2020-05-26 10:30:09 +02:00 |
Claudio Atzori
|
55595d7235
|
HACK: patch NULL values with defaults found in result.datainfo.deletedbyinference and result.context
|
2020-05-26 10:28:35 +02:00 |
Claudio Atzori
|
7b288a94cb
|
code formatting
|
2020-05-26 09:54:13 +02:00 |
Miriam Baglioni
|
54d869e618
|
merge upstream
|
2020-05-26 09:22:04 +02:00 |
Miriam Baglioni
|
eea07f4c42
|
refactoring
|
2020-05-26 09:21:49 +02:00 |
Sandro La Bruzzo
|
79c26382da
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-26 09:15:50 +02:00 |
Sandro La Bruzzo
|
25f52e19a4
|
implemented generation of ActionSet
|
2020-05-26 09:15:33 +02:00 |
Michele Artini
|
d6aada4957
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-26 08:44:31 +02:00 |
Michele Artini
|
b1546605e3
|
updated version of a dependency
|
2020-05-26 08:44:15 +02:00 |
Claudio Atzori
|
7582532e73
|
[maven-release-plugin] prepare for next development iteration
|
2020-05-25 19:48:18 +02:00 |
Claudio Atzori
|
01c2e93395
|
[maven-release-plugin] prepare release dhp-1.2.1
|
2020-05-25 19:48:14 +02:00 |
miconis
|
da1e5cf557
|
implementation of the result title merge. main title with higher trust, distinct between the others
|
2020-05-25 18:02:57 +02:00 |
Miriam Baglioni
|
d3d36647d2
|
merge upstream
|
2020-05-25 10:38:22 +02:00 |
Miriam Baglioni
|
74215f6d9f
|
refactoring
|
2020-05-25 10:38:16 +02:00 |
Miriam Baglioni
|
dbde2d243a
|
changed due to move of PacePerson from dhp-graph-mapper to dhp-common
|
2020-05-25 10:35:39 +02:00 |
Miriam Baglioni
|
f754c424bd
|
changed logic to compute PacePerson only once for each Author to be enriched
|
2020-05-25 10:35:02 +02:00 |
Miriam Baglioni
|
8f51af4e9b
|
added PacePerson to get name and surname for authors having only fullname set
|
2020-05-25 10:34:30 +02:00 |
Miriam Baglioni
|
b258f99ece
|
fix for issue that duplicated result
|
2020-05-25 10:26:48 +02:00 |
Miriam Baglioni
|
8f6ce970f9
|
moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper
|
2020-05-25 10:25:55 +02:00 |
Claudio Atzori
|
de108f54d6
|
code formatting
|
2020-05-23 10:21:19 +02:00 |
Claudio Atzori
|
6b56cae57d
|
added mapping for bestaccessrights
|
2020-05-23 09:57:39 +02:00 |
Claudio Atzori
|
7181807e64
|
code formatting
|
2020-05-23 09:51:48 +02:00 |
Sandro La Bruzzo
|
2408083566
|
implemented filtering step
|
2020-05-23 08:46:49 +02:00 |
Sandro La Bruzzo
|
244f6e50cf
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-22 20:52:15 +02:00 |
Sandro La Bruzzo
|
147dd389bf
|
minor fix
|
2020-05-22 20:51:42 +02:00 |
Miriam Baglioni
|
0d1ec1913f
|
added fix to avoid duplication of results
|
2020-05-22 18:42:25 +02:00 |
miconis
|
5d7ac78c41
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-22 17:25:08 +02:00 |
miconis
|
0fd0c7d725
|
reimplementation of the sim between two authors. now it takes into account both name and surname. threshold incremented to 1.0 if the name is too short
|
2020-05-22 17:24:57 +02:00 |
Michele Artini
|
eb606dc1e2
|
partial implementation of events with rels
|
2020-05-22 17:17:41 +02:00 |
Miriam Baglioni
|
29066a6b46
|
applied code cleanup
|
2020-05-22 15:38:50 +02:00 |
Miriam Baglioni
|
8610ad5142
|
added groupby id to fix multiple results with same id at join step
|
2020-05-22 15:32:55 +02:00 |
Miriam Baglioni
|
1e44703e3e
|
merge upstream
|
2020-05-22 15:30:07 +02:00 |
Sandro La Bruzzo
|
72278b9375
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-22 15:17:13 +02:00 |
Sandro La Bruzzo
|
22936d0877
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-05-22 15:15:17 +02:00 |
Sandro La Bruzzo
|
9fbb221457
|
completed mapping of UnpayWall and ORCID
|
2020-05-22 15:15:09 +02:00 |
Miriam Baglioni
|
70389b0a30
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-22 13:53:23 +02:00 |
Miriam Baglioni
|
4308f31165
|
added fix to make test run
|
2020-05-22 13:13:01 +02:00 |
Claudio Atzori
|
946598cfba
|
Merge branch 'master' into provision_indexing
|
2020-05-22 12:35:41 +02:00 |
Claudio Atzori
|
3cf2796ac6
|
code formatting
|
2020-05-22 12:34:00 +02:00 |
Michele Artini
|
dc4621b3cb
|
filter ORCID and MAG identifiers
|
2020-05-22 12:25:01 +02:00 |
Michele Artini
|
9f2d0f1b08
|
filter ORCID and MAG identifiers
|
2020-05-22 11:00:27 +02:00 |
Michele Artini
|
9de71e54a8
|
filter ORCID and MAG identifiers
|
2020-05-22 10:47:39 +02:00 |
Michele Artini
|
c5f7e17348
|
author fullnames
|
2020-05-22 10:08:02 +02:00 |
Claudio Atzori
|
ad40470040
|
Merge branch 'master' into provision_indexing
|
2020-05-22 08:51:22 +02:00 |
Claudio Atzori
|
925d933204
|
making XmlRecordFactory immune to graph encoding changes (mostly to avoid NPEs)
|
2020-05-22 08:50:44 +02:00 |
Claudio Atzori
|
b33dd58be4
|
replaced parameter 'reuseRecords' with 'resumeFrom', which allows the provision workflow execution to be restarted from any step; useful for manual submissions or debugging
|
2020-05-22 08:50:06 +02:00 |
Michele Artini
|
c7ca3cf35b
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-21 16:48:20 +02:00 |
Michele Artini
|
3e34517479
|
partial implementation of events with rels
|
2020-05-21 16:47:53 +02:00 |
Miriam Baglioni
|
6750075fbd
|
merge upstream
|
2020-05-21 16:31:09 +02:00 |
miconis
|
8b35e0e7f0
|
reimplementation of the author merging in deduprecord creation. implementation of the test class. minor changes
|
2020-05-21 12:02:44 +02:00 |
miconis
|
8bbd1d0501
|
reimplementation of the author merging in deduprecord creation. implementation of the test class.
|
2020-05-21 11:52:14 +02:00 |
Michele Artini
|
e43d4d7778
|
added a coalesce in sql query
|
2020-05-21 11:08:07 +02:00 |
Claudio Atzori
|
dbfb9c19fe
|
minor changes
|
2020-05-21 10:00:14 +02:00 |
Michele Artini
|
b3bcbb3129
|
resolve name of organization countries
|
2020-05-21 08:41:32 +02:00 |
Enrico Ottonello
|
1109d3b3fc
|
Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost
|
2020-05-21 00:41:27 +02:00 |
Enrico Ottonello
|
869a53040e
|
save to text file format
|
2020-05-21 00:41:21 +02:00 |
Sandro La Bruzzo
|
5818abaab4
|
fixed Crossref Mapping
|
2020-05-20 17:05:46 +02:00 |
Claudio Atzori
|
da4267d0fe
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-20 14:58:22 +02:00 |
Claudio Atzori
|
d7d2a0637f
|
added extra parameters to the provision indexing workflow
|
2020-05-20 14:55:38 +02:00 |
Miriam Baglioni
|
76f3f73caa
|
merge upstream
|
2020-05-20 10:31:40 +02:00 |
Sandro La Bruzzo
|
b771d67e9d
|
next step of MAG conversion implemented
|
2020-05-20 08:14:03 +02:00 |
Michele Artini
|
85ca5622d4
|
partial implementation of generation of simple events
|
2020-05-19 16:17:35 +02:00 |
Claudio Atzori
|
0bdfbb0a57
|
reintroduced RDD based relation cut off procedure
|
2020-05-19 15:02:21 +02:00 |
Enrico Ottonello
|
934ad570e0
|
joined summaries and activities dataset
|
2020-05-19 12:57:21 +02:00 |
Enrico Ottonello
|
ca722d4d18
|
merged
|
2020-05-19 09:43:12 +02:00 |
Enrico Ottonello
|
7362bc3e9d
|
workflow to generate seq(doi,AuthorList)
|
2020-05-19 09:34:44 +02:00 |
Sandro La Bruzzo
|
8c95b50f26
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-19 09:25:04 +02:00 |
Sandro La Bruzzo
|
486e850bcc
|
next step of MAG conversion implemented
|
2020-05-19 09:24:45 +02:00 |
Enrico Ottonello
|
d4e9075f22
|
Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost
|
2020-05-18 19:51:36 +02:00 |
Enrico Ottonello
|
fc80e8c7de
|
added accumulator; last modified date of the record is added to saved data; lambda file is partitioned into 20 parts before starting downloading
|
2020-05-18 19:51:29 +02:00 |
Claudio Atzori
|
f3bc8aed31
|
lifted memory requirements for country propagation wf
|
2020-05-18 15:29:10 +02:00 |
Miriam Baglioni
|
b71fbb68b1
|
removed the removeOutputDir command from code. Relations are written in Append mode. Erasing the output dir meant removing all the relations computed in the previous steps
|
2020-05-18 13:57:20 +02:00 |
Miriam Baglioni
|
629af7cb79
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-18 13:07:36 +02:00 |
Claudio Atzori
|
ef9a9a9f1a
|
remove the output path when starting
|
2020-05-15 22:34:19 +02:00 |
Enrico Ottonello
|
0b29bb7e3b
|
spark job to download orcid record modified after a fixed date
|
2020-05-15 19:49:26 +02:00 |
Claudio Atzori
|
7838f2c63f
|
init the empty list for author pids mapped from OAF
|
2020-05-15 17:06:01 +02:00 |
Claudio Atzori
|
82b615ab33
|
NPE check
|
2020-05-15 16:04:46 +02:00 |
Miriam Baglioni
|
e26a67c3eb
|
merge with upstream
|
2020-05-15 15:53:05 +02:00 |