Claudio Atzori
|
7d416f08d8
|
graph cleaning workflow: set hostedby to unknown repository when defined as NULL
|
2020-06-22 09:50:43 +02:00 |
Michele Artini
|
16c7a18435
|
refactoring
|
2020-06-22 08:51:31 +02:00 |
Michele Artini
|
f9fc64ffaf
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-19 15:24:43 +02:00 |
Michele Artini
|
d88fe0ac84
|
join methods
|
2020-06-19 15:24:30 +02:00 |
Sandro La Bruzzo
|
464eeeec87
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-19 15:11:53 +02:00 |
Sandro La Bruzzo
|
1681de672d
|
updated mapping scholexplorer to OAF
|
2020-06-19 15:11:46 +02:00 |
Michele Artini
|
4822747313
|
some fixes
|
2020-06-19 13:53:56 +02:00 |
Michele Artini
|
834f139e6e
|
fixed some NPE
|
2020-06-19 12:33:29 +02:00 |
Claudio Atzori
|
d0ac7514b2
|
cleaning workflow to include cleaning of default values
|
2020-06-18 19:37:25 +02:00 |
Michele Artini
|
52f62d5d8c
|
events
|
2020-06-18 14:49:13 +02:00 |
Michele Artini
|
61634fbfe0
|
removed kryo encoding
|
2020-06-18 14:09:58 +02:00 |
Michele Artini
|
8d2b199dd2
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-18 13:15:34 +02:00 |
Michele Artini
|
e659b02e6b
|
some wf fixing
|
2020-06-18 13:15:13 +02:00 |
Michele Artini
|
9a847b4557
|
some wf fixing
|
2020-06-18 13:14:10 +02:00 |
Sandro La Bruzzo
|
9bf67f5de1
|
resolved conflicts
|
2020-06-17 09:15:43 +02:00 |
Sandro La Bruzzo
|
1d4275acc4
|
implemented first version of exportation of Scholexplorer into ActionSet
|
2020-06-17 09:10:38 +02:00 |
miconis
|
5233b15265
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-16 18:31:19 +02:00 |
miconis
|
11b77b9f4e
|
json dumps for entity merge test modified to fit the new model. title merge adjusted to fix the error
|
2020-06-16 18:31:11 +02:00 |
Claudio Atzori
|
64f02de5d3
|
updated workflow definition to include the cleaning step
|
2020-06-16 17:48:51 +02:00 |
Claudio Atzori
|
306669209f
|
code formatting
|
2020-06-16 16:54:44 +02:00 |
Claudio Atzori
|
1bc1d15eaf
|
stubbing for mock: datasource.identities must be typed as array
|
2020-06-16 16:54:28 +02:00 |
Claudio Atzori
|
631fef12a7
|
Merge branch 'master' into dhp_oaf_model
|
2020-06-16 16:11:19 +02:00 |
Michele Artini
|
9e2c23e391
|
partial refactoring
|
2020-06-16 15:55:42 +02:00 |
Michele Artini
|
113c9b1de0
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-16 15:53:39 +02:00 |
Michele Artini
|
76ea7607f7
|
partial refactoring
|
2020-06-16 15:53:13 +02:00 |
Claudio Atzori
|
603b1bd0bb
|
Merge branch 'master' into dhp_oaf_model
|
2020-06-16 15:43:59 +02:00 |
Claudio Atzori
|
5441f01586
|
Merge pull request 'missing landingPage urls in instances' (#22) from instances-with-landing-page into master
Looks good, thanks!
|
2020-06-16 15:32:44 +02:00 |
Claudio Atzori
|
89859111ee
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-16 15:28:29 +02:00 |
Claudio Atzori
|
4ec262db53
|
included externalreference(s) in the result view on the Hive graph DB
|
2020-06-16 15:28:20 +02:00 |
Michele Artini
|
8a4f84f8c0
|
refactoring
|
2020-06-16 12:34:13 +02:00 |
Claudio Atzori
|
2a4f65795f
|
WIP: graph cleaner implementation
|
2020-06-15 18:32:24 +02:00 |
Claudio Atzori
|
c15c8c0ad0
|
map datasource identities (including piwik ids) as original IDs
|
2020-06-15 16:07:30 +02:00 |
Claudio Atzori
|
0d52816244
|
WIP: graph cleaner implementation
|
2020-06-13 13:06:04 +02:00 |
Claudio Atzori
|
bed65a1be6
|
WIP: graph cleaner implementation
|
2020-06-12 18:25:47 +02:00 |
Claudio Atzori
|
c4d9f1837f
|
[maven-release-plugin] prepare for next development iteration
|
2020-06-12 12:21:08 +02:00 |
Claudio Atzori
|
f0746a7605
|
[maven-release-plugin] prepare release dhp-1.2.2
|
2020-06-12 12:21:03 +02:00 |
Claudio Atzori
|
463489f59f
|
code formatting
|
2020-06-12 12:03:25 +02:00 |
Claudio Atzori
|
4bcad1c9c3
|
Merge branch 'graph_cleaning'
|
2020-06-12 11:40:25 +02:00 |
Claudio Atzori
|
cdb1956fe9
|
WIP: graph cleaner implementation
|
2020-06-12 11:36:59 +02:00 |
Alessia Bardi
|
b347499745
|
do not use deprecated subreltype
|
2020-06-12 10:58:02 +02:00 |
Claudio Atzori
|
97b1c4057c
|
WIP: graph cleaner implementation
|
2020-06-12 10:45:18 +02:00 |
Claudio Atzori
|
ba8a024af9
|
avoid NPEs merging titles
|
2020-06-12 10:45:11 +02:00 |
Michele Artini
|
30ea1bda88
|
oozie workflow
|
2020-06-12 10:42:35 +02:00 |
Michele Artini
|
c22cb5a3c6
|
refactoring
|
2020-06-12 09:47:55 +02:00 |
Michele Artini
|
472cf77639
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-06-11 14:30:47 +02:00 |
Michele Artini
|
c6b5bb3f17
|
orcid events
|
2020-06-11 14:30:24 +02:00 |
Michele Artini
|
c2e1b66e83
|
Revert "orcid events"
This reverts commit 48959e9a17.
|
2020-06-11 14:28:03 +02:00 |
Michele Artini
|
48959e9a17
|
orcid events
|
2020-06-11 14:24:02 +02:00 |
Alessia Bardi
|
e79943965b
|
Fixes #5604: field oamandatepublications in XML
|
2020-06-11 12:49:31 +02:00 |
Michele Artini
|
a41e0cb648
|
missing landingPage urls in instances
|
2020-06-11 12:28:34 +02:00 |
Michele Artini
|
04fdcacd83
|
results with all joined entities
|
2020-06-11 11:25:18 +02:00 |
Michele Artini
|
99f88e1cb8
|
fixed generation entities from claims
|
2020-06-11 10:51:57 +02:00 |
Claudio Atzori
|
d1d92c4d8c
|
fixed integration of claims in the graph
|
2020-06-11 10:12:00 +02:00 |
Claudio Atzori
|
953da4a427
|
Merge branch 'master' into graph_cleaning
|
2020-06-10 21:36:56 +02:00 |
Claudio Atzori
|
f1bce64391
|
WIP: graph cleaner implementation
|
2020-06-10 21:36:31 +02:00 |
Claudio Atzori
|
67c7b31ba6
|
Merge branch 'master' into graph_cleaning
|
2020-06-10 15:00:35 +02:00 |
Claudio Atzori
|
3ebf81d2b0
|
Merge pull request 'oaf-store-interpretation' (#21) from oaf-store-interpretation into master
Looks good, thanks Michele!
|
2020-06-10 14:58:09 +02:00 |
Michele Artini
|
5869cb76b3
|
reformatting
|
2020-06-10 12:11:16 +02:00 |
Michele Artini
|
c08e66e01e
|
fixed a workflow parameter
|
2020-06-10 10:11:56 +02:00 |
Michele Artini
|
7177a32d75
|
import of invisible stores
|
2020-06-10 10:04:00 +02:00 |
Claudio Atzori
|
ce12f236bb
|
disabled test; need to update the joined_entity.json file
|
2020-06-09 20:07:36 +02:00 |
Claudio Atzori
|
a2fdf85ba1
|
WIP: graph cleaner implementation
|
2020-06-09 19:52:53 +02:00 |
Alessia Bardi
|
4551c1082f
|
mapping csv for orcid
|
2020-06-09 18:08:47 +02:00 |
Alessia Bardi
|
2d3f7d1eb4
|
fixed log classes to make the ORCID test run
|
2020-06-09 18:07:14 +02:00 |
Alessia Bardi
|
a3a6755d58
|
mapping csv for Unpaywall
|
2020-06-09 17:45:44 +02:00 |
Claudio Atzori
|
d9f33582c5
|
WIP: graph cleaner implementation
|
2020-06-09 17:20:40 +02:00 |
Alessia Bardi
|
f3b033cf09
|
added csv line for funders from Crossref
|
2020-06-09 17:08:26 +02:00 |
Alessia Bardi
|
79969d78b9
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-09 17:05:39 +02:00 |
Alessia Bardi
|
fc4d220964
|
updated function name for SNSF
|
2020-06-09 17:05:31 +02:00 |
Michele Artini
|
baaa55f4a3
|
use of pace to calculate trusts
|
2020-06-09 16:01:31 +02:00 |
Alessia Bardi
|
33b130ec43
|
Mapping instructions for MAG
|
2020-06-09 15:57:15 +02:00 |
Alessia Bardi
|
d6de406e11
|
fixed classid for subjects
|
2020-06-09 14:43:34 +02:00 |
Alessia Bardi
|
f072125152
|
map volume and issue in journal information from MAG
|
2020-06-09 14:32:10 +02:00 |
Alessia Bardi
|
b7cb1163ea
|
identifiers always start with 50
|
2020-06-09 10:39:11 +02:00 |
Alessia Bardi
|
181f52b9bc
|
Added mapping table for Crossref
|
2020-06-08 19:33:47 +02:00 |
Alessia Bardi
|
9fd25887f7
|
Result identifiers all start with 50|
|
2020-06-08 19:32:24 +02:00 |
Alessia Bardi
|
16cb073b15
|
set the instance dateofacceptance with the Crossref createdDate in case the issuedDate is blank
|
2020-06-08 19:06:03 +02:00 |
Michele Artini
|
bb659d870c
|
join simrels
|
2020-06-08 16:29:01 +02:00 |
Michele Artini
|
81e85465d8
|
join simrels
|
2020-06-08 16:26:16 +02:00 |
Claudio Atzori
|
3d871c6651
|
Merge branch 'master' into graph_cleaning
|
2020-06-08 15:23:24 +02:00 |
Claudio Atzori
|
25a093b1a4
|
integrated changes from master
|
2020-06-08 15:04:00 +02:00 |
Sandro La Bruzzo
|
e34e7d6728
|
merge DOIBoost
|
2020-06-08 08:32:22 +02:00 |
Sandro La Bruzzo
|
e46e2a4776
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-06-08 08:17:14 +02:00 |
Spyros Zoupanos
|
3576dd186b
|
Adding hive timeout as workflow parameter
|
2020-06-05 22:29:54 +03:00 |
Claudio Atzori
|
b2349659cf
|
WIP: graph property fixing implementation
|
2020-06-05 18:37:38 +02:00 |
Michele Artini
|
a73973a74b
|
partial implementation of broker events generation
|
2020-06-05 11:43:00 +02:00 |
Michele Artini
|
7e82996e7c
|
partial implementation of broker events generation
|
2020-06-04 17:10:43 +02:00 |
Sandro La Bruzzo
|
b57e8ba374
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-06-04 14:39:41 +02:00 |
Sandro La Bruzzo
|
7ac1ba2e35
|
improvement DOIBoost
|
2020-06-04 14:39:20 +02:00 |
Michele Artini
|
97177d7f7b
|
partial refactoring
|
2020-06-04 10:26:34 +02:00 |
Sandro La Bruzzo
|
13815d5d13
|
improvement DOIBoost
|
2020-06-01 17:52:12 +02:00 |
Claudio Atzori
|
05f269a1c0
|
kryo based parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-06-01 00:32:42 +02:00 |
Claudio Atzori
|
5e23fb3a74
|
code formatting
|
2020-05-30 10:52:56 +02:00 |
Claudio Atzori
|
54ca8ed6c3
|
made param name uniform (isLookupUrl), Vocab model classes defined as Serializable
|
2020-05-29 18:17:30 +02:00 |
Claudio Atzori
|
1577bd5b8b
|
added IsLookupUrl to the raw_db workflow parameters
|
2020-05-29 16:18:16 +02:00 |
Claudio Atzori
|
91d78b825b
|
Merge pull request 'import from db using is vocabularies' (#17) from result_pids into master
Looks good, thanks Michele!
|
2020-05-29 16:02:40 +02:00 |
Michele Artini
|
adb798faa5
|
import from db using is vocabularies
|
2020-05-29 12:03:51 +02:00 |
Claudio Atzori
|
6f5f498c78
|
restored common properties driving executor-cores and executor-memory in join_organization_relations wf node
|
2020-05-29 11:22:00 +02:00 |
Claudio Atzori
|
b2f9564f13
|
WIP: fixed PrepareRelationsJob; parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-05-29 10:58:15 +02:00 |
Miriam Baglioni
|
dfa4997a4f
|
removed commented code
|
2020-05-29 10:45:18 +02:00 |
Miriam Baglioni
|
6f1eea28b6
|
changed message in log
|
2020-05-29 10:41:39 +02:00 |
Sandro La Bruzzo
|
b87b3ddb6b
|
changed mapping ORCIDToOAF
|
2020-05-29 09:32:04 +02:00 |
Miriam Baglioni
|
8b6e886fb6
|
added new resource for testing
|
2020-05-28 23:54:31 +02:00 |
Miriam Baglioni
|
6989fb9c8a
|
changed the project test according to the newly introduced join with the db project codes
|
2020-05-28 23:53:24 +02:00 |
Miriam Baglioni
|
782984d8e5
|
added needed parameter
|
2020-05-28 23:52:41 +02:00 |
Miriam Baglioni
|
01f7876595
|
fix issue with flatMap - the return type must not be null
|
2020-05-28 23:50:32 +02:00 |
Claudio Atzori
|
a57965a3ea
|
limiting the dimensions of outliers
|
2020-05-28 17:36:37 +02:00 |
Miriam Baglioni
|
773735f870
|
added the path to the file containing the projects code from the db
|
2020-05-28 17:30:45 +02:00 |
Miriam Baglioni
|
6a15067a64
|
added one step in the workflow
|
2020-05-28 17:30:09 +02:00 |
Miriam Baglioni
|
5309a99a70
|
modified the PrepareProjects to consider those in the db
|
2020-05-28 17:29:53 +02:00 |
Miriam Baglioni
|
b737ed8236
|
added part to read projects from the openaire db to filter out those in the csv file that are not in the db
|
2020-05-28 17:29:21 +02:00 |
Claudio Atzori
|
821be1f8b6
|
experimental implementation of custom aggregation using kryo encoders
|
2020-05-28 13:53:13 +02:00 |
Claudio Atzori
|
83504ecace
|
limiting the maximum number of authors allowed in XML records to MAX_AUTHORS = 200; authors with ORCID can exceed that limit
|
2020-05-28 13:52:30 +02:00 |
Claudio Atzori
|
ef11593068
|
JoinedEntity.links defined as empty list by default
|
2020-05-28 13:50:44 +02:00 |
Claudio Atzori
|
5dea155a87
|
increased number of partitions produced by the join_all_entities phase as well as spark.sql.shuffle.partitions in adjancency_lists phase
|
2020-05-28 13:49:59 +02:00 |
Miriam Baglioni
|
35b7279147
|
changed test because data are now saved as SequenceFile and, because of the groupBy, the number of produced updates decreases
|
2020-05-28 10:26:12 +02:00 |
Miriam Baglioni
|
37c155b86a
|
merge branch with fork master
|
2020-05-28 10:09:51 +02:00 |
Miriam Baglioni
|
df44db686a
|
refactoring
|
2020-05-28 10:07:00 +02:00 |
Miriam Baglioni
|
87b07f4af8
|
removed unused variables
|
2020-05-28 10:05:43 +02:00 |
Miriam Baglioni
|
1060977272
|
added fs actions to remove and then create the workingDir
|
2020-05-28 10:04:36 +02:00 |
Miriam Baglioni
|
96d1a3c431
|
deleted the file where the csv files were to be stored
|
2020-05-28 10:04:10 +02:00 |
Miriam Baglioni
|
669c05c771
|
added groupBy before creating Actions
|
2020-05-28 10:00:45 +02:00 |
Sandro La Bruzzo
|
02f90eeb07
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-28 09:58:32 +02:00 |
Sandro La Bruzzo
|
7d29b61c62
|
code refactor
|
2020-05-28 09:57:46 +02:00 |
Claudio Atzori
|
fdd54bad1c
|
code formatting
|
2020-05-27 19:31:54 +02:00 |
Miriam Baglioni
|
1855453434
|
changed the outputdir of the last step
|
2020-05-27 17:59:36 +02:00 |
Claudio Atzori
|
b9b1bc9967
|
Merge branch 'master' into provision_indexing
|
2020-05-27 12:55:20 +02:00 |
Claudio Atzori
|
aac1515b58
|
Merge pull request 'result_pids without conflicts ???' (#16) from result_pids into master
Looks good, thanks Michele
|
2020-05-27 12:54:52 +02:00 |
Michele Artini
|
f5ce7d76e1
|
resolve conflicts
|
2020-05-27 12:49:17 +02:00 |
Claudio Atzori
|
cfd753217c
|
repartition the join_entities in 24k files
|
2020-05-27 12:44:01 +02:00 |
Claudio Atzori
|
2f1a623d09
|
sync from master branch
|
2020-05-27 12:39:58 +02:00 |
Claudio Atzori
|
9e4ec1543b
|
updated test
|
2020-05-27 12:38:42 +02:00 |
Claudio Atzori
|
8047d16dd9
|
added RDD based adjacency list creation procedure
|
2020-05-27 12:38:12 +02:00 |
Claudio Atzori
|
f057dcdf65
|
limit the max number of externalreferences to MAX_EXTERNAL_ENTITIES
|
2020-05-27 12:37:33 +02:00 |
Michele Artini
|
b81f2741d2
|
xquery
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
a25598140a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
7a7272d9ec
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
3ceb2d2853
|
match terms with vocabularies
|
2020-05-27 11:34:13 +02:00 |
Claudio Atzori
|
4e36d689dd
|
fixed XML serialization for children sub-elements (duplicates & externalreferences)
|
2020-05-26 18:30:40 +02:00 |
Miriam Baglioni
|
92e3a52e91
|
merge branch with fork master
|
2020-05-26 15:57:51 +02:00 |
Michele Artini
|
c15d997925
|
xquery
|
2020-05-26 13:13:17 +02:00 |
Michele Artini
|
c6af36496a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:11:09 +02:00 |
Michele Artini
|
093f1aff03
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:06:55 +02:00 |
Claudio Atzori
|
b8e541a454
|
fixing repeated organization.websiteurl in organization entities (#5645) as well as project.ecinternationalorganizationeurinterests
|
2020-05-26 10:30:09 +02:00 |
Claudio Atzori
|
55595d7235
|
HACK: patch NULL values with defaults found in result.datainfo.deletedbyinference and result.context
|
2020-05-26 10:28:35 +02:00 |
Claudio Atzori
|
7b288a94cb
|
code formatting
|
2020-05-26 09:54:13 +02:00 |
Miriam Baglioni
|
54d869e618
|
merge upstream
|
2020-05-26 09:22:04 +02:00 |
Miriam Baglioni
|
eea07f4c42
|
refactoring
|
2020-05-26 09:21:49 +02:00 |
Sandro La Bruzzo
|
79c26382da
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-26 09:15:50 +02:00 |
Sandro La Bruzzo
|
25f52e19a4
|
implemented generation of ActionSet
|
2020-05-26 09:15:33 +02:00 |