Miriam Baglioni
|
f206ff42d6
|
Modified code to use the API. Removed unneeded parameters. Rewrote the code to exploit the parallel stream on the entity types
|
2023-10-20 15:49:41 +02:00 |
Miriam Baglioni
|
34358afe75
|
Modified resource file, workflow and default-config. Added 3g of memory overhead and specified the shuffle partitions in the wf configuration. Removed the multiple instantiation in the wf because of the different implementation of the Spark job
|
2023-10-20 15:48:27 +02:00 |
Miriam Baglioni
|
18bfff8af3
|
adding test classes and modifying test for bulktag
|
2023-10-20 15:47:03 +02:00 |
Miriam Baglioni
|
69dac91659
|
adding the new code to use the API instead of the Information Service
|
2023-10-20 15:45:52 +02:00 |
Miriam Baglioni
|
a9ede1e989
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2023-10-20 10:14:43 +02:00 |
Claudio Atzori
|
242d647146
|
cleanup & docs
|
2023-10-12 12:23:44 +02:00 |
Claudio Atzori
|
af3ffad6c4
|
[AMF] docs
|
2023-10-12 10:07:52 +02:00 |
Claudio Atzori
|
ba5475ed4c
|
Merge pull request 'Fix cleaning of Pmid where parsing of numbers stopped at first not leading 0 (zero) character' (#345) from fix_truncated_pmid into master
Reviewed-on: D-Net/dnet-hadoop#345
|
2023-10-06 14:19:49 +02:00 |
Giambattista Bloisi
|
2c235e82ad
|
Fix cleaning of Pmid where parsing of numbers stopped at the first non-leading '0' character
|
2023-10-06 12:35:54 +02:00 |
Claudio Atzori
|
4ac06c9e37
|
Merge pull request 'Fix bug in conversion from dedup json model to Spark Dataset of Rows (instanceTypeMatch no longer working)' (#339) from fix_dedupfailsonmatchinginstances into master
Reviewed-on: D-Net/dnet-hadoop#339
|
2023-10-02 11:34:20 +02:00 |
Claudio Atzori
|
fa692b3629
|
Merge branch 'master' into fix_dedupfailsonmatchinginstances
|
2023-10-02 11:28:16 +02:00 |
Claudio Atzori
|
ef02648399
|
Merge pull request 'fixed dedup configuration management in the Broker workflow' (#341) from fix_8997 into master
Reviewed-on: D-Net/dnet-hadoop#341
|
2023-10-02 11:03:50 +02:00 |
Claudio Atzori
|
d13bb534f0
|
Merge branch 'master' into fix_8997
|
2023-10-02 11:03:18 +02:00 |
Giambattista Bloisi
|
775c3f704a
|
Fix bug in conversion from dedup json model to Spark Dataset of Rows: the list of strings contained the json-escaped representation of the value instead of the plain value; this caused instanceTypeMatch failures because of the leading and trailing double quotes
|
2023-09-27 22:30:47 +02:00 |
Sandro La Bruzzo
|
9c3ab11d5b
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2023-09-25 15:29:19 +02:00 |
Sandro La Bruzzo
|
423ef30676
|
minor fix on the aggregation of uniprot and pdb
|
2023-09-25 15:28:58 +02:00 |
Giambattista Bloisi
|
7152d47f84
|
Use asScala to convert java List to Scala Sequence
|
2023-09-20 16:14:27 +02:00 |
Claudio Atzori
|
4853c19b5e
|
code formatting
|
2023-09-20 15:53:21 +02:00 |
Giambattista Bloisi
|
1f226d1dce
|
Fix defect #8997: GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed
|
2023-09-20 15:42:00 +02:00 |
Alessia Bardi
|
6186cdc2cc
|
Use v5 of the UNIBI Gold ISSN list in test
|
2023-09-19 14:47:01 +02:00 |
Alessia Bardi
|
d94b9bebf7
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2023-09-19 13:38:45 +02:00 |
Alessia Bardi
|
19abba8fa7
|
tests for d4science catalog
|
2023-09-19 13:38:25 +02:00 |
Claudio Atzori
|
c2f179800c
|
Merge pull request 'Run CC and RAM sequentially in dhp-impact-indicators WF' (#338) from run_cc_and_ram_sequentially into master
Reviewed-on: D-Net/dnet-hadoop#338
|
2023-09-13 08:52:53 +02:00 |
Serafeim Chatzopoulos
|
2aed5a74be
|
Run CC and RAM sequentially in dhp-impact-indicators WF
|
2023-09-12 22:31:50 +03:00 |
Claudio Atzori
|
4dc4862011
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2023-09-12 14:34:34 +02:00 |
Claudio Atzori
|
dc80ab14d3
|
[graph dedup] consistency wf should not remove the relations while dispatching the entities
|
2023-09-12 14:34:28 +02:00 |
Alessia Bardi
|
77a2199837
|
updated test for EOSC community
|
2023-09-08 11:05:49 +02:00 |
Claudio Atzori
|
265180bfd2
|
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
|
2023-09-07 11:20:35 +02:00 |
Claudio Atzori
|
da0e9828f7
|
resolved conflicts for PR#337
|
2023-09-06 11:28:46 +02:00 |
Claudio Atzori
|
9f5d16624c
|
Merge pull request '[graph raw] datainfo.invisible set as true only for entities' (#336) from invisible_relations into beta
Reviewed-on: D-Net/dnet-hadoop#336
|
2023-09-04 16:14:47 +02:00 |
Claudio Atzori
|
adec6692ca
|
Merge branch 'beta' into invisible_relations
|
2023-09-04 16:13:06 +02:00 |
Claudio Atzori
|
15666e86a8
|
added collectedfrom to the affiliation relations imported from Crossref
|
2023-09-04 15:56:06 +02:00 |
Claudio Atzori
|
7d6bd4f20b
|
Merge pull request 'Fix import of affiliations relations from Crossref' (#335) from 8876_fix_crossref_affiliation_relations_import into beta
Reviewed-on: D-Net/dnet-hadoop#335
|
2023-09-04 15:19:58 +02:00 |
Claudio Atzori
|
5b06c9d06f
|
[graph raw] datainfo.invisible set as true only for entities
|
2023-09-04 15:15:24 +02:00 |
Serafeim Chatzopoulos
|
7de0164c26
|
Fix import of affiliations relations from Crossref
|
2023-09-04 16:04:41 +03:00 |
Claudio Atzori
|
488d9a1cea
|
Merge pull request 'Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb' (#331) from consistencywf_memoryoverhead_conf into beta
Reviewed-on: D-Net/dnet-hadoop#331
|
2023-08-29 16:31:36 +02:00 |
Giambattista Bloisi
|
6b1c05d118
|
Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set, it defaults to 1Gb
|
2023-08-29 16:04:19 +02:00 |
Claudio Atzori
|
bf35280ea6
|
code formatting
|
2023-08-29 11:11:00 +02:00 |
Claudio Atzori
|
0515d81c7c
|
Merge pull request 'Rewrite SparkPropagateRelation exploiting Dataframe API' (#330) from propagate_relation_rewrite into beta
Reviewed-on: D-Net/dnet-hadoop#330
|
2023-08-29 10:47:14 +02:00 |
Claudio Atzori
|
58665a246c
|
Merge branch 'beta' into propagate_relation_rewrite
|
2023-08-29 10:47:02 +02:00 |
Claudio Atzori
|
f437be80ad
|
[impact indicators] adjusted paths in the bip ranker wf parameters
|
2023-08-29 09:03:03 +02:00 |
Giambattista Bloisi
|
d012aec0b3
|
Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow (#8964)
|
2023-08-28 22:44:54 +02:00 |
Giambattista Bloisi
|
a860e19423
|
Fix: ensure all relations are written out, not only those managed by dedup
|
2023-08-28 15:36:02 +02:00 |
Giambattista Bloisi
|
0d7b2bf83d
|
Rewrite SparkPropagateRelation exploiting Dataframe API
|
2023-08-28 10:34:54 +02:00 |
Miriam Baglioni
|
9c8b41475a
|
Merge pull request '8172_impact_indicators_workflow' (#284) from 8172_impact_indicators_workflow into beta
Reviewed-on: D-Net/dnet-hadoop#284
|
2023-08-14 15:50:48 +02:00 |
Serafeim Chatzopoulos
|
97c1ba8918
|
Merge actionsets of results and projects
|
2023-08-11 15:56:53 +03:00 |
Miriam Baglioni
|
35b8deb2c6
|
Merge pull request 'DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag' (#329) from dispatch_filter_invisible_entities into beta
Reviewed-on: D-Net/dnet-hadoop#329
|
2023-08-10 12:56:18 +02:00 |
Giambattista Bloisi
|
95cd2b9b1e
|
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
|
2023-08-10 11:53:48 +02:00 |
Giambattista Bloisi
|
fab9920271
|
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
|
2023-08-09 15:41:43 +02:00 |
Miriam Baglioni
|
599828ce35
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2023-08-09 13:07:13 +02:00 |