Claudio Atzori
|
ee8a39e7d2
|
cleanup and refinements
|
2023-10-04 12:32:05 +02:00 |
Claudio Atzori
|
5919e488dd
|
Merge branch 'beta' into importpoci
|
2023-10-03 10:43:53 +02:00 |
Miriam Baglioni
|
d7fccdc64b
|
fixed paths in wf to match the req of the pathname
|
2023-10-02 14:10:57 +02:00 |
Miriam Baglioni
|
9898470b0e
|
Addressing comments in D-Net/dnet-hadoop#340#issuecomment-10592
|
2023-10-02 12:54:16 +02:00 |
Giambattista Bloisi
|
c412dc162b
|
Fix bug in conversion from dedup json model to Spark Dataset of Rows: list of strings contained the json escaped representation of the value instead of the plain value; this caused instanceTypeMatch failures because of the leading and trailing double quotes
|
2023-10-02 11:34:51 +02:00 |
Claudio Atzori
|
5d09b7db8b
|
Merge pull request 'SparkPropagateRelation relations do not propagate deletedByInference and invisible' (#333) from consistency_keep_mergerels into beta
Reviewed-on: D-Net/dnet-hadoop#333
|
2023-10-02 11:27:57 +02:00 |
Claudio Atzori
|
7b403a920f
|
Merge branch 'beta' into consistency_keep_mergerels
|
2023-10-02 11:26:00 +02:00 |
Claudio Atzori
|
dc86018a5f
|
Merge branch 'merge_entities_job' into beta
|
2023-10-02 11:24:48 +02:00 |
Giambattista Bloisi
|
3c47920c78
|
Use asScala to convert java List to Scala Sequence
|
2023-10-02 11:04:47 +02:00 |
Claudio Atzori
|
7f244d9a7a
|
code formatting
|
2023-10-02 11:04:36 +02:00 |
Giambattista Bloisi
|
e239b81740
|
Fix defect #8997: GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed
|
2023-10-02 11:04:18 +02:00 |
Miriam Baglioni
|
e84f5b5e64
|
extended existing code to accommodate import of POCI from open citation
|
2023-10-02 09:25:16 +02:00 |
Alessia Bardi
|
0935d7757c
|
Use v5 of the UNIBI Gold ISSN list in test
|
2023-09-20 15:41:35 +02:00 |
Alessia Bardi
|
cc7204a089
|
tests for d4science catalog
|
2023-09-20 15:38:32 +02:00 |
Sandro La Bruzzo
|
76476cdfb6
|
Added maven repo for dependencies that are not in maven central
|
2023-09-20 10:33:14 +02:00 |
Serafeim Chatzopoulos
|
395a4af020
|
Run CC and RAM sequentially in dhp-impact-indicators WF
|
2023-09-13 08:59:40 +02:00 |
Claudio Atzori
|
8a6892cc63
|
[graph dedup] consistency wf should not remove the relations while dispatching the entities
|
2023-09-12 21:27:05 +02:00 |
Claudio Atzori
|
4786aa0e09
|
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
|
2023-09-07 11:21:07 +02:00 |
Claudio Atzori
|
9f5d16624c
|
Merge pull request '[graph raw] datainfo.invisible set as true only for entities' (#336) from invisible_relations into beta
Reviewed-on: D-Net/dnet-hadoop#336
|
2023-09-04 16:14:47 +02:00 |
Claudio Atzori
|
adec6692ca
|
Merge branch 'beta' into invisible_relations
|
2023-09-04 16:13:06 +02:00 |
Claudio Atzori
|
15666e86a8
|
added collectedfrom to the affiliation relations imported from Crossref
|
2023-09-04 15:56:06 +02:00 |
Claudio Atzori
|
7d6bd4f20b
|
Merge pull request 'Fix import of affiliations relations from Crossref' (#335) from 8876_fix_crossref_affiliation_relations_import into beta
Reviewed-on: D-Net/dnet-hadoop#335
|
2023-09-04 15:19:58 +02:00 |
Claudio Atzori
|
5b06c9d06f
|
[graph raw] datainfo.invisible set as true only for entities
|
2023-09-04 15:15:24 +02:00 |
Serafeim Chatzopoulos
|
7de0164c26
|
Fix import of affiliations relations from Crossref
|
2023-09-04 16:04:41 +03:00 |
Giambattista Bloisi
|
2caaaec42d
|
Include SparkCleanRelation logic in SparkPropagateRelation
SparkPropagateRelation includes merge relations
Revised tests for SparkPropagateRelation
|
2023-09-04 11:33:20 +02:00 |
Giambattista Bloisi
|
6cc7d8ca7b
|
GroupEntities and DispatchEntities are now merged in GroupEntitiesSparkJob
|
2023-08-30 10:43:31 +02:00 |
Claudio Atzori
|
488d9a1cea
|
Merge pull request 'Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb' (#331) from consistencywf_memoryoverhead_conf into beta
Reviewed-on: D-Net/dnet-hadoop#331
|
2023-08-29 16:31:36 +02:00 |
Giambattista Bloisi
|
6b1c05d118
|
Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb
|
2023-08-29 16:04:19 +02:00 |
Claudio Atzori
|
bf35280ea6
|
code formatting
|
2023-08-29 11:11:00 +02:00 |
Claudio Atzori
|
0515d81c7c
|
Merge pull request 'Rewrite SparkPropagateRelation exploiting Dataframe API' (#330) from propagate_relation_rewrite into beta
Reviewed-on: D-Net/dnet-hadoop#330
|
2023-08-29 10:47:14 +02:00 |
Claudio Atzori
|
58665a246c
|
Merge branch 'beta' into propagate_relation_rewrite
|
2023-08-29 10:47:02 +02:00 |
Claudio Atzori
|
f437be80ad
|
[impact indicators] adjusted paths in the bip ranker wf parameters
|
2023-08-29 09:03:03 +02:00 |
Giambattista Bloisi
|
d012aec0b3
|
Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow (#8964)
|
2023-08-28 22:44:54 +02:00 |
Giambattista Bloisi
|
a860e19423
|
Fix: ensure all relations are written out, not only those managed by dedup
|
2023-08-28 15:36:02 +02:00 |
Giambattista Bloisi
|
0d7b2bf83d
|
Rewrite SparkPropagateRelation exploiting Dataframe API
|
2023-08-28 10:34:54 +02:00 |
Miriam Baglioni
|
9c8b41475a
|
Merge pull request '8172_impact_indicators_workflow' (#284) from 8172_impact_indicators_workflow into beta
Reviewed-on: D-Net/dnet-hadoop#284
|
2023-08-14 15:50:48 +02:00 |
Serafeim Chatzopoulos
|
97c1ba8918
|
Merge actionsets of results and projects
|
2023-08-11 15:56:53 +03:00 |
Miriam Baglioni
|
35b8deb2c6
|
Merge pull request 'DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag' (#329) from dispatch_filter_invisible_entities into beta
Reviewed-on: D-Net/dnet-hadoop#329
|
2023-08-10 12:56:18 +02:00 |
Giambattista Bloisi
|
95cd2b9b1e
|
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
|
2023-08-10 11:53:48 +02:00 |
Giambattista Bloisi
|
fab9920271
|
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
|
2023-08-09 15:41:43 +02:00 |
Miriam Baglioni
|
c25ac21e5e
|
Merge pull request 'graph cleaning, suggestions from ticket 8898' (#325) from cleaning_8898 into beta
Reviewed-on: D-Net/dnet-hadoop#325
|
2023-08-08 11:14:19 +02:00 |
Miriam Baglioni
|
c334fe2438
|
Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' (#328) from cleanup_relations_after_dedup into beta
Reviewed-on: D-Net/dnet-hadoop#328
|
2023-08-08 09:49:12 +02:00 |
Miriam Baglioni
|
0e2f855807
|
Merge pull request 'Updates Promotion DBs' (#321) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#321
|
2023-08-07 12:09:16 +02:00 |
Miriam Baglioni
|
18fbe52b20
|
Merge pull request 'Import affiliation relations from Crossref' (#320) from 8876 into beta
Reviewed-on: D-Net/dnet-hadoop#320
|
2023-08-07 10:45:30 +02:00 |
Giambattista Bloisi
|
97b6d1dc45
|
Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags
Filter relations also by dataInfo.invisible flag
|
2023-08-07 10:24:11 +02:00 |
Giambattista Bloisi
|
af49424b59
|
Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
|
2023-08-04 14:27:39 +02:00 |
Claudio Atzori
|
b9dddbfe54
|
rule out records with NULL dataInfo, except for Relations
|
2023-07-31 17:53:54 +02:00 |
Claudio Atzori
|
11ffb9bd68
|
rule out records with NULL dataInfo
|
2023-07-31 12:35:33 +02:00 |
Serafeim Chatzopoulos
|
7cefe2665b
|
Remove unnecessary classes
|
2023-07-28 19:14:39 +03:00 |
Serafeim Chatzopoulos
|
26a92ce762
|
Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876
|
2023-07-28 19:03:57 +03:00 |