Giambattista Bloisi
c412dc162b
Fix bug in conversion from dedup JSON model to Spark Dataset of Rows: lists of strings contained the JSON-escaped representation of each value instead of the plain value, which caused instanceTypeMatch failures because of the leading and trailing double quotes
2023-10-02 11:34:51 +02:00
Claudio Atzori
5d09b7db8b
Merge pull request 'SparkPropagateRelation relations do not propagate deletedByInference and invisible' ( #333 ) from consistency_keep_mergerels into beta
...
Reviewed-on: D-Net/dnet-hadoop#333
2023-10-02 11:27:57 +02:00
Claudio Atzori
7b403a920f
Merge branch 'beta' into consistency_keep_mergerels
2023-10-02 11:26:00 +02:00
Claudio Atzori
dc86018a5f
Merge branch 'merge_entities_job' into beta
2023-10-02 11:24:48 +02:00
Giambattista Bloisi
3c47920c78
Use asScala to convert java List to Scala Sequence
2023-10-02 11:04:47 +02:00
Claudio Atzori
7f244d9a7a
code formatting
2023-10-02 11:04:36 +02:00
Giambattista Bloisi
e239b81740
Fix defect #8997 : GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed
2023-10-02 11:04:18 +02:00
Miriam Baglioni
e84f5b5e64
extended existing code to accommodate import of POCI from OpenCitations
2023-10-02 09:25:16 +02:00
Serafeim Chatzopoulos
ab0d70691c
Add step for archiving repoUrls to SWH
2023-09-28 20:56:18 +03:00
Serafeim Chatzopoulos
ed9c81a0b7
Add steps to collect last visit data && archive not found repository URLs
2023-09-27 19:00:54 +03:00
Alessia Bardi
0935d7757c
Use v5 of the UNIBI Gold ISSN list in test
2023-09-20 15:41:35 +02:00
Alessia Bardi
cc7204a089
tests for d4science catalog
2023-09-20 15:38:32 +02:00
Sandro La Bruzzo
76476cdfb6
Added maven repo for dependencies that are not in maven central
2023-09-20 10:33:14 +02:00
dimitrispie
9ef971a146
Update step16-createIndicatorsTables.sql
...
Fix int year for:
indi_org_openess_year
indi_org_fairness_year
indi_org_findable_year
2023-09-19 14:25:42 +03:00
Serafeim Chatzopoulos
9d44418d38
Add collecting software code repository URLs
2023-09-14 18:43:25 +03:00
Serafeim Chatzopoulos
395a4af020
Run CC and RAM sequentially in dhp-impact-indicators WF
2023-09-13 08:59:40 +02:00
Claudio Atzori
8a6892cc63
[graph dedup] consistency wf should not remove the relations while dispatching the entities
2023-09-12 21:27:05 +02:00
Claudio Atzori
4786aa0e09
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
2023-09-07 11:21:07 +02:00
dimitrispie
5f90cc11e9
Update step16-createIndicatorsTables.sql
...
Fix indi_pub_bronze_oa
2023-09-06 14:14:38 +03:00
Claudio Atzori
9f5d16624c
Merge pull request '[graph raw] datainfo.invisible set as true only for entities' ( #336 ) from invisible_relations into beta
...
Reviewed-on: D-Net/dnet-hadoop#336
2023-09-04 16:14:47 +02:00
Claudio Atzori
adec6692ca
Merge branch 'beta' into invisible_relations
2023-09-04 16:13:06 +02:00
Claudio Atzori
15666e86a8
added collectedfrom to the affiliation relations imported from Crossref
2023-09-04 15:56:06 +02:00
Claudio Atzori
7d6bd4f20b
Merge pull request 'Fix import of affiliations relations from Crossref' ( #335 ) from 8876_fix_crossref_affiliation_relations_import into beta
...
Reviewed-on: D-Net/dnet-hadoop#335
2023-09-04 15:19:58 +02:00
Claudio Atzori
5b06c9d06f
[graph raw] datainfo.invisible set as true only for entities
2023-09-04 15:15:24 +02:00
Serafeim Chatzopoulos
7de0164c26
Fix import of affiliations relations from Crossref
2023-09-04 16:04:41 +03:00
Giambattista Bloisi
2caaaec42d
Include SparkCleanRelation logic in SparkPropagateRelation
...
SparkPropagateRelation includes merge relations
Revised tests for SparkPropagateRelation
2023-09-04 11:33:20 +02:00
dimitrispie
964c2f553e
Changes in indicators step, monitor step
...
- graduatedoctorates for observatory
- result_apc_affiliations table
- new indicators
indi_is_funder_plan_s
indi_funder_fairness
indi_ris_fairness
indi_funder_openess
indi_ris_openess
indi_funder_findable
indi_ris_findable
indi_is_project_result_after
- cast year to int in composite indicators
- new institutions
-- Universidade Católica Portuguesa
-- Iscte - Instituto Universitário de Lisboa
-- Munster Technological University
-- Cardiff University
-- Leibniz Institute of Ecological Urban and Regional Development
2023-09-01 10:57:02 +03:00
Giambattista Bloisi
6cc7d8ca7b
GroupEntities and DispatchEntities are now merged in GroupEntitiesSparkJob
2023-08-30 10:43:31 +02:00
Claudio Atzori
488d9a1cea
Merge pull request 'Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb' ( #331 ) from consistencywf_memoryoverhead_conf into beta
...
Reviewed-on: D-Net/dnet-hadoop#331
2023-08-29 16:31:36 +02:00
Giambattista Bloisi
6b1c05d118
Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb
2023-08-29 16:04:19 +02:00
Claudio Atzori
bf35280ea6
code formatting
2023-08-29 11:11:00 +02:00
Claudio Atzori
0515d81c7c
Merge pull request 'Rewrite SparkPropagateRelation exploiting Dataframe API' ( #330 ) from propagate_relation_rewrite into beta
...
Reviewed-on: D-Net/dnet-hadoop#330
2023-08-29 10:47:14 +02:00
Claudio Atzori
58665a246c
Merge branch 'beta' into propagate_relation_rewrite
2023-08-29 10:47:02 +02:00
Claudio Atzori
f437be80ad
[impact indicators] adjusted paths in the bip ranker wf parameters
2023-08-29 09:03:03 +02:00
Giambattista Bloisi
d012aec0b3
Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow ( #8964 )
2023-08-28 22:44:54 +02:00
Giambattista Bloisi
a860e19423
Fix: ensure all relations are written out, not only those managed by dedup
2023-08-28 15:36:02 +02:00
Giambattista Bloisi
0d7b2bf83d
Rewrite SparkPropagateRelation exploiting Dataframe API
2023-08-28 10:34:54 +02:00
Miriam Baglioni
9c8b41475a
Merge pull request '8172_impact_indicators_workflow' ( #284 ) from 8172_impact_indicators_workflow into beta
...
Reviewed-on: D-Net/dnet-hadoop#284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos
97c1ba8918
Merge actionsets of results and projects
2023-08-11 15:56:53 +03:00
Miriam Baglioni
35b8deb2c6
Merge pull request 'DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag' ( #329 ) from dispatch_filter_invisible_entities into beta
...
Reviewed-on: D-Net/dnet-hadoop#329
2023-08-10 12:56:18 +02:00
Giambattista Bloisi
95cd2b9b1e
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
...
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
2023-08-10 11:53:48 +02:00
Giambattista Bloisi
fab9920271
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
2023-08-09 15:41:43 +02:00
Miriam Baglioni
c25ac21e5e
Merge pull request 'graph cleaning, suggestions from ticket 8898' ( #325 ) from cleaning_8898 into beta
...
Reviewed-on: D-Net/dnet-hadoop#325
2023-08-08 11:14:19 +02:00
Miriam Baglioni
c334fe2438
Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' ( #328 ) from cleanup_relations_after_dedup into beta
...
Reviewed-on: D-Net/dnet-hadoop#328
2023-08-08 09:49:12 +02:00
Miriam Baglioni
0e2f855807
Merge pull request 'Updates Promotion DBs' ( #321 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#321
2023-08-07 12:09:16 +02:00
Miriam Baglioni
18fbe52b20
Merge pull request 'Import affiliation relations from Crossref' ( #320 ) from 8876 into beta
...
Reviewed-on: D-Net/dnet-hadoop#320
2023-08-07 10:45:30 +02:00
Giambattista Bloisi
97b6d1dc45
Filter ids by dataInfo.deletedbyinference and dataInfo.invisible flags
...
Filter relations also by dataInfo.invisible flag
2023-08-07 10:24:11 +02:00
Giambattista Bloisi
af49424b59
Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
2023-08-04 14:27:39 +02:00
Claudio Atzori
b9dddbfe54
rule out records with NULL dataInfo, except for Relations
2023-07-31 17:53:54 +02:00
Claudio Atzori
11ffb9bd68
rule out records with NULL dataInfo
2023-07-31 12:35:33 +02:00