Commit Graph

3915 Commits

Author SHA1 Message Date
Claudio Atzori 73c49b8d26 Merge branch 'beta' into SWH_integration 2023-10-06 12:21:51 +02:00
Serafeim Chatzopoulos 1bb83b9188 Add prefix in SWH ID 2023-10-04 20:31:45 +03:00
Serafeim Chatzopoulos e9f24df21c Move SWH API Key from constants to workflow param 2023-10-03 20:57:57 +03:00
Serafeim Chatzopoulos cae75fc75d Add SWH in the collectedFrom field 2023-10-03 16:55:10 +03:00
Serafeim Chatzopoulos b49a3ac9b2 Add actionsetsPath as a global WF param 2023-10-03 15:43:38 +03:00
Serafeim Chatzopoulos 24c43e0c60 Restructure workflow parameters 2023-10-03 15:11:58 +03:00
Serafeim Chatzopoulos 9f73d93e62 Add param for limiting repo Urls 2023-10-03 14:39:08 +03:00
Claudio Atzori 5919e488dd Merge branch 'beta' into importpoci 2023-10-03 10:43:53 +02:00
Serafeim Chatzopoulos 839a8524e7 Add action for creating actionsets 2023-10-02 23:50:38 +03:00
Miriam Baglioni d7fccdc64b fixed paths in wf to match the req of the pathname 2023-10-02 14:10:57 +02:00
Miriam Baglioni 9898470b0e Addressing comments in D-Net/dnet-hadoop#340\#issuecomment-10592 2023-10-02 12:54:16 +02:00
Claudio Atzori 7b403a920f Merge branch 'beta' into consistency_keep_mergerels 2023-10-02 11:26:00 +02:00
Claudio Atzori dc86018a5f Merge branch 'merge_entities_job' into beta 2023-10-02 11:24:48 +02:00
Claudio Atzori 7f244d9a7a code formatting 2023-10-02 11:04:36 +02:00
Giambattista Bloisi e239b81740 Fix defect #8997: GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed 2023-10-02 11:04:18 +02:00
Miriam Baglioni e84f5b5e64 extended existing codo to accomodate import of POCI from open citation 2023-10-02 09:25:16 +02:00
Serafeim Chatzopoulos ab0d70691c Add step for archiving repoUrls to SWH 2023-09-28 20:56:18 +03:00
Serafeim Chatzopoulos ed9c81a0b7 Add steps to collect last visit data && archive not found repository URLs 2023-09-27 19:00:54 +03:00
Alessia Bardi 0935d7757c Use v5 of the UNIBI Gold ISSN list in test 2023-09-20 15:41:35 +02:00
Alessia Bardi cc7204a089 tests for d4science catalog 2023-09-20 15:38:32 +02:00
Serafeim Chatzopoulos 9d44418d38 Add collecting software code repository URLs 2023-09-14 18:43:25 +03:00
Serafeim Chatzopoulos 395a4af020 Run CC and RAM sequentieally in dhp-impact-indicators WF 2023-09-13 08:59:40 +02:00
Claudio Atzori 4786aa0e09 added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map 2023-09-07 11:21:07 +02:00
Claudio Atzori adec6692ca Merge branch 'beta' into invisible_relations 2023-09-04 16:13:06 +02:00
Claudio Atzori 15666e86a8 added collectedfrom to the affiliation relations imported from Crossref 2023-09-04 15:56:06 +02:00
Claudio Atzori 5b06c9d06f [graph raw] datainfo.invisible set as true only for entities 2023-09-04 15:15:24 +02:00
Serafeim Chatzopoulos 7de0164c26 Fix import of affiliations relations from Crossref 2023-09-04 16:04:41 +03:00
Giambattista Bloisi 2caaaec42d Include SparkCleanRelation logic in SparkPropagateRelation
SparkPropagateRelation includes merge relations
Revised tests for SparkPropagateRelation
2023-09-04 11:33:20 +02:00
Giambattista Bloisi 6cc7d8ca7b GroupEntities and DispatchEntites are now merged in GroupEntitiesSparkJob 2023-08-30 10:43:31 +02:00
Giambattista Bloisi 6b1c05d118 Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb 2023-08-29 16:04:19 +02:00
Claudio Atzori bf35280ea6 code formatting 2023-08-29 11:11:00 +02:00
Claudio Atzori 58665a246c Merge branch 'beta' into propagate_relation_rewrite 2023-08-29 10:47:02 +02:00
Claudio Atzori f437be80ad [impact indicators] adjusted paths in the bip ranker wf parameters 2023-08-29 09:03:03 +02:00
Giambattista Bloisi d012aec0b3 Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow (#8964) 2023-08-28 22:44:54 +02:00
Giambattista Bloisi a860e19423 Fix ensure all relations are written out, not only those managed by dedup 2023-08-28 15:36:02 +02:00
Giambattista Bloisi 0d7b2bf83d Rewrite SparkPropagateRelation exploiting Dataframe API 2023-08-28 10:34:54 +02:00
Miriam Baglioni 9c8b41475a Merge pull request '8172_impact_indicators_workflow' (#284) from 8172_impact_indicators_workflow into beta
Reviewed-on: D-Net/dnet-hadoop#284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos 97c1ba8918 Merge actionsets of results and projects 2023-08-11 15:56:53 +03:00
Giambattista Bloisi 95cd2b9b1e Make filterInvisible a mandatory parameter of DispathEntitiesSparkJob
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
2023-08-10 11:53:48 +02:00
Giambattista Bloisi fab9920271 DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag 2023-08-09 15:41:43 +02:00
Miriam Baglioni c25ac21e5e Merge pull request 'graph cleaning, suggestions from ticket 8898' (#325) from cleaning_8898 into beta
Reviewed-on: D-Net/dnet-hadoop#325
2023-08-08 11:14:19 +02:00
Miriam Baglioni c334fe2438 Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' (#328) from cleanup_relations_after_dedup into beta
Reviewed-on: D-Net/dnet-hadoop#328
2023-08-08 09:49:12 +02:00
Miriam Baglioni 0e2f855807 Merge pull request 'Updates Promotion DBs' (#321) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#321
2023-08-07 12:09:16 +02:00
Miriam Baglioni 18fbe52b20 Merge pull request 'Import affiliation relations from Crossref' (#320) from 8876 into beta
Reviewed-on: D-Net/dnet-hadoop#320
2023-08-07 10:45:30 +02:00
Giambattista Bloisi 97b6d1dc45 Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags
Filter relations also by dataInfo.invisible flag
2023-08-07 10:24:11 +02:00
Giambattista Bloisi af49424b59 Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleyted by inference or that are pointing to dangling entities 2023-08-04 14:27:39 +02:00
Claudio Atzori 11ffb9bd68 rule out records with NULL dataInfo 2023-07-31 12:35:33 +02:00
Serafeim Chatzopoulos 7cefe2665b Remove unnecessary classes 2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos 26a92ce762 Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876 2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos ebfba38ab6 Add changes from code review 2023-07-28 19:03:47 +03:00