Commit Graph

41 Commits

Author SHA1 Message Date
Giambattista Bloisi aae37058f7 Increase memory 2024-10-21 20:30:31 +02:00
Giambattista Bloisi 131f6e5592 enable dynamic allocation 2024-10-21 20:23:31 +02:00
Giambattista Bloisi df46c8c65f Added ORCID enrichment workflows 2024-10-21 18:55:41 +02:00
Giambattista Bloisi 034a01542a Implement consistency workflow 2024-10-21 15:33:55 +02:00
Giambattista Bloisi c6fbfd3f0a Remove numpartitions argument where not needed 2024-10-21 14:30:40 +02:00
Giambattista Bloisi ae89274ce4 implement whole scan pipeline 2024-10-21 14:10:28 +02:00
Giambattista Bloisi fa90a9dbe0 Update spark operator version 2024-10-21 09:36:55 +02:00
Giambattista Bloisi 0a2956d81f reduce executor cores 2024-10-19 17:59:35 +02:00
Giambattista Bloisi 48f688cda9 add deps jar 2024-10-19 11:13:21 +02:00
Giambattista Bloisi c5f4263061 update spark-version 2024-10-19 11:09:24 +02:00
Giambattista Bloisi ba3f351736 print existing files 2024-10-19 10:26:18 +02:00
Giambattista Bloisi 448bb924ab add test dedup task 2024-10-19 00:18:00 +02:00
Giambattista Bloisi bf7c9e2dce revert some changes 2024-10-18 17:16:37 +02:00
Giambattista Bloisi 8da265f018 add utils in the parent folder 2024-10-18 17:00:51 +02:00
Giambattista Bloisi 0fcabed2ae change dag name 2024-10-18 16:58:42 +02:00
Giambattista Bloisi c3ba29e4c5 Add dagutils 2024-10-18 16:53:14 +02:00
Giambattista Bloisi 412e008df7 Add untar task 2024-10-18 16:42:54 +02:00
Sandro La Bruzzo df6e23666e fix 2024-10-16 16:35:01 +02:00
Sandro La Bruzzo d1afcd4395 fixed import 2024-10-16 14:08:00 +02:00
Sandro La Bruzzo dcd2efd3b4 added workflow test 2024-10-16 13:56:50 +02:00
Sandro La Bruzzo 6b555b8f6e added workflow test 2024-10-16 13:56:36 +02:00
Sandro La Bruzzo b8bf21f8e5 fixed import 2024-10-16 13:51:49 +02:00
Sandro La Bruzzo 07ce192207 added workflow test 2024-10-16 13:38:26 +02:00
Sandro La Bruzzo ed3422673f added git variable for airflow module 2024-10-16 13:35:49 +02:00
Sandro La Bruzzo 35c44845d2 added creation of bucket using variables 2024-10-16 12:07:14 +02:00
Sandro La Bruzzo 85cf6eeb1a fixed bucket creation updated tenant yaml 2024-10-16 10:45:48 +02:00
Giambattista Bloisi 7528675590 update version of minio 2024-10-16 09:08:14 +02:00
Claudio Atzori 02a15472d4 Update README.md 2024-05-16 12:30:38 +02:00
Claudio Atzori 8c8862be36 Update README.md 2024-05-16 12:30:04 +02:00
Sandro La Bruzzo 8dff6727ef minor fix 2024-05-02 11:51:56 +02:00
Sandro La Bruzzo c9c80ad9b9 updated airflow roles 2024-05-02 11:31:17 +02:00
Sandro La Bruzzo 196ba0b54a removed dependency on custom spark-operator 2024-05-02 10:33:21 +02:00
Sandro La Bruzzo 32e8e86aa7 added spark driver image 2024-05-01 16:35:21 +02:00
Sandro La Bruzzo 0863c9b2e9 Added airflow to modules 2024-04-22 12:29:21 +02:00
Giambattista Bloisi 7a8a2e6285 Update spark-operator version and image to support s3; change default code repo for Dag airflow to this repository, branch airflow, folder "airflow/dags" 2024-04-22 10:20:20 +02:00
Sandro La Bruzzo 9631d0245b Improved documentation 2024-04-19 16:32:51 +02:00
Sandro La Bruzzo b51fd066e4 added setup for generating cluster with minio 2024-04-19 15:54:18 +02:00
Sandro La Bruzzo cb1f95d82a added setup for generating cluster with minio 2024-04-19 15:53:45 +02:00
Sandro La Bruzzo eb18526071 added ignore terraform 2024-04-19 15:32:07 +02:00
Sandro La Bruzzo e03ec0c27d first import 2024-04-19 15:25:44 +02:00
Sandro La Bruzzo ca3ce0cdcc first commit 2024-04-16 13:48:46 +02:00