Introduce pre-flight checks in dhp workflows #71

Open
opened 2020-12-10 10:02:25 +01:00 by claudio.atzori · 0 comments

Failures of the Oozie workflows are quite common. They may depend on factors that lie outside the business logic and often involve settings (or misconfigurations) of the execution environment. These settings are frequently overlooked by developers, who tend not to care much about them.

In this category, settings such as the connection to a given database or REST API, or the write permissions on a certain directory, act as prerequisites for the correct execution of a workflow.

Many of these workflows implement long-running actions that may hit their dependency on any of the environment configuration aspects mentioned above at any stage, possibly after hours (or days) spent processing or preparing data. When that happens the workflow fails without completing its job, having reduced the cluster's availability for others in the meantime. In short: a simple misconfiguration acts as an unsatisfied dependency of the workflow implementation and leads to a waste of precious time and to tedious re-executions.

For this reason, it would be good if the Oozie workflows could verify, whenever possible, that those prerequisites are satisfied at the beginning of their execution.

I can imagine a set of utility classes defined in the `dhp-common` module, to be used to implement individual Java actions that verify such prerequisites.
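To make the proposal a bit more concrete, here is a minimal sketch of what such a utility could look like. Everything in it is hypothetical: the class name `PreflightChecks`, the `Check` interface and the helper methods do not exist in `dhp-common`. It uses only the JDK, so the directory check runs against the local filesystem; a real implementation would most likely go through the Hadoop `FileSystem` API to check HDFS paths, and a database or REST check would probably do more than open a TCP connection.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical utility class: not part of dhp-common, names are illustrative only.
public class PreflightChecks {

    /** A single prerequisite: returns null when satisfied, otherwise a failure message. */
    @FunctionalInterface
    public interface Check {
        String verify();
    }

    /** Checks that the path exists, is a directory and is writable (local filesystem only). */
    public static Check writableDirectory(String dir) {
        return () -> {
            Path p = Paths.get(dir);
            if (!Files.isDirectory(p)) {
                return "not a directory: " + dir;
            }
            if (!Files.isWritable(p)) {
                return "directory not writable: " + dir;
            }
            return null;
        };
    }

    /** Checks that a TCP connection to host:port can be opened within timeoutMs. */
    public static Check reachableEndpoint(String host, int port, int timeoutMs) {
        return () -> {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), timeoutMs);
                return null;
            } catch (IOException e) {
                return "cannot connect to " + host + ":" + port + " (" + e.getMessage() + ")";
            }
        };
    }

    /** Runs every check and collects all failure messages instead of stopping at the first. */
    public static List<String> runAll(List<Check> checks) {
        List<String> failures = new ArrayList<>();
        for (Check c : checks) {
            String failure = c.verify();
            if (failure != null) {
                failures.add(failure);
            }
        }
        return failures;
    }

    /** Entry point for a pre-flight action; expected args: <directory> <host> <port>. */
    public static void main(String[] args) {
        List<String> failures = runAll(Arrays.asList(
            writableDirectory(args[0]),
            reachableEndpoint(args[1], Integer.parseInt(args[2]), 5000)));
        if (!failures.isEmpty()) {
            // An uncaught exception is meant to make the action fail before any processing starts.
            throw new IllegalStateException("Pre-flight checks failed: " + failures);
        }
        System.out.println("All pre-flight checks passed");
    }
}
```

The idea would be to run a class like this as the first Java action of a workflow, so that a misconfiguration stops the workflow immediately rather than hours later. Collecting all failures before throwing means that a single run reports every unsatisfied prerequisite at once.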

Opinions? Ideas?

claudio.atzori added the help wanted, enhancement labels 2020-12-10 10:02:25 +01:00
sandro.labruzzo was assigned by claudio.atzori 2020-12-10 10:02:25 +01:00
claudio.atzori self-assigned this 2020-12-10 10:02:25 +01:00
michele.artini was assigned by claudio.atzori 2020-12-10 10:02:25 +01:00
miriam.baglioni was assigned by claudio.atzori 2020-12-10 10:02:25 +01:00
michele.debonis was assigned by claudio.atzori 2020-12-10 10:02:25 +01:00
enrico.ottonello was assigned by claudio.atzori 2020-12-10 10:02:26 +01:00
Reference: D-Net/dnet-hadoop#71