Introduce pre-flight checks in dhp workflows #71

Open
opened 3 years ago by claudio.atzori · 0 comments
Owner

Failures of the oozie workflows are quite common. They may depend on factors that lie outside the business logic and often involve settings (or misconfigurations) of the execution environment. Developers often overlook these settings and tend not to pay much attention to them.

In this category, settings such as the connection to a given database or REST API, or the write permissions on a certain directory, act as pre-requisites for the correct execution of a workflow.

Many of these workflows implement long-running actions that may hit their dependency on any of the aforementioned environment configuration aspects at any stage, possibly after hours (or days) spent processing or preparing data, reducing the cluster availability for others without completing their job. In short: a simple misconfiguration acts as an unsatisfied dependency of the workflow implementation and leads to a waste of precious time and to tedious re-executions.

For this reason, it would be good if the oozie workflows could verify, whenever possible, that those pre-requisites are satisfied at the beginning of their execution.

I can imagine a set of utility classes defined in the dhp-common module, to be used to implement individual java actions for the verification of such pre-requisites.
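
To make the idea concrete, here is a minimal sketch of what such a utility might look like. Everything in it is an assumption for illustration: the class name `PreFlightChecks` and its methods are hypothetical and do not exist in `dhp-common`. The directory check uses the local filesystem as a stand-in; a check against HDFS would need the Hadoop filesystem API instead. The sketch also assumes that a java action whose `main` throws an exception is treated as failed by oozie, so the workflow stops before any long-running step starts.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical pre-flight helper; not an existing dhp-common class. */
final class PreFlightChecks {

    // Named checks, evaluated in insertion order.
    private final Map<String, Supplier<Boolean>> checks = new LinkedHashMap<>();

    /** Registers a named check and returns this instance for chaining. */
    PreFlightChecks add(String name, Supplier<Boolean> check) {
        checks.put(name, check);
        return this;
    }

    /** True if a TCP connection to host:port can be opened within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** True if the path is an existing, writable directory (local filesystem only). */
    static boolean isWritableDir(Path dir) {
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    /** Runs all checks and returns the names of those that failed or threw. */
    List<String> failures() {
        List<String> failed = new ArrayList<>();
        checks.forEach((name, check) -> {
            boolean ok;
            try {
                ok = check.get();
            } catch (RuntimeException e) {
                ok = false; // a check that throws counts as a failed pre-requisite
            }
            if (!ok) {
                failed.add(name);
            }
        });
        return failed;
    }

    /** Example entry point for a java action placed at the start of a workflow. */
    public static void main(String[] args) {
        Path workDir = Paths.get(System.getProperty("java.io.tmpdir"));
        List<String> failed = new PreFlightChecks()
            .add("working directory is writable", () -> isWritableDir(workDir))
            .failures();
        if (!failed.isEmpty()) {
            throw new IllegalStateException("Pre-flight checks failed: " + failed);
        }
    }
}
```

Collecting all failures before aborting, rather than stopping at the first one, would let a single run report every misconfiguration at once. A database or REST endpoint check could be registered the same way, for example with `canConnect(host, port, timeoutMs)` using values taken from the workflow parameters.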

Opinions? Ideas?

claudio.atzori added the help wanted and enhancement labels 3 years ago
sandro.labruzzo was assigned by claudio.atzori 3 years ago
claudio.atzori self-assigned this 3 years ago
michele.artini was assigned by claudio.atzori 3 years ago
miriam.baglioni was assigned by claudio.atzori 3 years ago
michele.debonis was assigned by claudio.atzori 3 years ago
enrico.ottonello was assigned by claudio.atzori 3 years ago
Reference: D-Net/dnet-hadoop#71