New action manager implementation #4
Overview
Add a new action manager module for promoting action sets. The new module should take as input a materialized graph and a list of action sets, and merge the action payloads with the compatible graph tables, producing a new graph materialization. Both graph materializations should be consistent with the graph representation described here.
Input

A materialized graph and a list of action sets. Action payloads should be merged into the matching graph records using the `mergeFrom` method of the OAF model, selecting the newer instance.

Output

A new graph materialization, produced by the merge described above.
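A minimal, self-contained sketch of the merge rule above, with a toy class standing in for an OAF entity; apart from the `mergeFrom` name, all classes, fields, and the timestamp-based comparison are illustrative assumptions, not the project's actual model.

```java
// Toy stand-in for an OAF entity; only mergeFrom reflects the rule above,
// all other names and fields are illustrative.
public class MergeSketch {

    static class Record {
        String id;
        String title;
        long lastUpdateTimestamp;

        // merge the action payload into this graph record; newer values win
        void mergeFrom(Record other) {
            if (other == null || other.lastUpdateTimestamp < this.lastUpdateTimestamp) {
                return;
            }
            if (other.title != null) {
                this.title = other.title;
            }
            this.lastUpdateTimestamp = other.lastUpdateTimestamp;
        }
    }

    public static void main(String[] args) {
        Record fromGraph = new Record();
        fromGraph.id = "id-1";
        fromGraph.title = "original title";
        fromGraph.lastUpdateTimestamp = 1L;

        Record fromActionSet = new Record();
        fromActionSet.id = "id-1";
        fromActionSet.title = "enriched title";
        fromActionSet.lastUpdateTimestamp = 2L;

        // promote: the graph record absorbs the newer action payload
        fromGraph.mergeFrom(fromActionSet);
        System.out.println(fromGraph.title); // prints "enriched title"
    }
}
```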
Oozie workflows implementation notes
If possible, use subworkflows and divide the processing into separate actions to avoid memory and performance issues for jobs running on the cluster.
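As an illustration only, a parent workflow could delegate the promotion of each graph table to its own sub-workflow; all action, property, and path names below are made up.

```xml
<!-- Hypothetical fragment of a parent workflow.xml: each graph table is
     promoted by a dedicated sub-workflow instead of one monolithic action. -->
<action name="promote_publication">
    <sub-workflow>
        <app-path>${workflowAppUri}/promote_publication</app-path>
        <propagate-configuration/>
    </sub-workflow>
    <ok to="promote_dataset"/>
    <error to="Kill"/>
</action>
```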
Spark jobs implementation notes
Use the Spark Dataset/DataFrame API.
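A minimal sketch of this style, with placeholder paths and column names: read a table as a DataFrame and keep transformations inside the Dataset/DataFrame API so Spark can optimize them.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DatasetApiSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("dataset-api-sketch")
            .getOrCreate();

        // read a graph table as a DataFrame; path and format are placeholders
        Dataset<Row> publications = spark.read().parquet("/tmp/graph/publication");

        // transformations stay in the DataFrame API
        Dataset<Row> ids = publications.select("id").distinct();

        ids.write().mode("overwrite").parquet("/tmp/output/publication_ids");
        spark.stop();
    }
}
```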
Spark jobs development notes
Container killed by YARN
For large datasets (e.g. publications), the default number of partitions after shuffling (equal to 200) can cause memory problems and make the job fail with a "Container killed by YARN for exceeding memory limits" error.
The easiest way to resolve this is to increase the number of partitions used after shuffling via the corresponding Spark configuration parameter (see the sketch below).
The number should be 3 or 4 times the number of cores in the cluster.
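The parameter in question is presumably `spark.sql.shuffle.partitions`, the standard Spark SQL setting behind the default of 200 mentioned above; a sketch of setting it when building the session (600 is just an example value):

```java
import org.apache.spark.sql.SparkSession;

public class ShufflePartitionsSketch {
    public static void main(String[] args) {
        // 600 assumes roughly a 150-200 core cluster (3-4x the core count)
        SparkSession spark = SparkSession.builder()
            .appName("promote-action-sets")
            .config("spark.sql.shuffle.partitions", "600")
            .getOrCreate();

        // ... job logic ...
        spark.stop();
    }
}
```

The same value can also be passed on the command line with `--conf spark.sql.shuffle.partitions=600`.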
OutOfMemoryError
It seems that the hierarchical OAF model is not fully compatible with the Spark Java bean encoder accessible through `Encoders.bean`, and an OutOfMemoryError can be produced when grouping and aggregating. The solution is to use the kryo encoder accessible through `Encoders.kryo`. This has a drawback: with kryo serialization the dataset schema (based on the schema of the objects held by the dataset) is lost and the dataset has only a single column, `value`, containing the byte representation of the objects. This means that the Spark SQL-based API cannot be used; only the Spark Dataset API can be used.
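A minimal, self-contained sketch of the kryo-based grouping, with a toy class standing in for an OAF entity (all names are illustrative); note that the resulting dataset exposes only the binary `value` column.

```java
import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.api.java.function.ReduceFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class KryoEncoderSketch {

    // stand-in for a (possibly deeply nested) OAF entity
    public static class Entity implements Serializable {
        public String id;
        public String payload;

        public Entity() {
        }

        public Entity(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("kryo-encoder-sketch")
            .master("local[*]")
            .getOrCreate();

        // kryo encoder: the dataset has a single binary column named "value"
        Dataset<Entity> entities = spark.createDataset(
            Arrays.asList(new Entity("1", "a"), new Entity("1", "b"), new Entity("2", "c")),
            Encoders.kryo(Entity.class));

        // grouping and aggregating through the typed Dataset API still works
        Dataset<Entity> reduced = entities
            .groupByKey((MapFunction<Entity, String>) e -> e.id, Encoders.STRING())
            .reduceGroups((ReduceFunction<Entity>) (left, right) -> left)
            .map((MapFunction<Tuple2<String, Entity>, Entity>) t -> t._2(), Encoders.kryo(Entity.class));

        reduced.show(); // prints the binary "value" column; the Entity schema is gone
        spark.stop();
    }
}
```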
NotSerializableException
When creating utility classes with methods that process Spark `Dataset`s and take lambdas as input, a `java.io.NotSerializableException` can be produced; this is caused by the fact that the `java.util.function` interfaces are not serializable. The solution is to wrap the lambda in a serializable wrapper (see the sketch below).
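A sketch of one possible wrapper: a custom functional interface extending both `java.util.function.Function` and `Serializable` (all names below are made up). Spark's own `org.apache.spark.api.java.function` interfaces are another option, since they are already serializable.

```java
import java.io.Serializable;
import java.util.function.Function;

public class SerializableLambdaSketch {

    // a Function that is also Serializable, so Spark can ship lambdas bound to it
    @FunctionalInterface
    public interface SerializableFunction<T, R> extends Function<T, R>, Serializable {
    }

    // hypothetical utility method: taking the serializable variant instead of a
    // plain java.util.function.Function avoids the NotSerializableException
    public static <T, R> R applyOnce(T input, SerializableFunction<T, R> fn) {
        return fn.apply(input);
    }

    public static void main(String[] args) {
        System.out.println(applyOnce("publication", String::toUpperCase));
    }
}
```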
Failed tasks
When running Spark jobs on a busy cluster, executors can be killed by the driver if an executor stays idle for more than `spark.dynamicAllocation.executorIdleTimeout` seconds (60s by default); when this happens, the executor log and the driver log each contain a message reporting that the executor was removed.
This is a normal situation and the failed executors should be recreated by the Spark driver.
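For reference, the timeout can also be raised explicitly when dynamic allocation is enabled; a sketch only, with an arbitrary 300s value.

```java
import org.apache.spark.sql.SparkSession;

public class IdleTimeoutSketch {
    public static void main(String[] args) {
        // raise the idle timeout from the 60s default; 300s is an arbitrary example
        SparkSession spark = SparkSession.builder()
            .appName("idle-timeout-sketch")
            .config("spark.dynamicAllocation.enabled", "true")
            .config("spark.dynamicAllocation.executorIdleTimeout", "300s")
            .getOrCreate();

        // ... job logic ...
        spark.stop();
    }
}
```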