Closes #4: New action manager implementation #5
Reference: D-Net/dnet-hadoop#5
Overview
This PR adds a new module with an action manager implementation.
It contains a single Oozie workflow for the promotion of action sets, supplied as a comma-separated list of HDFS locations containing Hadoop sequence files with `AtomicAction` as the value.

Additions
- `equals` and `hashCode` for classes that did not override these methods
- `mergeFrom`, avoiding `ClassCastException` when the target instance cannot be cast to the source instance
- `mergeFrom` for `Relation` safeguarded against a null value of the `collectedfrom` field

A major drawback is the lack of tests for Oozie workflows. These tests would be possible after the introduction of a framework for testing Oozie workflows in the parent module.
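
For readers who want a concrete picture of the additions listed above, here is a minimal sketch of what a type-guarded, null-safe `mergeFrom` plus `equals`/`hashCode` might look like. It is not the code from this PR: the classes `Entity`, `Relation` and `Dataset`, the fields `source` and `target`, the `union` helper, and the choice to silently skip incompatible sources are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class MergeFromSketch {

    // Hypothetical base type; the real model classes may look different.
    static class Entity {
        List<String> collectedfrom; // may be null

        void mergeFrom(Entity other) {
            // Guard: merge only when 'other' is of this type or a subtype,
            // so a subclass could downcast 'other' without a ClassCastException.
            if (other == null || !getClass().isAssignableFrom(other.getClass())) {
                return; // assumption: incompatible sources are silently skipped
            }
            collectedfrom = union(collectedfrom, other.collectedfrom);
        }

        // Null-safe union that keeps the first occurrence of each value.
        static List<String> union(List<String> a, List<String> b) {
            List<String> out = new ArrayList<>();
            if (a != null) out.addAll(a);
            if (b != null) {
                for (String s : b) {
                    if (!out.contains(s)) out.add(s);
                }
            }
            return out;
        }
    }

    // Hypothetical relation identified by its two endpoints.
    static class Relation extends Entity {
        final String source;
        final String target;

        Relation(String source, String target) {
            this.source = source;
            this.target = target;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Relation r = (Relation) o;
            return Objects.equals(source, r.source) && Objects.equals(target, r.target);
        }

        @Override
        public int hashCode() {
            return Objects.hash(source, target);
        }
    }

    // Hypothetical unrelated type, used to exercise the guard.
    static class Dataset extends Entity { }
}
```

In this sketch, merging a `Dataset` into a `Relation` leaves the `Relation` untouched, and a null `collectedfrom` on either side is treated as an empty list.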
Overall the code is well organized and the test suite is also fine. One aspect still being discussed is the encoding used to store the `AtomicActions`; depending on what we decide, the ActionSets read procedure might need to be adjusted a bit (https://code-repo.d4science.org/przemyslaw.jacewicz/dnet-hadoop/src/branch/przemyslawjacewicz_actionmanager_impl_prototype/dhp-workflows/dhp-actionmanager/src/main/java/eu/dnetlib/dhp/actionmanager/partition/PartitionActionSetsByPayloadTypeJob.java#L103), but this is something I can take care of.

Ref. ticket https://issue.openaire.research-infrastructures.eu/issues/5507
Thanks for these!

The helper classes defined in the `eu.dnetlib.dhp.actionmanager.common` package are well suited to be part of the dhp-common module. I'm probably going to move them there, under `eu.dnetlib.dhp.common`.

I agree. As more workflows are being defined, we need a framework to test them. Your experience with IIS workflows testing will be precious :)
@ -0,0 +26,4 @@
public static void runWithSparkSession(SparkConf conf,
Boolean isSparkSessionManaged,
ThrowingConsumer<SparkSession, Exception> fn) {
runWithSparkSession(c -> SparkSession.builder().config(c).getOrCreate(), conf, isSparkSessionManaged, fn);
Something I realized today while refactoring: what about those Spark actions that need to have Hive support enabled? `SparkSession.Builder` has a nice `.enableHiveSupport()` to activate it; should we make it always enabled? Otherwise I can define a specialized `runWithSparkSession`.

Hmm, I didn't think about that. The `runWithSparkSession` methods were meant to be used by the action manager's Spark actions, where Hive support is not needed. But I think that the version of `runWithSparkSession` that accepts a `SparkSession` builder function could be used to supply a builder that builds a `SparkSession` with Hive support.

PS. Sorry for the late reply, didn't see the comment earlier.
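
The builder-function idea from this thread could be sketched roughly as follows. To keep the example runnable without Spark on the classpath, `Conf` and `Session` are hypothetical stand-ins for `SparkConf` and `SparkSession`, and `runWithHiveSession` is an invented name. The meaning given to the managed flag (the helper closes the session when it is true) is also an assumption. With real Spark, the Hive-enabled builder would be `c -> SparkSession.builder().config(c).enableHiveSupport().getOrCreate()`.

```java
import java.util.function.Function;

class SessionRunnerSketch {

    // Hypothetical stand-in for SparkConf.
    static class Conf { }

    // Hypothetical stand-in for SparkSession.
    static class Session implements AutoCloseable {
        final boolean hiveSupport;
        boolean closed = false;

        Session(boolean hiveSupport) {
            this.hiveSupport = hiveSupport;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    @FunctionalInterface
    interface ThrowingConsumer<T, E extends Exception> {
        void accept(T t) throws E;
    }

    // General variant: the caller decides how the session is built.
    static void runWithSession(Function<Conf, Session> sessionBuilder,
                               Conf conf,
                               Boolean isSessionManaged,
                               ThrowingConsumer<Session, Exception> fn) {
        Session session = null;
        try {
            session = sessionBuilder.apply(conf);
            fn.accept(session);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Assumption: "managed" means this helper owns the session and closes it.
            if (session != null && Boolean.TRUE.equals(isSessionManaged)) {
                session.close();
            }
        }
    }

    // Default variant: plain session, no Hive support.
    static void runWithSession(Conf conf,
                               Boolean isSessionManaged,
                               ThrowingConsumer<Session, Exception> fn) {
        runWithSession(c -> new Session(false), conf, isSessionManaged, fn);
    }

    // Specialized variant: same flow, but the builder enables Hive support.
    static void runWithHiveSession(Conf conf,
                                   Boolean isSessionManaged,
                                   ThrowingConsumer<Session, Exception> fn) {
        runWithSession(c -> new Session(true), conf, isSessionManaged, fn);
    }
}
```

Both convenience methods delegate to the variant that takes a builder function, so Hive support stays opt-in and the session lifecycle handling lives in one place.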
Just saw that you already did that :)