# DHP-Explorer
For lazy people who hate executing a Zeppelin notebook that doesn't work for unknown reasons!
## How does it work?
Let's say you want to create a series of Spark jobs to evaluate some features in your datasets, and you have to use a Zeppelin notebook. Sometimes the notebook doesn't work well, and you can't understand why you get errors there when the same code works in spark-shell.

With this project, your problems are over.
### Step 1
Create a Java/Scala main application that runs your code.
### Step 2
Run the Python script:

```
python execute_notebook.py {SSH USER NAME} {MAIN CLASS reference path} {arguments_file path}
```
The arguments_file is a file that contains all the arguments, one per line.
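For example, a hypothetical arguments_file for a job that takes an input path and an output path (both paths are illustrative, not part of the project):

```
hdfs:///user/sandro/input
hdfs:///user/sandro/output
```

A matching invocation might then look like this (the user name and main class are also hypothetical):

```
python execute_notebook.py sandro com.sandro.app.WordCountApp ./arguments_file
```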
This script does the following:
- builds the main jar using `mvn package`
- uploads the jar to the IIS machine (iis-cdh5-test-gw.ocean.icm.edu.pl)
- uploads all the dependency jars, found by checking the pom for those preceded by the comment
  `<!-- JAR NEED -->`
- submits the Spark job, so you can watch the output log directly on your machine
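For example, a dependency flagged for upload might look like this in pom.xml; the dependency shown is purely illustrative, and only the `<!-- JAR NEED -->` marker matters to the script:

```xml
<dependencies>
    <!-- JAR NEED -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.15.2</version>
    </dependency>
</dependencies>
```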
## Class convention
To create a new Scala Spark class, extend `AbstractScalaApplication` in the package `com.sandro.app` and implement the method `run`, where the Spark context is already initialized for you. Then define a singleton object named after the class, and in its `main` method run the code:

```scala
new YOURCLASS(args, logger).initialize().run()
```
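A minimal sketch of this convention follows. It assumes `AbstractScalaApplication` takes `(args, logger)` in its constructor, exposes a `spark` session, and returns itself from `initialize()`; the class name `WordCountApp` and the log4j `Logger` type are illustrative, not part of the project:

```scala
package com.sandro.app

import org.apache.log4j.Logger

// Illustrative class; the exact members of AbstractScalaApplication are
// assumptions based on the convention described above.
class WordCountApp(args: Array[String], logger: Logger)
    extends AbstractScalaApplication(args, logger) {

  // run() is called after initialize() has set up the Spark session/context.
  override def run(): Unit = {
    val lines = spark.read.textFile(args(0)) // first argument: input path
    logger.info(s"Line count: ${lines.count()}")
  }
}

// Singleton object whose main method follows the convention above.
object WordCountApp {
  def main(args: Array[String]): Unit = {
    val logger = Logger.getLogger(classOf[WordCountApp])
    new WordCountApp(args, logger).initialize().run()
  }
}
```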
That's AWESOME!