|
|
|
# DHP-Explorer
|
|
|
|
|
|
|
|
For lazy people who hate executing a Zeppelin notebook that fails for unknown reasons!
|
|
|
|
|
|
|
|
# How does it work?
|
|
|
|
Let's say you want to create a series of Spark jobs to evaluate some features in your datasets, and you are expected to use a Zeppelin notebook.
|
|
|
|
Sometimes the notebook doesn't work well: you get errors you can't explain, even though the same code runs fine in `spark-shell`.
|
|
|
|
|
|
|
|
With this project, those problems are gone.
|
|
|
|
|
|
|
|
## Step 1
|
|
|
|
Create a Java/Scala main application that runs your job.
|
|
|
|
|
|
|
|
## Step 2
|
|
|
|
Run the Python script:
|
|
|
|
|
|
|
|
`python execute_notebook.py {SSH USER NAME} {MAIN CLASS reference path} {arguments_file path}`
|
|
|
|
|
|
|
|
The `arguments_file` is a plain-text file that contains all the program arguments, one per line.
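For illustration, a hypothetical `arguments_file` could look like this (the paths and date below are invented, not part of the project):

```
/user/sandro/input/dataset.parquet
/user/sandro/output/results
2022-10-18
```

Each line becomes one positional argument passed to your main class, in order.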
|
|
|
|
|
|
|
|
This script does the following:
|
|
|
|
|
|
|
|
- builds the main JAR using `mvn package`
- uploads the JAR to the iis machine (_iis-cdh5-test-gw.ocean.icm.edu.pl_)
- uploads all the dependency JARs, found by checking the pom for entries preceded by the comment `<!-- JAR NEED -->`
- submits the Spark job, streaming the output log directly to your machine
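As an example of the marker convention, a dependency to be uploaded might be tagged in the `pom.xml` like this (the artifact shown is just a plausible placeholder, not one the project actually requires):

```xml
<!-- JAR NEED -->
<dependency>
    <groupId>com.typesafe</groupId>
    <artifactId>config</artifactId>
    <version>1.4.2</version>
</dependency>
```

Any `<dependency>` entry without the comment is assumed to be provided on the cluster and is not uploaded.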
|
|
|
|
|
|
|
|
|
|
|
|
## Class convention
|
|
|
|
|
|
|
|
To create a new Spark Scala class, extend _AbstractScalaApplication_ in the package `com.sandro.app`.
|
|
|
|
Implement the method `run`; inside it, the Spark context is already initialized.
|
|
|
|
|
|
|
|
Then define a singleton object named after the class, and in its `main` method run the code:
|
|
|
|
|
|
|
|
`new YOURCLASS(args,logger).initialize().run()`
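Putting the convention together, a new class might look like the sketch below. This is illustrative only: `WordCount` is a hypothetical name, and the exact constructor signature of `AbstractScalaApplication`, its `spark` field, and `initialize()` returning the instance are assumptions based on the snippet above, not verified against the project source.

```scala
package com.sandro.app

import org.slf4j.{Logger, LoggerFactory}

// Hypothetical job: counts words in a text file whose path is the
// first line of the arguments_file.
class WordCount(args: Array[String], logger: Logger)
    extends AbstractScalaApplication(args, logger) {

  // `run` is called after initialize() has set up the Spark session.
  override def run(): Unit = {
    val input = args(0)
    spark.read.textFile(input)
      .selectExpr("explode(split(value, ' ')) AS word")
      .groupBy("word").count()
      .show()
  }
}

object WordCount {
  def main(args: Array[String]): Unit = {
    val logger = LoggerFactory.getLogger(getClass)
    new WordCount(args, logger).initialize().run()
  }
}
```

The fully qualified name `com.sandro.app.WordCount` would then be what you pass as the `{MAIN CLASS reference path}` argument of the Python script.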
|
|
|
|
|
|
|
|
|
|
|
|
That's AWESOME!
|