# DHP-Explorer
For lazy people who hate executing a Zeppelin notebook that doesn't work for unknown reasons!
## How does it work?
Let's say you want to create a series of Spark jobs to evaluate some features in your datasets, and you have to use a Zeppelin notebook. Sometimes the notebook doesn't work well, and you can't understand why you get errors there when the same code works in spark-shell.

With this project, your problems are over.
### Step 1
Create a Java/Scala main application that runs your code.
### Step 2
Run the Python script:

```
python execute_notebook.py {SSH USER NAME} {MAIN CLASS reference path} {arguments_file path}
```
The arguments_file is a file that contains all the arguments, one per line.
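For example, a hypothetical arguments_file for a job that takes an input path and an output path (both paths are illustrative, not part of the project):

```
hdfs:///user/sandro/input
hdfs:///user/sandro/output
```

A matching invocation might then look like this (the user name and main class are also hypothetical):

```
python execute_notebook.py sandro com.sandro.app.WordCountApp ./arguments_file
```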
This script does the following:
- builds the main jar using `mvn package`
- uploads the jar to the IIS machine (iis-cdh5-test-gw.ocean.icm.edu.pl)
- uploads all the dependency jars, found by checking the pom for those preceded by the comment
  `<!-- JAR NEED -->`
- submits the Spark job, so you can watch the output log directly on your machine
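For example, a dependency flagged for upload might look like this in pom.xml; the dependency shown is purely illustrative, and only the `<!-- JAR NEED -->` marker matters to the script:

```xml
<dependencies>
    <!-- JAR NEED -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.15.2</version>
    </dependency>
</dependencies>
```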
## Class convention
To create a new Scala Spark class, extend `AbstractScalaApplication` in the package `com.sandro.app` and implement the method `run`, where the Spark context is already initialized for you. Then define a singleton object named after the class, and in its `main` method run the code:

```scala
new YOURCLASS(args, logger).initialize().run()
```
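A minimal sketch of this convention follows. It assumes `AbstractScalaApplication` takes `(args, logger)` in its constructor, exposes a `spark` session, and returns itself from `initialize()`; the class name `WordCountApp` and the log4j `Logger` type are illustrative, not part of the project:

```scala
package com.sandro.app

import org.apache.log4j.Logger

// Illustrative class; the exact members of AbstractScalaApplication are
// assumptions based on the convention described above.
class WordCountApp(args: Array[String], logger: Logger)
    extends AbstractScalaApplication(args, logger) {

  // run() is called after initialize() has set up the Spark session/context.
  override def run(): Unit = {
    val lines = spark.read.textFile(args(0)) // first argument: input path
    logger.info(s"Line count: ${lines.count()}")
  }
}

// Singleton object whose main method follows the convention above.
object WordCountApp {
  def main(args: Array[String]): Unit = {
    val logger = Logger.getLogger(classOf[WordCountApp])
    new WordCountApp(args, logger).initialize().run()
  }
}
```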
That's AWESOME!