# DHP-Explorer

For lazy people who hate to execute a Zeppelin notebook that doesn't work for unknown reasons!

## How it works

Let's say you want to create a series of Spark jobs to evaluate some features of your datasets, and you have to use a Zeppelin notebook. Sometimes the notebook doesn't work well: you get errors you can't explain, even though the same code runs fine in spark-shell.

With this project, your problems are solved.

### Step 1

Create a Java/Scala main application that runs your code.

### Step 2

Run the Python script:

```
python execute_notebook.py {SSH USER NAME} {MAIN CLASS reference path} {arguments_file path}
```

The arguments_file is a file containing all the arguments for the main class, one per line.
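
For example, a hypothetical arguments_file for a job that takes an input path and an output path would contain:

```
/user/sandro/input/dataset
/user/sandro/output/report
```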

This script does the following:

- builds the main jar using `mvn package`
- uploads the jar to the IIS machine (iis-cdh5-test-gw.ocean.icm.edu.pl)
- uploads all the dependency jars, selecting from the pom.xml those preceded by the comment `<!-- JAR NEED -->` (see the snippet after this list)
- submits the Spark job, so you can watch the output log directly on your machine
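
As a sketch, a dependency marked for upload would look like this in the pom.xml (the artifact shown is only an example; the part the script relies on is the `<!-- JAR NEED -->` comment):

```xml
<!-- JAR NEED -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.10.0</version>
</dependency>
```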

## Class convention

To create a new Spark Scala class, extend AbstractScalaApplication in the package com.sandro.app and implement the method run, in which the Spark context is already initialized.

Then define a singleton object with the same name as the class, and in its main method run:

```scala
new YOURCLASS(args, logger).initialize().run()
```
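
Putting it together, here is a minimal sketch of such a class. It assumes, based only on the line above, that AbstractScalaApplication takes the arguments and a logger in its constructor, exposes an initialized `spark` session, and returns itself from initialize(); the class name CountRecords is hypothetical:

```scala
package com.sandro.app

import org.slf4j.{Logger, LoggerFactory}

// Hypothetical job: counts the records of a parquet dataset.
// The AbstractScalaApplication constructor signature and its `spark`
// field are assumptions inferred from this README.
class CountRecords(args: Array[String], log: Logger)
    extends AbstractScalaApplication(args, log) {

  // run() is called after initialize() has set up the Spark context
  override def run(): Unit = {
    val inputPath = args(0) // first line of the arguments_file
    val df = spark.read.parquet(inputPath)
    log.info(s"Record count: ${df.count()}")
  }
}

object CountRecords {
  private val logger: Logger = LoggerFactory.getLogger(classOf[CountRecords])

  def main(args: Array[String]): Unit = {
    new CountRecords(args, logger).initialize().run()
  }
}
```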

That's AWESOME!