DHP-Explorer

For lazy people who hate to execute a Zeppelin notebook that doesn't work for unknown reasons!

How does it work?

Let's say you want to create a series of Spark jobs to evaluate some features of your datasets, and you have to use a Zeppelin notebook. Sometimes the notebook doesn't work well, and you can't understand why you get errors there when the same code works in spark-shell.

With this project, your problems are solved.

Step 1

Create a Java/Scala main application that runs your code.

Step 2

Run the Python script:

python execute_notebook.py {SSH USER NAME} {MAIN CLASS reference path} {arguments_file path}

The arguments_file is a file containing all the arguments, one per line.
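For example, if your main class expected an input path and an output path, the arguments_file might look like this (the paths shown are purely illustrative):

```
/user/sandro/input-dataset
/user/sandro/output-report
```

The script would then be invoked with something like python execute_notebook.py sandro com.sandro.app.MyJob ./arguments_file, where MyJob is a hypothetical main class.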

This script does the following:

  • build the main JAR using mvn package
  • upload the JAR to the IIS machine (iis-cdh5-test-gw.ocean.icm.edu.pl)
  • upload all dependency JARs, by checking the pom for those preceded by the comment <!-- JAR NEED -->
  • submit the Spark job, so that you can watch the output log directly on your machine
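For example, a dependency to be uploaded might be marked like this in pom.xml (the artifact and version shown are illustrative, not taken from the actual project):

```xml
<!-- JAR NEED -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.0</version>
</dependency>
```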

Class convention

To create a new Scala Spark class, extend AbstractScalaApplication in the package com.sandro.app and implement the run method, where the Spark context is already initialized for you.

Then define a singleton object NameOfTheClass whose main method runs the code:

new YOURCLASS(args,logger).initialize().run()
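As a concrete sketch of this convention (the job name and logic are hypothetical, and the constructor signature of AbstractScalaApplication is assumed from the snippet above, not taken from the actual source):

```scala
package com.sandro.app

import org.slf4j.{Logger, LoggerFactory}

// Hypothetical job following the convention above: extend AbstractScalaApplication
// and implement run(), where the Spark session is assumed to be already available.
class CountDatasetRows(args: Array[String], log: Logger)
    extends AbstractScalaApplication(args, log) {

  override def run(): Unit = {
    // `spark` is assumed to be provided by AbstractScalaApplication after initialize()
    val df = spark.read.text(args(0))
    log.info(s"Row count: ${df.count()}")
  }
}

object CountDatasetRows {
  val logger: Logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    new CountDatasetRows(args, logger).initialize().run()
  }
}
```

Note that initialize() is assumed to return the instance itself, which is why it can be chained with run() as in the one-liner above.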

That's AWESOME!