Sandro La Bruzzo dc39761130 documented in the readme 2022-10-18 16:24:34 +02:00
src/main/java/eu/dnetlib/scholix added some example code 2022-10-18 15:08:06 +02:00
.gitignore added some example code 2022-10-18 15:08:06 +02:00
README.md documented in the readme 2022-10-18 16:24:34 +02:00
execute_notebook.py added some example code 2022-10-18 15:08:06 +02:00
pom.xml added some example code 2022-10-18 15:08:06 +02:00

DHP-Explorer

For lazy people who hate to execute a Zeppelin notebook that doesn't work for unknown reasons!

How does it work?

Let's say you want to create a series of Spark jobs to evaluate some features of your datasets. Normally you would use a Zeppelin notebook, but sometimes the notebook doesn't work well, and you can't understand why you get errors there when the same code works in spark-shell.

With this project, your problems are solved.

Step 1

Create a Java/Scala main application that runs your code.

Step 2

Run the Python script:

python execute_notebook.py {SSH USER NAME} {MAIN CLASS reference path} {arguments_file path}

The arguments_file is a file that contains all the arguments, one per line.
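To make the "one argument per line" format concrete, here is a minimal sketch of how such a file could be read. It is not taken from execute_notebook.py: the function name `read_arguments`, the choice to skip blank lines, and the sample arguments are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def read_arguments(path):
    """Return the arguments stored in the file, one per line.

    Assumption: surrounding whitespace is stripped and blank lines are ignored.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    # Hypothetical arguments file: each line becomes one argument of the main class.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "arguments_file"
        sample.write_text("--sourcePath\n/tmp/input\n\n--limit\n100\n", encoding="utf-8")
        print(read_arguments(sample))
```

Note that with this format a flag and its value go on two separate lines, since each line is passed as a single argument.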

This script does the following:

  • builds the main jar using mvn package
  • uploads the jar to the IIS machine (iis-cdh5-test-gw.ocean.icm.edu.pl)
  • uploads all the dependency jars, i.e. the dependencies in the pom that are preceded by the comment <!-- JAR NEED -->
  • submits the Spark job, so you can watch the output log directly on your machine
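The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual content of execute_notebook.py: it assumes the `<!-- JAR NEED -->` comment sits immediately before a `<dependency>` element, that files are copied with `scp` into the remote home directory, and that the job is started with `spark-submit` over `ssh`. The helper names `marked_dependencies` and `build_commands` are hypothetical.

```python
import posixpath
import re
import shlex

# Assumption: the marker comment directly precedes the <dependency> element.
MARKED_DEPENDENCY = re.compile(
    r"<!--\s*JAR NEED\s*-->\s*<dependency>(.*?)</dependency>", re.DOTALL
)


def _tag(block, name):
    """Return the text of the first <name>...</name> element in block, or None."""
    match = re.search(rf"<{name}>\s*(.*?)\s*</{name}>", block, re.DOTALL)
    return match.group(1) if match else None


def marked_dependencies(pom_text):
    """List the Maven coordinates of the dependencies marked with JAR NEED."""
    return [
        {
            "groupId": _tag(block, "groupId"),
            "artifactId": _tag(block, "artifactId"),
            "version": _tag(block, "version"),
        }
        for block in MARKED_DEPENDENCY.findall(pom_text)
    ]


def build_commands(user, host, main_class, main_jar, dep_jars, args):
    """Return the build, upload and submit commands as argument lists."""
    remote = f"{user}@{host}"
    # Copy the main jar and its dependencies to the remote home directory.
    upload = ["scp", main_jar, *dep_jars, f"{remote}:"]
    submit = ["spark-submit", "--class", main_class]
    if dep_jars:
        submit += ["--jars", ",".join(posixpath.basename(j) for j in dep_jars)]
    submit += [posixpath.basename(main_jar), *args]
    # ssh receives the remote command as one shell-quoted string.
    return [["mvn", "package"], upload, ["ssh", remote, shlex.join(submit)]]


if __name__ == "__main__":
    # Hypothetical user, class and jar names, for illustration only.
    for command in build_commands(
        "alice",
        "iis-cdh5-test-gw.ocean.icm.edu.pl",
        "eu.dnetlib.scholix.Main",
        "target/app-1.0.jar",
        ["libs/dep-2.0.jar"],
        ["--limit", "100"],
    ):
        print(shlex.join(command))
```

Each command could then be run in order with `subprocess.run(command, check=True)`; because `ssh` stays attached to the remote process, the `spark-submit` output is printed on the local terminal.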

That's AWESOME!