diff --git a/README.md b/README.md
index 1c5dadc..c21173c 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,28 @@
 # DHP-Explorer
 
-For lazy people who hate to execute a Zeppelin notebook that doesn't work for unknown reasons!
\ No newline at end of file
+For lazy people who hate to execute a Zeppelin notebook that doesn't work for unknown reasons!
+
+# How does it work?
+Let's say you want to create a series of Spark jobs to evaluate some features in your datasets. You have to use a Zeppelin notebook.
+Sometimes the notebook doesn't work well, and you don't understand why you get errors there when the same code works in spark-shell.
+
+With this project, your problems are solved.
+
+## Step 1
+Create a Java/Scala main application that runs your code.
+
+## Step 2
+Run the Python script:
+
+`python execute_notebook.py {SSH USER NAME} {MAIN CLASS reference path} {arguments_file path}`
+
+The arguments_file is a file that contains all the arguments, one per line.
+
+This script does the following:
+
+- create the main jar using mvn package
+- upload the jar to the IIS machine (_iis-cdh5-test-gw.ocean.icm.edu.pl_)
+- upload all the dependency jars by checking the pom for those preceded by the comment ``
+- submit the Spark job, so you can watch the output log directly on your machine
+
+That's AWESOME!
\ No newline at end of file
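
Note on the README text above: the steps it lists for the script (package, upload, submit) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual contents of `execute_notebook.py`, which this diff does not show. The function names `read_arguments` and `build_commands`, the jar path `target/app.jar`, and the use of `scp` and `ssh` for the upload and submit steps are all hypothetical. The dependency-jar upload is left out because the marker comment in the pom is not visible in the README.

```python
from pathlib import Path
import shlex

# Host name taken from the README; everything else below is an assumption.
GATEWAY = "iis-cdh5-test-gw.ocean.icm.edu.pl"


def read_arguments(arguments_file):
    """Return the arguments stored one per line, skipping blank lines."""
    lines = Path(arguments_file).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_commands(user, main_class, arguments, jar="target/app.jar"):
    """Build (but do not run) the commands for the three main steps.

    The jar path is a hypothetical default; a real project would use
    whatever artifact `mvn package` produces.
    """
    remote_jar = Path(jar).name
    target = f"{user}@{GATEWAY}"

    # Step 1: create the main jar.
    package = ["mvn", "package"]
    # Step 2: copy the jar to the gateway machine (assumed to use scp).
    upload = ["scp", jar, f"{target}:{remote_jar}"]
    # Step 3: submit the job remotely; the output streams back over ssh.
    remote = shlex.join(
        ["spark-submit", "--class", main_class, remote_jar, *arguments]
    )
    submit = ["ssh", target, remote]
    return [package, upload, submit]
```

Each returned command is a plain list, so it could be passed to `subprocess.run(cmd, check=True)` one after another. Building the commands separately from running them keeps the sketch easy to inspect without touching the cluster.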