InfoSpace Deduplication using Spark InfoSpace Deduplication using Spark InfoSpace Deduplication 35 executeOozieJobICM /user/sandro.labruzzo/scholix/ IIS true true true true true dedup-dli-dataset d1e24272-939d-4216-ad58-22abe90b7fb4_RGVkdXBDb25maWd1cmF0aW9uRFNSZXNvdXJjZXMvRGVkdXBDb25maWd1cmF0aW9uRFNSZXNvdXJjZVR5cGU= dedup-dli-unknown import PMF Publications to HDFS DIR Run M/R import Job import PMF Publications to HDFS DIR Run M/R import Job import PMF Publications to HDFS DIR Run M/R import Job import PMF Publications to HDFS DIR Run M/R import Job Run M/R import Job Run M/R import Job Run M/R import Job Run M/R import Job import PMF Publications to HDFS DIR 29 5 22 ? * * 10080 wf_20200311_132512_626 2020-03-11T13:50:54+00:00 FAILURE eu.dnetlib.rmi.data.hadoop.HadoopServiceException: hadoop job: 0004121-190920055838013-oozie-oozi-W failed with status: KILLED, oozie log: 2020-03-11 13:38:02,044 INFO org.apache.oozie.service.JPAService: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[] No results found 2020-03-11 13:38:02,095 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@:start:] Start action [0004121-190920055838013-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2020-03-11 13:38:02,119 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@:start:] [***0004121-190920055838013-oozie-oozi-W@:start:***]Action status=DONE 2020-03-11 13:38:02,119 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@:start:] [***0004121-190920055838013-oozie-oozi-W@:start:***]Action updated in DB! 2020-03-11 13:38:02,241 INFO org.apache.oozie.service.JPAService: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@:start:] No results found 2020-03-11 13:38:02,307 INFO org.apache.oozie.command.wf.WorkflowNotificationXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0004121-190920055838013-oozie-oozi-W@:start: 2020-03-11 13:38:02,307 INFO org.apache.oozie.command.wf.WorkflowNotificationXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0004121-190920055838013-oozie-oozi-W 2020-03-11 13:38:02,370 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@DeleteTargetPath] Start action [0004121-190920055838013-oozie-oozi-W@DeleteTargetPath] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2020-03-11 13:38:02,444 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@DeleteTargetPath] [***0004121-190920055838013-oozie-oozi-W@DeleteTargetPath***]Action status=DONE 2020-03-11 13:38:02,474 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@DeleteTargetPath] [***0004121-190920055838013-oozie-oozi-W@DeleteTargetPath***]Action updated in DB! 2020-03-11 13:38:02,595 INFO org.apache.oozie.service.JPAService: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@DeleteTargetPath] No results found 2020-03-11 13:38:02,707 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] Start action [0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2020-03-11 13:38:05,274 INFO org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] checking action, hadoop job ID [job_1568959071843_15753] status [RUNNING] 2020-03-11 13:38:05,295 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] [***0004121-190920055838013-oozie-oozi-W@MergeDLIEntities***]Action status=RUNNING 2020-03-11 13:38:05,295 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] [***0004121-190920055838013-oozie-oozi-W@MergeDLIEntities***]Action updated in DB! 2020-03-11 13:38:05,344 INFO org.apache.oozie.command.wf.WorkflowNotificationXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] No Notification URL is defined. Therefore nothing to notify for job 0004121-190920055838013-oozie-oozi-W@MergeDLIEntities 2020-03-11 13:38:05,355 INFO org.apache.oozie.command.wf.WorkflowNotificationXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@DeleteTargetPath] No Notification URL is defined. Therefore nothing to notify for job 0004121-190920055838013-oozie-oozi-W@DeleteTargetPath 2020-03-11 13:48:07,901 INFO org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] checking action, hadoop job ID [job_1568959071843_15753] status [RUNNING] 2020-03-11 13:50:50,514 INFO org.apache.oozie.servlet.CallbackServlet: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] callback for action [0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] 2020-03-11 13:50:50,922 INFO org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] Hadoop Jobs launched : [job_1568959071843_15754] 2020-03-11 13:50:50,952 INFO org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] action completed, external ID [job_1568959071843_15753] 2020-03-11 13:50:50,973 WARN org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1568959071843_15754 finished with failed status 2020-03-11 13:50:50,995 WARN org.apache.oozie.action.hadoop.SparkActionExecutor: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] Launcher exception: Application application_1568959071843_15754 finished with failed status org.apache.spark.SparkException: Application application_1568959071843_15754 finished with failed status at org.apache.spark.deploy.yarn.Client.run(Client.scala:1171) at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1608) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:178) at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:90) at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:81) at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:57) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:235) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 2020-03-11 13:50:51,041 INFO org.apache.oozie.command.wf.ActionEndXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] ERROR is considered as FAILED for SLA 2020-03-11 13:50:51,094 INFO org.apache.oozie.service.JPAService: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] No results found 2020-03-11 13:50:51,115 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@Kill] Start action [0004121-190920055838013-oozie-oozi-W@Kill] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10] 2020-03-11 13:50:51,116 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@Kill] [***0004121-190920055838013-oozie-oozi-W@Kill***]Action status=DONE 2020-03-11 13:50:51,116 INFO org.apache.oozie.command.wf.ActionStartXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[sandro.labruzzo] GROUP[-] TOKEN[] APP[Infospace Merge Entities] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@Kill] [***0004121-190920055838013-oozie-oozi-W@Kill***]Action updated in DB! 2020-03-11 13:50:51,273 INFO org.apache.oozie.command.wf.WorkflowNotificationXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@Kill] No Notification URL is defined. Therefore nothing to notify for job 0004121-190920055838013-oozie-oozi-W@Kill 2020-03-11 13:50:51,303 INFO org.apache.oozie.command.wf.WorkflowNotificationXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0004121-190920055838013-oozie-oozi-W 2020-03-11 13:50:51,277 INFO org.apache.oozie.command.wf.WorkflowNotificationXCommand: SERVER[iis-cdh5-test-m3.ocean.icm.edu.pl] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0004121-190920055838013-oozie-oozi-W] ACTION[0004121-190920055838013-oozie-oozi-W@MergeDLIEntities] No Notification URL is defined. Therefore nothing to notify for job 0004121-190920055838013-oozie-oozi-W@MergeDLIEntities