fosPath
the input path of the resources to be extended
bipScorePath
the path to the bipFinder scores
outputPath
the path where the actionset is stored
sparkDriverMemory
memory for driver process
sparkExecutorMemory
memory for individual executor
sparkExecutorCores
number of cores used by single executor
oozieActionShareLibForSpark2
oozie action sharelib for spark 2.*
spark2ExtraListeners
com.cloudera.spark.lineage.NavigatorAppListener
spark 2.* extra listeners classname
spark2SqlQueryExecutionListeners
com.cloudera.spark.lineage.NavigatorQueryListener
spark 2.* sql query execution listeners classname
spark2YarnHistoryServerAddress
spark 2.* yarn history server address
spark2EventLogDir
spark 2.* event log dir location
${jobTracker}
${nameNode}
mapreduce.job.queuename
${queueName}
oozie.launcher.mapred.job.queue.name
${oozieLauncherQueueName}
oozie.action.sharelib.for.spark
${oozieActionShareLibForSpark2}
Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
yarn
cluster
Produces the unresolved from bip finder!
eu.dnetlib.dhp.actionmanager.createunresolvedentities.PrepareBipFinder
dhp-aggregation-${projectVersion}.jar
--executor-memory=${sparkExecutorMemory}
--executor-cores=${sparkExecutorCores}
--driver-memory=${sparkDriverMemory}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.sql.queryExecutionListeners=${spark2SqlQueryExecutionListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
--sourcePath ${bipScorePath}
--outputPath ${workingDir}/prepared
eu.dnetlib.dhp.actionmanager.createunresolvedentities.GetInputData
--hdfsNameNode ${nameNode}
--sourcePath ${fosPath}
--outputPath ${workingDir}/input/fos
--classForName eu.dnetlib.dhp.actionmanager.createunresolvedentities.model.FOSDataModel
yarn
cluster
Produces the unresolved from FOS!
eu.dnetlib.dhp.actionmanager.createunresolvedentities.PrepareFOSSparkJob
dhp-aggregation-${projectVersion}.jar
--executor-memory=${sparkExecutorMemory}
--executor-cores=${sparkExecutorCores}
--driver-memory=${sparkDriverMemory}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.sql.queryExecutionListeners=${spark2SqlQueryExecutionListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
--sourcePath ${workingDir}/input/fos
--outputPath ${workingDir}/prepared
eu.dnetlib.dhp.actionmanager.createunresolvedentities.GetInputData
--hdfsNameNode ${nameNode}
--sourcePath ${sdgPath}
--outputPath ${workingDir}/input/sdg
--classForName eu.dnetlib.dhp.actionmanager.createunresolvedentities.model.SDGDataModel
yarn
cluster
Produces the unresolved from SDG!
eu.dnetlib.dhp.actionmanager.createunresolvedentities.PrepareSDGSparkJob
dhp-aggregation-${projectVersion}.jar
--executor-memory=${sparkExecutorMemory}
--executor-cores=${sparkExecutorCores}
--driver-memory=${sparkDriverMemory}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.sql.queryExecutionListeners=${spark2SqlQueryExecutionListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
--sourcePath ${workingDir}/input/sdg
--outputPath ${workingDir}/prepared
yarn
cluster
Saves the results produced for bip, fos and sdg by grouping results with the same id
eu.dnetlib.dhp.actionmanager.createunresolvedentities.SparkSaveUnresolved
dhp-aggregation-${projectVersion}.jar
--executor-memory=${sparkExecutorMemory}
--executor-cores=${sparkExecutorCores}
--driver-memory=${sparkDriverMemory}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.sql.queryExecutionListeners=${spark2SqlQueryExecutionListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.sql.warehouse.dir=${sparkSqlWarehouseDir}
--sourcePath ${workingDir}/prepared
--outputPath ${outputPath}