graphInputPath
the path where the graph is stored
outputDir
the path where the generated data are stored
datasourceIdWhitelist
-
a whitelist (comma-separated, - for an empty list) of datasource ids
datasourceTypeWhitelist
-
a whitelist (comma-separated, - for an empty list) of datasource types
datasourceIdBlacklist
-
a blacklist (comma-separated, - for an empty list) of datasource ids
esEventIndexName
the elasticsearch index name for events
esNotificationsIndexName
the elasticsearch index name for notifications
esIndexHost
the elasticsearch host
esBatchWriteRetryCount
8
an ES configuration property: number of retries for a failed batch write
esBatchWriteRetryWait
60s
an ES configuration property: time to wait between batch write retries
esBatchSizeEntries
200
an ES configuration property: number of entries per batch write
esNodesWanOnly
true
an ES configuration property: whether to connect only through the declared ES host (no node discovery)
maxIndexedEventsForDsAndTopic
the max number of indexed events for each (datasource, topic) pair
brokerApiBaseUrl
the base URL of the broker service API
sparkDriverMemory
memory for driver process
sparkExecutorMemory
memory for individual executor
sparkExecutorCores
number of cores used by single executor
oozieActionShareLibForSpark2
oozie action sharelib for spark 2.*
spark2ExtraListeners
com.cloudera.spark.lineage.NavigatorAppListener
spark 2.* extra listeners classname
spark2SqlQueryExecutionListeners
com.cloudera.spark.lineage.NavigatorQueryListener
spark 2.* sql query execution listeners classname
spark2YarnHistoryServerAddress
spark 2.* yarn history server address
spark2EventLogDir
spark 2.* event log dir location
sparkMaxExecutorsForIndexing
8
max number of Spark executors used for Elasticsearch indexing
${jobTracker}
${nameNode}
mapreduce.job.queuename
${queueName}
oozie.launcher.mapred.job.queue.name
${oozieLauncherQueueName}
oozie.action.sharelib.for.spark
${oozieActionShareLibForSpark2}
Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
yarn
cluster
GenerateNotificationsJob
eu.dnetlib.dhp.broker.oa.GenerateNotificationsJob
dhp-broker-events-${projectVersion}.jar
--executor-cores=${sparkExecutorCores}
--executor-memory=${sparkExecutorMemory}
--driver-memory=${sparkDriverMemory}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.sql.queryExecutionListeners=${spark2SqlQueryExecutionListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.sql.shuffle.partitions=3840
--outputDir ${outputDir}
--brokerApiBaseUrl ${brokerApiBaseUrl}
yarn
cluster
IndexNotificationsOnESJob
eu.dnetlib.dhp.broker.oa.IndexNotificationsJob
dhp-broker-events-${projectVersion}.jar
--executor-memory=${sparkExecutorMemory}
--driver-memory=${sparkDriverMemory}
--conf spark.dynamicAllocation.maxExecutors=${sparkMaxExecutorsForIndexing}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.sql.queryExecutionListeners=${spark2SqlQueryExecutionListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--conf spark.sql.shuffle.partitions=3840
--outputDir ${outputDir}
--index ${esNotificationsIndexName}
--esHost ${esIndexHost}
--esBatchWriteRetryCount ${esBatchWriteRetryCount}
--esBatchWriteRetryWait ${esBatchWriteRetryWait}
--esBatchSizeEntries ${esBatchSizeEntries}
--esNodesWanOnly ${esNodesWanOnly}
--brokerApiBaseUrl ${brokerApiBaseUrl}
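
The datasourceIdWhitelist, datasourceTypeWhitelist and datasourceIdBlacklist parameters are described as comma-separated lists in which "-" stands for an empty list. A minimal sketch of how such a value could be parsed is given below. The function name parse_id_list is hypothetical and is not taken from the dhp-broker-events code; treating a blank string as an empty list and trimming whitespace around each item are assumptions as well.

```python
def parse_id_list(raw):
    """Parse a comma-separated parameter value into a set of strings.

    Hypothetical helper, not part of the workflow itself.
    A value of "-" means "empty list", as stated in the parameter
    descriptions. Treating a blank value the same way is an assumption.
    """
    value = raw.strip()
    if value in ("", "-"):
        return set()
    # Trim whitespace around each item and skip empty items
    # (for example those produced by a trailing comma).
    return {item.strip() for item in value.split(",") if item.strip()}


if __name__ == "__main__":
    # The identifiers below are made-up placeholders.
    print(sorted(parse_id_list("ds-one, ds-two,ds-three")))
    print(parse_id_list("-"))
```

Returning a set makes membership checks cheap and ignores duplicate ids; a list would be the better choice if the order of the entries mattered.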