sourcePath
the working directory base path
targetPath
the raw graph base path
Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
yarn
cluster
Extract entities in raw graph
eu.dnetlib.dhp.sx.graph.SparkCreateInputGraph
dhp-graph-mapper-${projectVersion}.jar
--executor-memory=${sparkExecutorMemory}
--executor-cores=${sparkExecutorCores}
--driver-memory=${sparkDriverMemory}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.sql.shuffle.partitions=2000
--conf spark.sql.queryExecutionListeners=${spark2SqlQueryExecutionListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--master yarn
--sourcePath ${sourcePath}
--targetPath ${targetPath}
yarn
cluster
Generate Input Graph for deduplication
eu.dnetlib.dhp.sx.graph.SparkConvertDatasetToJsonRDD
dhp-graph-mapper-${projectVersion}.jar
--executor-memory=${sparkExecutorMemory}
--executor-cores=${sparkExecutorCores}
--driver-memory=${sparkDriverMemory}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.sql.shuffle.partitions=3000
--conf spark.sql.queryExecutionListeners=${spark2SqlQueryExecutionListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--master yarn
--sourcePath ${targetPath}/preprocess
--targetPath ${targetPath}/dedup
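
The two actions above are chained through their path arguments: the second job reads from ${targetPath}/preprocess and writes to ${targetPath}/dedup. The following is a minimal sketch, in Python, of how those application arguments resolve for given parameter values. The helper names (job_args, workflow_steps) and the example paths are hypothetical and only illustrate the substitution. That the first job writes its output under ${targetPath}/preprocess is an assumption suggested by the second job's input path, not something stated in the workflow itself.

```python
from typing import List, Tuple


def job_args(source_path: str, target_path: str) -> List[str]:
    """Application arguments shared by both Spark actions (hypothetical helper)."""
    return [
        "--master", "yarn",
        "--sourcePath", source_path,
        "--targetPath", target_path,
    ]


def workflow_steps(source_path: str, target_path: str) -> List[Tuple[str, List[str]]]:
    """Return (main class, arguments) pairs in the order the actions appear."""
    return [
        # Step 1: extract entities in the raw graph.
        (
            "eu.dnetlib.dhp.sx.graph.SparkCreateInputGraph",
            job_args(source_path, target_path),
        ),
        # Step 2: generate the input graph for deduplication.
        # Its input and output are both sub-directories of targetPath.
        (
            "eu.dnetlib.dhp.sx.graph.SparkConvertDatasetToJsonRDD",
            job_args(f"{target_path}/preprocess", f"{target_path}/dedup"),
        ),
    ]


if __name__ == "__main__":
    # Example paths are placeholders, not values from the workflow.
    for main_class, args in workflow_steps("/tmp/work", "/tmp/raw_graph"):
        print(main_class, " ".join(args))
```

Keeping the path derivation in one place makes it easier to see that only sourcePath and targetPath need to be supplied; every other path in the second action is derived from targetPath.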