sourcePath
the raw graph base path
entity
the entity that should be processed
dedupConf
the dedup configuration, or a list of dedup configurations
targetPath
the output base path
rawSet
the output directory in the targetPath
agentId
the agent identifier
agentName
the agent name
sparkDriverMemory
memory for the driver process
sparkExecutorMemory
memory for each individual executor
sparkExecutorCores
number of cores used by a single executor
Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
${jobTracker}
${nameNode}
yarn-cluster
cluster
Create Similarity Relations
eu.dnetlib.dedup.SparkCreateSimRels2
dhp-dedup-${projectVersion}.jar
--executor-memory ${sparkExecutorMemory} --executor-cores ${sparkExecutorCores}
--driver-memory=${sparkDriverMemory} --conf
spark.extraListeners="com.cloudera.spark.lineage.NavigatorAppListener" --conf
spark.sql.queryExecutionListeners="com.cloudera.spark.lineage.NavigatorQueryListener" --conf
spark.sql.warehouse.dir="/user/hive/warehouse"
-mt yarn-cluster
--sourcePath ${sourcePath}
--targetPath ${targetPath}
--entity ${entity}
--dedupConf ${dedupConf}
--rawSet ${rawSet}
--agentId ${agentId}
--agentName ${agentName}
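
To make the mapping from workflow parameters to the launched job easier to follow, the sketch below assembles a roughly equivalent `spark-submit` command line from a dictionary of parameter values. It is an illustration only, not how Oozie builds the command internally: the helper name `build_spark_submit`, the `projectVersion` key, and every sample value are hypothetical, and the three `--conf` listener and warehouse settings are left out for brevity.

```python
import shlex

# Application arguments, in the order they appear in the workflow.
APP_PARAMS = ("sourcePath", "targetPath", "entity", "dedupConf",
              "rawSet", "agentId", "agentName")


def build_spark_submit(params):
    """Return an argv list that mirrors the Spark action (hypothetical helper)."""
    spark_opts = [
        "--executor-memory", params["sparkExecutorMemory"],
        "--executor-cores", params["sparkExecutorCores"],
        "--driver-memory=" + params["sparkDriverMemory"],
    ]
    # The job itself receives "-mt yarn-cluster" followed by name/value pairs.
    app_args = ["-mt", "yarn-cluster"]
    for name in APP_PARAMS:
        app_args += ["--" + name, params[name]]
    jar = "dhp-dedup-{}.jar".format(params["projectVersion"])
    return (
        ["spark-submit", "--master", "yarn-cluster", "--deploy-mode", "cluster",
         "--name", "Create Similarity Relations",
         "--class", "eu.dnetlib.dedup.SparkCreateSimRels2"]
        + spark_opts + [jar] + app_args
    )


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration.
    sample = {
        "sourcePath": "/tmp/graph/raw", "targetPath": "/tmp/graph/dedup",
        "entity": "publication", "dedupConf": "conf.json",
        "rawSet": "rawset-1", "agentId": "agent-1", "agentName": "agent",
        "sparkDriverMemory": "4G", "sparkExecutorMemory": "8G",
        "sparkExecutorCores": "2", "projectVersion": "X.Y.Z",
    }
    print(shlex.join(build_spark_submit(sample)))
```

Keeping each option name and its value as separate list elements mirrors the one-argument-per-element layout of the workflow and avoids quoting problems when a value contains spaces.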