<parameters>
    <property>
        <name>sourcePath</name>
        <description>the source path</description>
    </property>
    <property>
        <name>workingPath</name>
        <description>the working path</description>
    </property>
    <property>
        <name>targetPath</name>
        <description>the OAF MDStore path</description>
    </property>
    <property>
        <name>sparkDriverMemory</name>
        <description>memory for the driver process</description>
    </property>
    <property>
        <name>sparkExecutorMemory</name>
        <description>memory for each executor</description>
    </property>
    <property>
        <name>sparkExecutorCores</name>
        <description>number of cores used by a single executor</description>
    </property>
    <property>
        <name>resumeFrom</name>
        <value>DownloadEBILinks</value>
        <description>the node to resume from</description>
    </property>
</parameters>
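The `${...}` placeholders above are Oozie EL expressions, resolved against the submitted job configuration before the workflow runs. As a rough illustration of that substitution (a simplified sketch, not the Oozie implementation; the class and method names are mine):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ElSubstitution {
    // Replace ${name} tokens with values from the configuration map,
    // leaving unknown tokens untouched (a simplification of Oozie EL).
    static String substitute(String template, Map<String, String> conf) {
        Matcher m = Pattern.compile("\\$\\{(\\w+)\\}").matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = conf.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("sparkExecutorMemory", "6G");
        // prints --executor-memory=6G
        System.out.println(substitute("--executor-memory=${sparkExecutorMemory}", conf));
    }
}
```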
<start to="resume_from"/>

<decision name="resume_from">
    <switch>
        <case to="DownloadEBILinks">${wf:conf('resumeFrom') eq 'DownloadEBILinks'}</case>
        <case to="CreateEBIDataSet">${wf:conf('resumeFrom') eq 'CreateEBIDataSet'}</case>
        <default to="DownloadEBILinks"/>
    </switch>
</decision>

<kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
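The two `wf:conf` expressions implement a resume switch: the workflow starts at `DownloadEBILinks` or `CreateEBIDataSet` depending on the `resumeFrom` property, and the `resumeFrom` parameter's default value suggests that anything else falls back to `DownloadEBILinks`. A minimal sketch of that decision under those assumptions:

```java
public class ResumeDecision {
    // Mirror of the Oozie decision switch: pick the start node from the
    // resumeFrom configuration value; anything unrecognised (or unset)
    // falls through to the default, DownloadEBILinks.
    static String startNode(String resumeFrom) {
        if ("CreateEBIDataSet".equals(resumeFrom)) {
            return "CreateEBIDataSet";
        }
        return "DownloadEBILinks";
    }

    public static void main(String[] args) {
        // prints DownloadEBILinks
        System.out.println(startNode(null));
    }
}
```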
<action name="DownloadEBILinks">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>Incremental Download EBI Links</name>
        <class>eu.dnetlib.dhp.sx.bio.ebi.SparkDownloadEBILinks</class>
        <jar>dhp-aggregation-${projectVersion}.jar</jar>
        <spark-opts>
            --executor-memory=${sparkExecutorMemory}
            --executor-cores=${sparkExecutorCores}
            --driver-memory=${sparkDriverMemory}
            --conf spark.extraListeners=${spark2ExtraListeners}
            --conf spark.sql.shuffle.partitions=2000
            --conf spark.sql.queryExecutionListeners=${spark2SqlQueryExecutionListeners}
            --conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
            --conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
        </spark-opts>
        <arg>--sourcePath</arg><arg>${sourcePath}</arg>
        <arg>--workingPath</arg><arg>${workingPath}</arg>
        <arg>--master</arg><arg>yarn</arg>
    </spark>
    <ok to="CreateEBIDataSet"/>
    <error to="Kill"/>
</action>
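Each `arg` pair reaches the Spark application as a flag followed by its value (`--sourcePath ${sourcePath}`, `--workingPath ${workingPath}`, `--master yarn`). The dhp classes parse these with their own argument parser; the following is a hypothetical stand-in showing only the flag/value pairing, not the real dhp API:

```java
import java.util.HashMap;
import java.util.Map;

public class ArgPairs {
    // Collect "--flag value" pairs into a map. The flag names used in the
    // usage below (--sourcePath, --workingPath, --master) match the workflow
    // args; the parser itself is illustrative only.
    static Map<String, String> parse(String[] args) {
        Map<String, String> parsed = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("expected a --flag at position " + i);
            }
            parsed.put(args[i].substring(2), args[i + 1]);
        }
        return parsed;
    }

    public static void main(String[] args) {
        Map<String, String> conf = parse(new String[] {
            "--sourcePath", "/data/ebi", "--workingPath", "/tmp/ebi", "--master", "yarn"});
        // prints yarn
        System.out.println(conf.get("master"));
    }
}
```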
<action name="CreateEBIDataSet">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <master>yarn-cluster</master>
        <mode>cluster</mode>
        <name>Create OAF DataSet</name>
        <class>eu.dnetlib.dhp.sx.bio.ebi.SparkEBILinksToOaf</class>
        <jar>dhp-aggregation-${projectVersion}.jar</jar>
        <spark-opts>
            --executor-memory=${sparkExecutorMemory}
            --executor-cores=${sparkExecutorCores}
            --driver-memory=${sparkDriverMemory}
            --conf spark.sql.shuffle.partitions=2000
            ${sparkExtraOPT}
        </spark-opts>
        <arg>--sourcePath</arg><arg>${sourcePath}/ebi_links_dataset</arg>
        <arg>--targetPath</arg><arg>${targetPath}</arg>
        <arg>--master</arg><arg>yarn</arg>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
</action>

<end name="End"/>
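Taken together, this is a two-step pipeline: `DownloadEBILinks` refreshes the EBI links, then `CreateEBIDataSet` converts `${sourcePath}/ebi_links_dataset` into OAF records at the target path, with any failing action routed to the kill node whose error message appears above. A toy runner for that control flow (node names from the workflow; the step bodies and the runner itself are placeholders, not Oozie):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class WorkflowRunner {
    // Visit the two actions in order, starting from the resume node.
    // A failing step short-circuits to the kill node, as an Oozie
    // error transition would; step bodies stand in for the Spark jobs.
    static List<String> run(String startNode, Predicate<String> stepSucceeds) {
        List<String> visited = new ArrayList<>();
        String[] order = {"DownloadEBILinks", "CreateEBIDataSet"};
        boolean started = false;
        for (String node : order) {
            if (node.equals(startNode)) started = true;
            if (!started) continue;
            visited.add(node);
            if (!stepSucceeds.test(node)) {
                visited.add("Kill");
                return visited;
            }
        }
        visited.add("End");
        return visited;
    }

    public static void main(String[] args) {
        // prints [DownloadEBILinks, CreateEBIDataSet, End]
        System.out.println(run("DownloadEBILinks", node -> true));
    }
}
```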