bioschemas_datasource_key
mobidb
the Bioschemas datasource key (e.g. mobidb, ped, disprot)
workingPath
/data/bioschema/mobidb/
the working path
rdfInput
base64_gzipped_nquads.txt
the RDF output of the scraping workflow
output
json-datacite/
the output folder for the converted DataCite JSON records
profile
Protein
the Bioschemas profile to use for the conversion (https://bioschemas.org/profiles/)
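The rdfInput parameter above points to a base64-encoded, gzip-compressed N-Quads file. A minimal sketch of that round trip, assuming the whole file is one base64-encoded gzip stream (the workflow may instead encode record by record), with a made-up example quad:

```shell
# Sample quad (hypothetical subject/graph URIs, for illustration only).
quad='<http://example.org/P12345> <http://schema.org/name> "MobiDB entry" <http://example.org/graph> .'

# Encode the way the rdfInput file name suggests: gzip, then base64.
printf '%s\n' "$quad" | gzip -c | base64 > base64_gzipped_nquads.txt

# Decode for inspection: base64-decode, then decompress.
base64 -d < base64_gzipped_nquads.txt | gunzip
```

Decoding recovers the original N-Quads line, which is what the conversion job then parses.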
oozie.launcher.mapreduce.map.java.opts
-Xmx4g
JVM options for the Oozie launcher map task
spark2RdfConversionMaxExecutors
50
maximum number of executors for the RDF conversion job
sparkDriverMemory
7G
memory for driver process
sparkExecutorMemory
2G
memory for each individual executor
sparkExecutorCores
4
number of cores used by single executor
spark2ExtraListeners
com.cloudera.spark.lineage.NavigatorAppListener
spark 2.* extra listeners classname
spark2YarnHistoryServerAddress
spark 2.* yarn history server address
spark2EventLogDir
spark 2.* event log dir location
Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
${jobTracker}
${nameNode}
download_nquads.sh
${bioschemas_datasource_key}
download_nquads.sh
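The shell action above runs download_nquads.sh with ${bioschemas_datasource_key} as its only argument. The real script is not shown here; the following is a hypothetical sketch of that interface, written as a function, which only echoes what the script is expected to do (fetch the datasource's N-Quads dump into the working path):

```shell
# Hypothetical stand-in for download_nquads.sh (the actual script is not
# part of this listing). It takes the Bioschemas datasource key as $1.
download_nquads() {
  key="${1:?usage: download_nquads <datasource_key>}"
  # The real script would fetch the dump and place
  # base64_gzipped_nquads.txt under the working path.
  echo "fetching N-Quads dump for datasource: ${key}"
}

download_nquads mobidb
# → fetching N-Quads dump for datasource: mobidb
```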
yarn-cluster
cluster
NquadsToDataciteJson
eu.dnetlib.dhp.rdfconverter.bioschema.SparkRdfToDatacite
dhp-rdfconverter-${projectVersion}.jar
--executor-cores=${sparkExecutorCores}
--driver-memory=${sparkDriverMemory}
--executor-memory=${sparkExecutorMemory}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--nameNode ${nameNode}
--workingPath ${workingPath}
--rdfInput ${rdfInput}
--output ${output}
--profile ${profile}
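Assembled outside Oozie, the Spark action above corresponds roughly to the spark-submit invocation below, a sketch with the parameter defaults substituted; the angle-bracket placeholders (project version, name node) are deployment-specific and not filled in here:

```shell
spark-submit \
  --master yarn --deploy-mode cluster \
  --name NquadsToDataciteJson \
  --class eu.dnetlib.dhp.rdfconverter.bioschema.SparkRdfToDatacite \
  --executor-cores 4 \
  --driver-memory 7G \
  --executor-memory 2G \
  --conf spark.extraListeners=com.cloudera.spark.lineage.NavigatorAppListener \
  dhp-rdfconverter-<projectVersion>.jar \
  --nameNode <nameNode> \
  --workingPath /data/bioschema/mobidb/ \
  --rdfInput base64_gzipped_nquads.txt \
  --output json-datacite/ \
  --profile Protein
```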