bioschemas_datasource_key
mobidb
the Bioschemas datasource key (e.g. mobidb, ped, disprot)
workingPath
/data/bioschema/mobidb/
the working path
rdfInput
base64_gzipped_nquads.txt
the RDF output of the scraping workflow
output
json-datacite/
the output folder for the converted DataCite JSON records
profile
Protein
the Bioschemas profile to use for the conversion (https://bioschemas.org/profiles/)
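The rdfInput parameter above points to a base64-encoded, gzip-compressed N-Quads file. A minimal sketch of that round trip, assuming the whole file is one base64-encoded gzip stream (the workflow may instead encode record by record), with a made-up example quad:

```shell
# Sample quad (hypothetical subject/graph URIs, for illustration only).
quad='<http://example.org/P12345> <http://schema.org/name> "MobiDB entry" <http://example.org/graph> .'

# Encode the way the rdfInput file name suggests: gzip, then base64.
printf '%s\n' "$quad" | gzip -c | base64 > base64_gzipped_nquads.txt

# Decode for inspection: base64-decode, then decompress.
base64 -d < base64_gzipped_nquads.txt | gunzip
```

Decoding recovers the original N-Quads line, which is what the conversion job then parses.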
oozie.launcher.mapreduce.map.java.opts
-Xmx4g
JVM options for the Oozie launcher map task
spark2RdfConversionMaxExecutors
50
maximum number of executors for the RDF conversion job
sparkDriverMemory
7G
memory for driver process
sparkExecutorMemory
2G
memory for each individual executor
sparkExecutorCores
4
number of cores used by single executor
spark2ExtraListeners
com.cloudera.spark.lineage.NavigatorAppListener
spark 2.* extra listeners classname
spark2YarnHistoryServerAddress
spark 2.* yarn history server address
spark2EventLogDir
spark 2.* event log dir location
Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
${jobTracker}
${nameNode}
download_nquads.sh
${bioschemas_datasource_key}
download_nquads.sh
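The shell action above runs download_nquads.sh with ${bioschemas_datasource_key} as its only argument. The real script is not shown here; the following is a hypothetical sketch of that interface, written as a function, which only echoes what the script is expected to do (fetch the datasource's N-Quads dump into the working path):

```shell
# Hypothetical stand-in for download_nquads.sh (the actual script is not
# part of this listing). It takes the Bioschemas datasource key as $1.
download_nquads() {
  key="${1:?usage: download_nquads <datasource_key>}"
  # The real script would fetch the dump and place
  # base64_gzipped_nquads.txt under the working path.
  echo "fetching N-Quads dump for datasource: ${key}"
}

download_nquads mobidb
# → fetching N-Quads dump for datasource: mobidb
```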
yarn-cluster
cluster
NquadsToDataciteJson
eu.dnetlib.dhp.rdfconverter.bioschema.SparkRdfToDatacite
dhp-rdfconverter-${projectVersion}.jar
--executor-cores=${sparkExecutorCores}
--driver-memory=${sparkDriverMemory}
--executor-memory=${sparkExecutorMemory}
--conf spark.extraListeners=${spark2ExtraListeners}
--conf spark.yarn.historyServer.address=${spark2YarnHistoryServerAddress}
--conf spark.eventLog.dir=${nameNode}${spark2EventLogDir}
--nameNode ${nameNode}
--workingPath ${workingPath}
--rdfInput ${rdfInput}
--output ${output}
--profile ${profile}
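Assembled outside Oozie, the Spark action above corresponds roughly to the spark-submit invocation below, a sketch with the parameter defaults substituted; the angle-bracket placeholders (project version, name node) are deployment-specific and not filled in here:

```shell
spark-submit \
  --master yarn --deploy-mode cluster \
  --name NquadsToDataciteJson \
  --class eu.dnetlib.dhp.rdfconverter.bioschema.SparkRdfToDatacite \
  --executor-cores 4 \
  --driver-memory 7G \
  --executor-memory 2G \
  --conf spark.extraListeners=com.cloudera.spark.lineage.NavigatorAppListener \
  dhp-rdfconverter-<projectVersion>.jar \
  --nameNode <nameNode> \
  --workingPath /data/bioschema/mobidb/ \
  --rdfInput base64_gzipped_nquads.txt \
  --output json-datacite/ \
  --profile Protein
```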