workingPath /data/bioschema/ped/ The working path.
sitemapUrl https://proteinensemble.org/sitemap2.xml.gz
sitemapURLKey loc
dynamic true Whether the scraper uses Selenium (true, for dynamic pages) or JSOUP (false, for static pages) to scrape the information.
maxScrapedPages 10 Maximum number of pages to scrape; if unset, there is no limit.
rdfOutput nquads.seq RDF output of the scraping step.
scraping_java_opts -Xmx4g -Dwebdriver.chrome.whitelistedIps= Configures the heap size of the map JVM process. Should be 80% of mapreduce.map.memory.mb.
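The 80% guideline for scraping_java_opts can be checked with a short calculation. The sketch below is illustrative only: the class HeapSizing, the method xmxFor and the container size of 5120 MB are assumptions made for the example and do not come from the workflow, which only states the resulting -Xmx4g.

```java
// Hypothetical helper, not part of the workflow: derives a -Xmx flag
// from a container size using the 80% rule quoted in the description.
final class HeapSizing {
    private HeapSizing() {}

    /** Returns "-Xmx<N>m" where N is 80% of mapMemoryMb, rounded down. */
    static String xmxFor(int mapMemoryMb) {
        if (mapMemoryMb <= 0) {
            throw new IllegalArgumentException("mapMemoryMb must be positive");
        }
        // Long arithmetic avoids overflow for very large container sizes.
        long heapMb = mapMemoryMb * 80L / 100L;
        return "-Xmx" + heapMb + "m";
    }

    public static void main(String[] args) {
        // Assumed value of mapreduce.map.memory.mb: 5120 MB.
        System.out.println(xmxFor(5120)); // -Xmx4096m, i.e. -Xmx4g
    }
}
```

Under that assumption, a 5120 MB map container leaves 1024 MB of headroom outside the heap, which matters here because a browser driven by Selenium runs outside the JVM heap.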

${jobTracker}

${nameNode}

Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]

${jobTracker}

${nameNode}

oozie.launcher.mapreduce.user.classpath.first true

eu.dnetlib.dhp.bmuse.bioschema.ScrapingJob

${scraping_java_opts}

--nameNode ${nameNode} --workingPath ${workingPath} --rdfOutput ${rdfOutput} --sitemapUrl ${sitemapUrl} --sitemapURLKey ${sitemapURLKey} --dynamic ${dynamic}
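The argument list above is a sequence of flag and value pairs. As a rough illustration of how such a list could be consumed, the sketch below collects the pairs into a map. The class ScrapingArgs and its parse method are hypothetical names introduced for this example; this is not the parsing code used by eu.dnetlib.dhp.bmuse.bioschema.ScrapingJob, which is not shown in the workflow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: turns "--name value" pairs into a map.
final class ScrapingArgs {
    private ScrapingArgs() {}

    static Map<String, String> parse(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            // Each flag must start with "--" and be followed by a value.
            if (!flag.startsWith("--") || i + 1 >= args.length) {
                throw new IllegalArgumentException(
                    "Expected a --name value pair at position " + i);
            }
            parsed.put(flag.substring(2), args[i + 1]);
        }
        return parsed;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = parse(new String[] {
            "--workingPath", "/data/bioschema/ped/",
            "--sitemapURLKey", "loc",
            "--dynamic", "true"
        });
        // "true" selects Selenium, "false" selects JSOUP, per the parameter description.
        boolean dynamic = Boolean.parseBoolean(parsed.get("dynamic"));
        System.out.println(dynamic ? "Selenium" : "JSOUP");
    }
}
```

Note that maxScrapedPages is declared as a parameter but does not appear in the argument list, so a parser like this one would never see it unless the workflow passes it explicitly.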