workingPath
/data/bioschema/ped/
Working path of the scraping job.
sitemapUrl
https://proteinensemble.org/sitemap2.xml.gz
sitemapURLKey
loc
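The sitemapURLKey value appears to name the sitemap element that holds each page URL (loc in the standard sitemap format). A minimal Java sketch of that extraction follows, assuming the sitemap has already been downloaded and decompressed into a string. The class and method names are hypothetical and are not taken from ScrapingJob.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the actual scraping job.
final class SitemapUrls {

    // Returns the trimmed text of every element whose tag name equals urlKey (e.g. "loc").
    static List<String> extract(String sitemapXml, String urlKey) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(sitemapXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(urlKey);
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                urls.add(nodes.item(i).getTextContent().trim());
            }
            return urls;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked-exception handling.
            throw new IllegalStateException("Could not parse sitemap", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>https://example.org/a</loc></url>"
                + "<url><loc>https://example.org/b</loc></url>"
                + "</urlset>";
        System.out.println(extract(xml, "loc"));
    }
}
```

For a gzipped sitemap such as the one configured here, the input stream could be wrapped in java.util.zip.GZIPInputStream before parsing.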
dynamic
true
Selects the scraping engine: true uses Selenium (dynamic pages), false uses jsoup (static pages).
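The choice described for the dynamic flag can be sketched in Java as a mapping from the flag to a scraping mode. The enum and method names are hypothetical; the real ScrapingJob may organize this differently.

```java
// Hypothetical names, for illustration only.
enum ScraperMode {
    SELENIUM, // dynamic pages, rendered in a browser
    JSOUP;    // static pages, parsed directly from HTML

    // Maps the string value of the "dynamic" property to a mode.
    // Anything other than "true" (case-insensitive) selects JSOUP.
    static ScraperMode fromDynamicFlag(String dynamic) {
        return Boolean.parseBoolean(dynamic) ? SELENIUM : JSOUP;
    }
}
```

With the value configured here, ScraperMode.fromDynamicFlag("true") selects SELENIUM.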
maxScrapedPages
10
Maximum number of pages to scrape. Default: no limit.
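A minimal Java sketch of how such a page cap could be applied to the list of URLs taken from the sitemap. It assumes that a missing or non-positive value means "no limit"; that convention and the class name are assumptions of this sketch, not taken from the job.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class PageLimit {

    // Keeps at most maxScrapedPages URLs; null or a non-positive value means no limit.
    static List<String> apply(List<String> urls, Integer maxScrapedPages) {
        if (maxScrapedPages == null || maxScrapedPages <= 0) {
            return urls;
        }
        return urls.stream().limit(maxScrapedPages).collect(Collectors.toList());
    }
}
```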
rdfOutput
nquads.seq
RDF output of the scraping step.
scraping_java_opts
-Xmx4g -Dwebdriver.chrome.whitelistedIps=
Java options for the scraping JVM. -Xmx sets the heap size of the map JVM process and should be 80% of mapreduce.map.memory.mb.
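The 80% rule is simple arithmetic. A small Java sketch, with hypothetical names, that derives an -Xmx option from the container memory in megabytes:

```java
// Hypothetical helper, for illustration only.
final class HeapSizing {

    // Heap size in MB as 80% of the container memory (mapreduce.map.memory.mb).
    static long heapMb(long containerMb) {
        return containerMb * 80 / 100;
    }

    // Formats the heap size as a JVM option, e.g. "-Xmx4096m".
    static String xmxOption(long containerMb) {
        return "-Xmx" + heapMb(containerMb) + "m";
    }

    public static void main(String[] args) {
        System.out.println(xmxOption(5120)); // prints "-Xmx4096m"
    }
}
```

Under this rule, the -Xmx4g configured here would correspond to a mapreduce.map.memory.mb of 5120.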
${jobTracker}
${nameNode}
Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
${jobTracker}
${nameNode}
oozie.launcher.mapreduce.user.classpath.first
true
eu.dnetlib.dhp.bmuse.bioschema.ScrapingJob
${scraping_java_opts}
--nameNode ${nameNode}
--workingPath ${workingPath}
--rdfOutput ${rdfOutput}
--sitemapUrl ${sitemapUrl}
--sitemapURLKey ${sitemapURLKey}
--dynamic ${dynamic}
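The arguments passed to ScrapingJob are name/value pairs. A minimal Java sketch of collecting such pairs into a map, assuming each option name arrives as its own argument followed by its value. The parser shown is hypothetical; the actual job likely uses its own argument parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser, for illustration only.
final class ArgPairs {

    // Turns ["--dynamic", "true", ...] into {dynamic=true, ...}, preserving order.
    static Map<String, String> parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("Expected name/value pairs");
        }
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Not an option name: " + args[i]);
            }
            parsed.put(args[i].substring(2), args[i + 1]);
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"--sitemapURLKey", "loc", "--dynamic", "true"}));
    }
}
```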