BrBETA_dnet-hadoop/dhp-workflows/dhp-impact-indicators
Claudio Atzori e96c2c1606 [ranking wf] set spark.executor.memoryOverhead to fine tune the resource consumption 2024-04-30 16:23:25 +02:00
..
src/main/resources/eu/dnetlib/dhp/oa/graph/impact_indicators [ranking wf] set spark.executor.memoryOverhead to fine tune the resource consumption 2024-04-30 16:23:25 +02:00
README.md Fix workflow application path 2023-05-16 16:28:48 +03:00
pom.xml Add dependency to dhp-aggregation 2023-03-21 19:25:29 +02:00

README.md

Ranking Workflow for OpenAIRE Publications

This project contains the files for running a paper ranking workflow on the openaire graph using apache oozie. All scripts are written in python and the project setup follows the typical oozie workflow structure:

  • a workflow.xml file containing the workflow specification
  • a job.properties file specifying parameter values for the parameters used by the workflow
  • a set of python scripts used by the workflow

NOTE: the workflow depends on the external library of ranking scripts called BiP! Ranker. You can check out a specific tag/release of BIP! Ranker using maven, as described in the following section.

Build and deploy

Use the following command for packaging:

mvn package -Poozie-package -Dworkflow.source.dir=eu/dnetlib/dhp/oa/graph/impact_indicators -DskipTests

Deploy and run:

mvn package -Poozie-package,deploy,run -Dworkflow.source.dir=eu/dnetlib/dhp/oa/graph/impact_indicators -DskipTests

Note: edit the property bip.ranker.tag of the pom.xml file to specify the tag of BIP-Ranker that you want to use.

Job info and logs:

export OOZIE_URL=http://iis-cdh5-test-m3:11000/oozie
oozie job -info <jobId>
oozie job -log <jobId>

where jobId is the id of the job returned by the run_workflow.sh script.