Graph construction for IIS [PROD NEW]
IIS
30
Set the blacklist of funder nsPrefixes
nsPrefixBlacklist
conicytf____,dfgf________,gsrt________,innoviris___,miur________,rif_________,rsf_________,sgov________,sfrs________
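The sketch below shows one way a comma-separated nsPrefix blacklist like the one above could be applied. It assumes OpenAIRE-style identifiers of the form <type>|<nsPrefix>::<local id> with a 12-character nsPrefix. The helper names are hypothetical, and this is not the filtering code of the raw_all workflow.

```python
from __future__ import annotations

# Hypothetical sketch: the ID layout "<type>|<nsPrefix>::<local id>" is an
# assumption made for illustration, not something defined by this profile.
NS_PREFIX_BLACKLIST = (
    "conicytf____,dfgf________,gsrt________,innoviris___,miur________,"
    "rif_________,rsf_________,sgov________,sfrs________"
)


def parse_blacklist(raw: str) -> set[str]:
    """Split the comma-separated parameter value into a set of nsPrefixes."""
    return {p.strip() for p in raw.split(",") if p.strip()}


def ns_prefix(entity_id: str) -> str:
    """Extract the nsPrefix from an ID shaped like '40|corda__h2020::abc'."""
    _, _, rest = entity_id.partition("|")
    return rest.split("::", 1)[0]


def drop_blacklisted(ids: list[str], blacklist: set[str]) -> list[str]:
    """Keep only the IDs whose nsPrefix is not in the blacklist."""
    return [i for i in ids if ns_prefix(i) not in blacklist]


if __name__ == "__main__":
    blacklist = parse_blacklist(NS_PREFIX_BLACKLIST)
    ids = ["40|corda__h2020::0001", "40|miur________::0002"]
    print(drop_blacklisted(ids, blacklist))  # ['40|corda__h2020::0001']
```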
Set the path of the map defining the relation ID mappings
idMappingPath
/data/maps/fct_map.json
Set the path containing the PROD AGGREGATOR graph
aggregatorGraphPath
/tmp/prod_inference/graph/00_graph_aggregator
Set the target path to store the RAW graph
rawGraphPath
/tmp/prod_inference/graph/01_graph_raw
Set the target path to store the first CLEANED graph (before deduplication)
cleanedFirstGraphPath
/tmp/prod_inference/graph/02_graph_clean_first
Set the target path to store the DEDUPED graph
dedupGraphPath
/tmp/prod_inference/graph/03_graph_dedup
Set the target path to store the CONSISTENT graph
consistentGraphPath
/tmp/prod_inference/graph/04_graph_consistent
Set the target path to store the final CLEANED graph (after deduplication)
cleanedGraphPath
/tmp/prod_inference/graph/05_graph_cleaned
Set the dedup orchestrator name
dedupConfig
dedup-similarity-result-decisiontree-v2
Declare the ActionSet IDs to promote into the RAW graph
actionSetIdsRawGraph
scholexplorer-dump,doiboost,orcidworks-no-doi,datacite
Set the IS lookup service address
isLookUpUrl
http://services.openaire.eu:8280/is/services/isLookUp?wsdl
wait configurations
Reuse cached ODF claims from the PROD aggregation system
reuseODFClaims
true
Reuse cached OAF claims from the PROD aggregation system
reuseOAFClaims
true
Reuse cached ODF records on HDFS from the PROD aggregation system
reuseODFhdfs
true
Reuse cached OAF records on HDFS from the PROD aggregation system
reuseOAFhdfs
true
Reuse cached ODF content from the PROD aggregation system
reuseODF
true
Reuse cached OAF content from the PROD aggregation system
reuseOAF
true
Reuse cached DB content from the PROD aggregation system
reuseDB
true
Reuse cached OpenOrgs content from the PROD aggregation system
reuseDBOpenorgs
true
Whether to apply relation ID patching based on the provided idMapping
shouldPatchRelations
false
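When shouldPatchRelations is true, relation endpoints are remapped through the file at idMappingPath. The sketch below makes two hypothetical assumptions: the file is a flat JSON object mapping old ID to new ID, and relations carry source and target fields. This profile does not describe the actual layout of fct_map.json.

```python
from __future__ import annotations

import json

# Hypothetical sketch: assumes the idMapping file is a flat JSON object
# {"old id": "new id"} and relations are dicts with "source"/"target" keys.


def load_id_mapping(text: str) -> dict[str, str]:
    """Parse the mapping file content and check that it is a JSON object."""
    mapping = json.loads(text)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object of old -> new IDs")
    return mapping


def patch_relations(relations: list[dict], mapping: dict[str, str], enabled: bool) -> list[dict]:
    """Rewrite relation endpoints through the mapping when patching is enabled."""
    if not enabled:  # mirrors shouldPatchRelations = false: relations pass through
        return list(relations)
    patched = []
    for rel in relations:
        rel = dict(rel)  # copy so the caller's records are not mutated
        rel["source"] = mapping.get(rel["source"], rel["source"])
        rel["target"] = mapping.get(rel["target"], rel["target"])
        patched.append(rel)
    return patched


if __name__ == "__main__":
    mapping = load_id_mapping('{"40|old::1": "40|new::1"}')
    rels = [{"source": "50|pub::9", "target": "40|old::1"}]
    print(patch_relations(rels, mapping, enabled=True))
```

IDs not present in the map are left unchanged, so a partial mapping only touches the relations it covers.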
Set the PROD aggregator content path
contentPath
/tmp/prod_aggregator
wait configurations
create the PROD AGGREGATOR graph
executeOozieJob
IIS
{
'graphOutputPath' : 'aggregatorGraphPath',
'isLookupUrl' : 'isLookUpUrl',
'reuseODFClaims' : 'reuseODFClaims',
'reuseOAFClaims' : 'reuseOAFClaims',
'reuseDB' : 'reuseDB',
'reuseDBOpenorgs' : 'reuseDBOpenorgs',
'reuseODF' : 'reuseODF',
'reuseODF_hdfs' : 'reuseODFhdfs',
'reuseOAF' : 'reuseOAF',
'reuseOAF_hdfs' : 'reuseOAFhdfs',
'contentPath' : 'contentPath',
'nsPrefixBlacklist' : 'nsPrefixBlacklist',
'shouldPatchRelations' : 'shouldPatchRelations',
'idMappingPath' : 'idMappingPath'
}
{
'oozie.wf.application.path' : '/lib/dnet/PROD/oa/graph/raw_all/oozie_app',
'mongoURL' : '',
'mongoDb' : '',
'mdstoreManagerUrl' : '',
'postgresURL' : '',
'postgresUser' : '',
'postgresPassword' : '',
'postgresOpenOrgsURL' : '',
'postgresOpenOrgsUser' : '',
'postgresOpenOrgsPassword' : '',
'shouldHashId' : 'true',
'importOpenorgs' : 'true',
'workingDir' : '/tmp/prod_inference/working_dir/prod_aggregator'
}
build-report
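Each executeOozieJob node carries two maps. In this profile, the first map appears to bind an Oozie workflow property (left) to the name of a profile parameter defined above (right), for example reuseODF_hdfs to reuseODFhdfs. The second map appears to list literal Oozie properties. The minimal sketch below resolves both maps into one property set under that reading, using a subset of the values above. The resolver and its error handling are hypothetical, not the D-Net implementation.

```python
from __future__ import annotations

# Hypothetical sketch, assuming: left key = Oozie property, right value = name
# of a profile parameter; the second map holds literal Oozie properties.

PROFILE_PARAMS = {
    "aggregatorGraphPath": "/tmp/prod_inference/graph/00_graph_aggregator",
    "reuseODFhdfs": "true",
    "reuseOAFhdfs": "true",
}

BINDINGS = {
    "graphOutputPath": "aggregatorGraphPath",
    "reuseODF_hdfs": "reuseODFhdfs",
    "reuseOAF_hdfs": "reuseOAFhdfs",
}

STATIC = {
    "oozie.wf.application.path": "/lib/dnet/PROD/oa/graph/raw_all/oozie_app",
    "shouldHashId": "true",
}


def resolve_job_properties(bindings: dict[str, str], params: dict[str, str],
                           static: dict[str, str]) -> dict[str, str]:
    """Substitute parameter values into the bindings and add the literal properties."""
    missing = sorted(v for v in bindings.values() if v not in params)
    if missing:
        raise KeyError(f"undefined profile parameters: {missing}")
    props = {key: params[name] for key, name in bindings.items()}
    overlap = props.keys() & static.keys()
    if overlap:
        raise ValueError(f"property set in both maps: {sorted(overlap)}")
    return {**props, **static}


if __name__ == "__main__":
    for key, value in resolve_job_properties(BINDINGS, PROFILE_PARAMS, STATIC).items():
        print(f"{key}={value}")
```

The sketch rejects a property that is defined in both maps. This is a deliberate choice for the example; the real engine may resolve such conflicts differently.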
create the RAW graph
executeOozieJob
IIS
{
'inputActionSetIds' : 'actionSetIdsRawGraph',
'inputGraphRootPath' : 'aggregatorGraphPath',
'outputGraphRootPath' : 'rawGraphPath',
'isLookupUrl' : 'isLookUpUrl'
}
{
'oozie.wf.application.path' : '/lib/dnet/PROD/actionmanager/wf/main/oozie_app',
'sparkExecutorCores' : '3',
'sparkExecutorMemory' : '10G',
'activePromoteDatasetActionPayload' : 'true',
'activePromoteDatasourceActionPayload' : 'true',
'activePromoteOrganizationActionPayload' : 'true',
'activePromoteOtherResearchProductActionPayload' : 'true',
'activePromoteProjectActionPayload' : 'true',
'activePromotePublicationActionPayload' : 'true',
'activePromoteRelationActionPayload' : 'true',
'activePromoteResultActionPayload' : 'true',
'activePromoteSoftwareActionPayload' : 'true',
'mergeAndGetStrategy' : 'MERGE_FROM_AND_GET',
'workingDir' : '/tmp/prod_inference/working_dir/promoteActionsRaw'
}
build-report
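Two settings control promotion. The activePromote*ActionPayload flags decide which payload types from the action sets in actionSetIdsRawGraph are promoted, and mergeAndGetStrategy decides how a payload is combined with the graph record that shares its ID. The toy sketch below derives the enabled types from the flags and then applies the payloads with a simplified merge. Its merge rule (non-empty incoming fields overwrite existing ones) is an assumption standing in for MERGE_FROM_AND_GET, not that strategy's actual semantics.

```python
from __future__ import annotations

import re

# Toy illustration only: the merge rule below is an assumed stand-in for
# MERGE_FROM_AND_GET, not the dnet-hadoop merge logic.
FLAG_RE = re.compile(r"^activePromote(\w+)ActionPayload$")


def promoted_types(props: dict[str, str]) -> set[str]:
    """Entity types whose activePromote<Type>ActionPayload flag is 'true'."""
    types = set()
    for key, value in props.items():
        m = FLAG_RE.match(key)
        if m and value.lower() == "true":
            types.add(m.group(1))
    return types


def merge_from(current: dict, incoming: dict) -> dict:
    """Assumed rule: non-empty incoming fields overwrite the current ones."""
    merged = dict(current)
    for field, value in incoming.items():
        if value not in (None, "", [], {}):
            merged[field] = value
    return merged


def promote(graph: dict[str, dict], actions: list[tuple[str, dict]], enabled: set[str]) -> dict:
    """Apply (type, record) payloads to graph[type], keyed by record id."""
    out = {t: dict(records) for t, records in graph.items()}
    for etype, record in actions:
        if etype not in enabled:
            continue  # payload type switched off by its flag
        table = out.setdefault(etype, {})
        rid = record["id"]
        table[rid] = merge_from(table[rid], record) if rid in table else dict(record)
    return out
```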
clean the Qualifier-typed properties of the RAW graph according to the vocabulary indicated in their schemeid
executeOozieJob
IIS
{
'graphInputPath' : 'rawGraphPath',
'graphOutputPath' : 'cleanedFirstGraphPath',
'isLookupUrl' : 'isLookUpUrl'
}
{
'oozie.wf.application.path' : '/lib/dnet/PROD/oa/graph/clean/oozie_app',
'workingDir' : '/tmp/prod_inference/working_dir/clean_first'
}
build-report
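The cleaning step normalizes each Qualifier against the vocabulary named by its schemeid. The minimal sketch below assumes a hypothetical in-memory vocabulary that maps lower-cased synonyms to a canonical (classid, classname) pair. In the real workflow the vocabularies are fetched through isLookupUrl, and unmatched terms may be handled differently from the "leave untouched" rule used here.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

# Hypothetical sketch: the vocabulary content below is illustrative only.


@dataclass(frozen=True)
class Qualifier:
    classid: str
    classname: str
    schemeid: str
    schemename: str


VOCABULARIES = {
    "dnet:languages": {
        "eng": ("eng", "English"),
        "english": ("eng", "English"),
        "en": ("eng", "English"),
    },
}


def clean_qualifier(q: Qualifier, vocabularies: dict) -> Qualifier:
    """Replace classid/classname with the canonical term of q.schemeid's vocabulary."""
    vocab = vocabularies.get(q.schemeid)
    if vocab is None:
        return q  # unknown scheme: leave untouched (assumption)
    term = vocab.get(q.classid.strip().lower()) or vocab.get(q.classname.strip().lower())
    if term is None:
        return q  # no synonym matched: leave untouched (assumption)
    classid, classname = term
    return replace(q, classid=classid, classname=classname)


if __name__ == "__main__":
    raw = Qualifier("English", "English", "dnet:languages", "dnet:languages")
    print(clean_qualifier(raw, VOCABULARIES))
```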
search for duplicates in the first CLEANED graph
executeOozieJob
IIS
{
'actionSetId' : 'dedupConfig',
'graphBasePath' : 'cleanedFirstGraphPath',
'dedupGraphPath' : 'dedupGraphPath',
'isLookUpUrl' : 'isLookUpUrl'
}
{
'oozie.wf.application.path' : '/lib/dnet/PROD/oa/dedup/scan/oozie_app',
'actionSetIdOpenorgs' : 'dedup-similarity-organization-simple',
'workingPath' : '/tmp/prod_inference/working_dir/dedup',
'sparkExecutorCores' : '3',
'sparkExecutorMemory' : '10G'
}
build-report
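Duplicate search is driven by the dedup-similarity-result-decisiontree-v2 configuration, which is not reproduced here. The sketch below is a generic illustration of the same pipeline shape: blocking, then pairwise comparison, then grouping into merge sets. It blocks records on a title prefix, compares titles with difflib inside each block, and joins matches with union-find. The blocking key and the 0.9 threshold are assumptions.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from itertools import combinations

# Generic illustration, not the production decision tree.
THRESHOLD = 0.9  # assumed similarity cut-off


def block_key(title: str) -> str:
    """Cheap blocking key: first five non-space characters, lower-cased."""
    return "".join(title.lower().split())[:5]


def find_duplicate_groups(records: dict[str, str]) -> list[set[str]]:
    """Return groups (size > 1) of record IDs whose titles look alike."""
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    blocks: dict[str, list[str]] = {}
    for rid, title in records.items():
        blocks.setdefault(block_key(title), []).append(rid)

    # Compare only pairs that share a block, then union the matches.
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            ratio = SequenceMatcher(None, records[a].lower(), records[b].lower()).ratio()
            if ratio >= THRESHOLD:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for rid in records:
        groups.setdefault(find(rid), set()).add(rid)
    return [g for g in groups.values() if len(g) > 1]
```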
mark duplicates as deleted and redistribute the relationships
executeOozieJob
IIS
{
'graphBasePath' : 'dedupGraphPath',
'graphOutputPath' : 'consistentGraphPath'
}
{
'oozie.wf.application.path' : '/lib/dnet/PROD/oa/dedup/consistency/oozie_app',
'workingPath' : '/tmp/prod_inference/working_dir/dedup'
}
build-report
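The consistency step turns merge sets into a consistent graph. The simplified sketch below assumes plain-dict records, a deletedbyinference flag, and relations identified by (source, relClass, target). Merged records are flagged, relation endpoints are rewritten to the surviving root ID, and any self-loops or exact duplicates created by the rewrite are dropped. This illustrates the idea only; it is not the dnet-hadoop consistency job.

```python
from __future__ import annotations

# Simplified sketch; field names are assumptions for illustration.


def apply_merges(entities: dict[str, dict], relations: list[dict],
                 merges: dict[str, list[str]]) -> tuple[dict, list[dict]]:
    """Flag merged records and redirect their relations to the root ID."""
    to_root = {dup: root for root, dups in merges.items() for dup in dups}

    marked = {}
    for eid, record in entities.items():
        record = dict(record)  # copy so the input is not mutated
        if eid in to_root:
            record["deletedbyinference"] = True
        marked[eid] = record

    seen = set()
    redistributed = []
    for rel in relations:
        src = to_root.get(rel["source"], rel["source"])
        tgt = to_root.get(rel["target"], rel["target"])
        key = (src, rel["relClass"], tgt)
        if src == tgt or key in seen:
            continue  # self-loop or duplicate produced by the rewrite
        seen.add(key)
        redistributed.append({**rel, "source": src, "target": tgt})
    return marked, redistributed
```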
clean the Qualifier-typed properties of the CONSISTENT graph according to the vocabulary indicated in their schemeid
executeOozieJob
IIS
{
'graphInputPath' : 'consistentGraphPath',
'graphOutputPath' : 'cleanedGraphPath',
'isLookupUrl' : 'isLookUpUrl'
}
{
'oozie.wf.application.path' : '/lib/dnet/PROD/oa/graph/clean/oozie_app',
'workingDir' : '/tmp/prod_inference/working_dir/clean'
}
build-report
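Every stage reads the graph written by the previous one, so a wrong path binding silently breaks the chain. The hypothetical checker below encodes the input/output bindings of the six nodes above, with the paths copied from the parameters. It reports any stage that does not read its predecessor's output and flags any two stages that write to the same path.

```python
from __future__ import annotations

# Hypothetical checker; paths and bindings are copied from this profile.
PATHS = {
    "aggregatorGraphPath": "/tmp/prod_inference/graph/00_graph_aggregator",
    "rawGraphPath": "/tmp/prod_inference/graph/01_graph_raw",
    "cleanedFirstGraphPath": "/tmp/prod_inference/graph/02_graph_clean_first",
    "dedupGraphPath": "/tmp/prod_inference/graph/03_graph_dedup",
    "consistentGraphPath": "/tmp/prod_inference/graph/04_graph_consistent",
    "cleanedGraphPath": "/tmp/prod_inference/graph/05_graph_cleaned",
}

# (stage, input parameter, output parameter)
STAGES = [
    ("create aggregator graph", None, "aggregatorGraphPath"),
    ("promote actions (RAW)", "aggregatorGraphPath", "rawGraphPath"),
    ("first cleaning", "rawGraphPath", "cleanedFirstGraphPath"),
    ("dedup scan", "cleanedFirstGraphPath", "dedupGraphPath"),
    ("dedup consistency", "dedupGraphPath", "consistentGraphPath"),
    ("final cleaning", "consistentGraphPath", "cleanedGraphPath"),
]


def check_chain(stages: list[tuple], paths: dict[str, str]) -> list[str]:
    """Return a list of wiring problems; an empty list means the chain is intact."""
    problems = []
    previous_output = None
    for name, inp, out in stages:
        if inp != previous_output:
            problems.append(f"{name}: reads {inp}, previous stage wrote {previous_output}")
        if out not in paths:
            problems.append(f"{name}: output {out} is not a defined path")
        previous_output = out
    written = [paths[o] for _, _, o in stages if o in paths]
    if len(written) != len(set(written)):
        problems.append("two stages write to the same path")
    return problems


if __name__ == "__main__":
    print(check_chain(STAGES, PATHS) or "chain OK")
```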
Last execution ID: wf_20210719_165159_86
Last execution date: 2021-07-19T20:45:09+00:00
Last execution status: SUCCESS