diff --git a/docs/data-provision/enrichment/enrichment.md b/docs/data-provision/enrichment/enrichment.md index f6938c2..0604141 100644 --- a/docs/data-provision/enrichment/enrichment.md +++ b/docs/data-provision/enrichment/enrichment.md @@ -18,27 +18,76 @@ The OpenAIRE Research Graph is enriched by links mined by OpenAIRE’s full-text The Deduction process (also known as “bulk tagging”) enriches each record with new information that can be derived from the existing property values. -As of September 2020, three procedures are in place to relate a research product to a research initiative, infrastructure (RI) or community (RC) based on: +This process is used to associate results to community/research initiatives that are part of OpenAIRE. +As of November 2022, three procedures are in place to relate a research product to a research initiative, infrastructure (RI) or community (RC) based on: -* subjects (2.7M results tagged) +* subjects: it is possible to specify a list of subjects that are relevant for the RC/RI. Every time one of the subjects is found among the subjects of a result, the result is linked to the RC/RI. -* Zenodo community (16K results tagged) +
+ +
+ + +* data sources: it is possible to list a set of data sources relevant for the RC/RI. All the results collected from these data sources will be linked to the RC/RI ++ +
+ + When only some results collected from a datasource are relevant for the RC/RI, it is possible to specify a set of selection constraints (SC) that have to be satisfied before linking the result to the +community. The selection constraint has the form SC = S1 or S2 or ... or Sn. Each Si has the form Si = si1 and si2 and ... and sin, where each sij is a condition on a specific field of the result. The set of fields that can be specified is F={title, author, contributor, description, orcid}, +while the set of conditions is V={contains, equals, not_contains, not_equals, contains_ignorecase, equals_ignorecase, not_contains_ignorecase, not_equal_ignorecase}, and the value is free text. +A possible selection criterion is: “All the products whose contributor contains DARIAH” + ++ +
+ +* Zenodo community: it is possible to list a set of Zenodo communities relevant for the RC/RI. All the products collected from the listed Zenodo communities are linked to the RC/RI + + ++ +
-* the data source it comes from (250K results tagged) The list of subjects, Zenodo communities and data sources used to enrich the products are defined by the managers of the community gateway or infrastructure monitoring dashboard associated with the RC/RI. ## Propagation -This process “propagates” properties and links from one product to another if between the two there is a “strong” semantic relationship. +This process enriches the graph by adding new links and/or new properties. The new information is added by exploiting existing semantic +relationships and values between the involved entities. -As of September 2020, the following procedures are in place: -Propagation of the property “country” to results from institutional repositories: e.g. publication collected from an institutional repository maintained by an italian university will be enriched with the property “country = IT”. +As of November 2022, the following procedures are in place: -* Propagation of links to projects: e.g. publication linked to project P “is supplemented by” a dataset D. Dataset D will get the link to project P. The relationships considered for this procedure are “isSupplementedBy” and “supplements”. +* Country propagation: updates the property “country” of a result. This happens when the result is collected from an institutional datasource or when the datasource hosting the result is inserted in a whitelist. For all the results whose hosting datasource satisfies one of the conditions above, the country of the organization providing the datasource is added to the country of the result: e.g. a publication collected from an institutional repository maintained by an Italian university will be enriched with the property “country = IT”. ++ +
-* Propagation of related community/infrastructure/initiative from organizations to products via affiliation relationships: e.g. a publication with an author affiliated with organization O. The manager of the community gateway C declared that the outputs of O are all relevant for his/her community C. The publication is tagged as relevant for C. +* Project propagation: adds an "isProducedBy" relationship (and its inverse) between a Project P and a Result R, if R has a strong semantic relationship with another Result R1 and R1 is linked to P: e.g. a publication linked to project P “is supplemented by” a dataset D. Dataset D will get the link to project P. The relationships considered for this procedure are “isSupplementedBy” and “isSupplementTo”. ++ +
+* Result to RC/RI through organization propagation. The manager of the RC/RI can specify a set of organizations whose products are relevant for the +community. This kind of propagation exploits the hasAuthorInstitution relation between results and organizations. +Each result having such a relation with at least one organization relevant for the RC/RI will be linked to it. ++ +
-* Propagation of related community/infrastructure/initiative to related products: e.g. publication associated to community C is supplemented by a dataset D. Dataset D will get the association to C. The relationships considered for this procedure are “isSupplementedBy” and “supplements”. - -* Propagation of ORCID identifiers to related products, if the products have the same authors: e.g. publication has ORCID for its authors and is supplemented by a dataset D. Dataset D has the same authors as the publication. Authors of D are enriched with the ORCIDs available in the publication. The relationships considered for this procedure are “isSupplementedBy” and “supplements”. \ No newline at end of file +* Result to RC/RI through semantic relation: e.g. publication associated to community C is supplemented by a dataset D. Dataset D will get the association to C. The relationships considered for this procedure are “isSupplementedBy” and “supplements”. ++ +
+* ORCID identifiers to result through semantic relation: ORCID identifiers are propagated to related products if the products have the same authors, e.g. a publication has ORCID identifiers for its authors and is supplemented by a dataset D. Dataset D has the same authors as the publication. Authors of D are enriched with the ORCIDs available in the publication. The relationships considered for this procedure are “isSupplementedBy” and “supplements”. ++ +
+* Affiliation to organization through institutional repository ++ +
+* Affiliation to organization through semantic relation ++ +
diff --git a/docs/data-provision/enrichment/img.png b/docs/data-provision/enrichment/img.png new file mode 100644 index 0000000..d77d197 Binary files /dev/null and b/docs/data-provision/enrichment/img.png differ diff --git a/static/img/docs/enrichment/bulktagging_datasource.png b/static/img/docs/enrichment/bulktagging_datasource.png new file mode 100644 index 0000000..4a54800 Binary files /dev/null and b/static/img/docs/enrichment/bulktagging_datasource.png differ diff --git a/static/img/docs/enrichment/bulktagging_selconstraints.png b/static/img/docs/enrichment/bulktagging_selconstraints.png new file mode 100644 index 0000000..7d43157 Binary files /dev/null and b/static/img/docs/enrichment/bulktagging_selconstraints.png differ diff --git a/static/img/docs/enrichment/bulktagging_subject.png b/static/img/docs/enrichment/bulktagging_subject.png new file mode 100644 index 0000000..3ee4784 Binary files /dev/null and b/static/img/docs/enrichment/bulktagging_subject.png differ diff --git a/static/img/docs/enrichment/bulktagging_zenodo.png b/static/img/docs/enrichment/bulktagging_zenodo.png new file mode 100644 index 0000000..64aee75 Binary files /dev/null and b/static/img/docs/enrichment/bulktagging_zenodo.png differ diff --git a/static/img/docs/enrichment/propagation_affiliationistrepo.png b/static/img/docs/enrichment/propagation_affiliationistrepo.png new file mode 100644 index 0000000..63cf757 Binary files /dev/null and b/static/img/docs/enrichment/propagation_affiliationistrepo.png differ diff --git a/static/img/docs/enrichment/propagation_country.png b/static/img/docs/enrichment/propagation_country.png new file mode 100644 index 0000000..70aa96f Binary files /dev/null and b/static/img/docs/enrichment/propagation_country.png differ diff --git a/static/img/docs/enrichment/propagation_orcid.png b/static/img/docs/enrichment/propagation_orcid.png new file mode 100644 index 0000000..cabfc2d Binary files /dev/null and b/static/img/docs/enrichment/propagation_orcid.png 
differ diff --git a/static/img/docs/enrichment/propagation_organizationsemrel.png b/static/img/docs/enrichment/propagation_organizationsemrel.png new file mode 100644 index 0000000..d6e55a9 Binary files /dev/null and b/static/img/docs/enrichment/propagation_organizationsemrel.png differ diff --git a/static/img/docs/enrichment/propagation_resulttocommunitythroughorganization.png b/static/img/docs/enrichment/propagation_resulttocommunitythroughorganization.png new file mode 100644 index 0000000..68aa116 Binary files /dev/null and b/static/img/docs/enrichment/propagation_resulttocommunitythroughorganization.png differ diff --git a/static/img/docs/enrichment/propagation_resulttocommunitythroughsemrel.png b/static/img/docs/enrichment/propagation_resulttocommunitythroughsemrel.png new file mode 100644 index 0000000..2a8a785 Binary files /dev/null and b/static/img/docs/enrichment/propagation_resulttocommunitythroughsemrel.png differ diff --git a/static/img/docs/enrichment/propagation_resulttoproject.png b/static/img/docs/enrichment/propagation_resulttoproject.png new file mode 100644 index 0000000..750b691 Binary files /dev/null and b/static/img/docs/enrichment/propagation_resulttoproject.png differ
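A note on the selection constraints added in the `enrichment.md` hunk: the form SC = S1 or S2 or ... or Sn, with each Si a conjunction of field conditions, can be read as a disjunction of conjunctions. The Python sketch below is only an illustration of that reading, not the OpenAIRE implementation. The verb names are copied from the set V in the text; the `Condition` class, the `satisfies` function, the dictionary-of-lists record layout, and the choice that negative verbs must hold for every value of a field are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Verb names are copied from the set V in the documentation.
# Their exact semantics here are an assumption for illustration.
VERBS: Dict[str, Callable[[str, str], bool]] = {
    "contains": lambda text, value: value in text,
    "equals": lambda text, value: text == value,
    "not_contains": lambda text, value: value not in text,
    "not_equals": lambda text, value: text != value,
    "contains_ignorecase": lambda text, value: value.lower() in text.lower(),
    "equals_ignorecase": lambda text, value: text.lower() == value.lower(),
    "not_contains_ignorecase": lambda text, value: value.lower() not in text.lower(),
    "not_equal_ignorecase": lambda text, value: text.lower() != value.lower(),
}


@dataclass(frozen=True)
class Condition:
    """One sij: a verb applied to a field of the result (hypothetical type)."""
    field_name: str  # one of: title, author, contributor, description, orcid
    verb: str        # a key of VERBS
    value: str       # free text

    def holds(self, result: Dict[str, List[str]]) -> bool:
        check = VERBS[self.verb]
        values = result.get(self.field_name, [])
        if self.verb.startswith("not_"):
            # Assumption: a negative verb must hold for every value of the field.
            return all(check(v, self.value) for v in values)
        # Assumption: a positive verb holds if any value of the field matches.
        return any(check(v, self.value) for v in values)


def satisfies(result: Dict[str, List[str]], constraint: List[List[Condition]]) -> bool:
    """Evaluate SC = S1 or ... or Sn, where each Si is a list of conditions
    joined by 'and'. An empty constraint matches nothing."""
    return any(all(c.holds(result) for c in conjunction) for conjunction in constraint)


# The example from the text: all the products whose contributor contains DARIAH.
dariah_constraint = [[Condition("contributor", "contains", "DARIAH")]]
record = {"title": ["A sample title"], "contributor": ["DARIAH-EU"]}
linked = satisfies(record, dariah_constraint)
```

With this reading, a result collected from a listed data source would be linked to the RC/RI only when `satisfies` returns true for the constraint configured by the RC/RI manager.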