From 849901f231bbb281efb43f5713ccbb332c93c891 Mon Sep 17 00:00:00 2001
From: Andreas Czerniak
Date: Thu, 10 Nov 2022 12:15:55 +0100
Subject: [PATCH 1/5] add redmine page

---
 docs/data-provision/aggregation/aggregation.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/data-provision/aggregation/aggregation.md b/docs/data-provision/aggregation/aggregation.md
index 22c2e52..8498957 100644
--- a/docs/data-provision/aggregation/aggregation.md
+++ b/docs/data-provision/aggregation/aggregation.md
@@ -49,6 +49,8 @@ The OpenAIRE aggregator collects metadata records in the majority of cases via [
 For additional details about the aggregation workflows, please refer to [2].
+The whole list of available and used collectors could be found in the [RedMine Wiki - API Protocols](https://support.openaire.eu/projects/openaire/wiki/API_protocols)
+
 ## References
 [1] Manghi P. et al. (2014) "The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures", Program, Vol. 48 Issue: 4, pp.322-354, [10.1108/PROG-08-2013-0045](https://doi.org/10.1108/PROG-08-2013-0045)
From ce17228075f866c1e43d22ec1e1247d3cf975d7a Mon Sep 17 00:00:00 2001
From: Andreas Czerniak
Date: Thu, 10 Nov 2022 12:26:43 +0100
Subject: [PATCH 2/5] contributing APIs wiki page, CAP, DRIS

---
 docs/data-provision/aggregation/aggregation.md | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/docs/data-provision/aggregation/aggregation.md b/docs/data-provision/aggregation/aggregation.md
index 8498957..69a241c 100644
--- a/docs/data-provision/aggregation/aggregation.md
+++ b/docs/data-provision/aggregation/aggregation.md
@@ -8,7 +8,14 @@ OpenAIRE materializes an open, participatory research graph (the OpenAIRE Resear
 ## What does OpenAIRE collect?
-OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant to the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Research Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
+OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers
+compliant to the [OpenAIRE guidelines](https://guidelines.openaire.eu/) base on the [OpenAIRE Content Acquisition Policies](https://doi.org/10.5281/zenodo.1446408)
+from 2018. And from entity registries (i.e. data sources offering authoritative lists of entities,
+like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/),
+[DOAJ](https://doaj.org/), [DRIS](https://dspacecris.eurocris.org/cris/explore/dris) from [euroCRIS](https://www.openaire.eu/openaire-and-eurocris-sign-a-memorandum-of-understanding), and
+various funder databases).
+
+After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Research Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
 The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary.
 The controlled vocabularies used for cleansing are accessible at [api.openaire.eu/vocabularies](https://api.openaire.eu/vocabularies/). Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as field value, the value is updated with the corresponding term.
 Also, the OpenAIRE Research Graph is extended with other relevant scholarly communication sources that do not follow the OpenAIRE Guidelines and/or are too large to be integrated via the “normal” aggregation mechanism: DOIBoost (which merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall).
@@ -32,7 +39,7 @@ Relationships between objects are collected from the data sources, but also auto
 Objects and relationships in the OpenAIRE Research Graph are extracted from information packages, i.e. metadata records, collected from data sources of the following kinds:
-- *Institutional or thematic repositories*: Information systems where scientists upload the bibliographic metadata and full-texts of their articles, due to obligations from their organization or due to community practices (e.g. ArXiv, Europe PMC);
+- *Literature, Institutional and thematic repositories*: Information systems where scientists upload the bibliographic metadata and full-texts of their articles, due to obligations from their organization or due to community practices (e.g. ArXiv, Europe PMC);
 - *Open Access Publishers and journals*: Information system of open access publishers or relative journals, which offer bibliographic metadata and PDFs of their published articles;
 - *Data archives*: Information systems where scientists deposit descriptive metadata and files about their research data (also known as scientific data, datasets, etc.).;
 - *Hybrid repositories/archives*: information systems where scientists deposit metadata and file of any kind of scientific products, incuding scientific literature, research data and research software (e.g. Zenodo)
@@ -46,10 +53,10 @@ Objects and relationships in the OpenAIRE Research Graph are extracted from info
 OpenAIRE collects metadata records describing objects of the research life-cycle from content providers compliant to the OpenAIRE guidelines and from entity registries (i.e. data sources offering authoritative lists of entities, like OpenDOAR, re3data, DOAJ, and funder databases).
 The OpenAIRE aggregator collects metadata records in the majority of cases via [OAI-PMH](https://www.openarchives.org/pmh/), but also supports other standard exchange protocols like FTP(S), SFTP, and some RESTful API.
+The whole list of available and used collectors could be found in the [RedMine Wiki - API Protocols](https://support.openaire.eu/projects/openaire/wiki/API_protocols)
 For additional details about the aggregation workflows, please refer to [2].
-The whole list of available and used collectors could be found in the [RedMine Wiki - API Protocols](https://support.openaire.eu/projects/openaire/wiki/API_protocols)
 ## References
From f4f84a5a310800f13dc80e12e477d164ff9bc9c9 Mon Sep 17 00:00:00 2001
From: Serafeim Chatzopoulos
Date: Tue, 29 Nov 2022 14:16:22 +0200
Subject: [PATCH 3/5] Fix typos

---
 docs/data-provision/aggregation/aggregation.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/data-provision/aggregation/aggregation.md b/docs/data-provision/aggregation/aggregation.md
index 69a241c..69ab3d0 100644
--- a/docs/data-provision/aggregation/aggregation.md
+++ b/docs/data-provision/aggregation/aggregation.md
@@ -9,13 +9,13 @@ OpenAIRE materializes an open, participatory research graph (the OpenAIRE Resear
 ## What does OpenAIRE collect?
 OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers
-compliant to the [OpenAIRE guidelines](https://guidelines.openaire.eu/) base on the [OpenAIRE Content Acquisition Policies](https://doi.org/10.5281/zenodo.1446408)
-from 2018. And from entity registries (i.e. data sources offering authoritative lists of entities,
+compliant to the [OpenAIRE guidelines](https://guidelines.openaire.eu/) based on the [OpenAIRE Content Acquisition Policies](https://doi.org/10.5281/zenodo.1446408)
+from 2018 onward, and from entity registries (i.e. data sources offering authoritative lists of entities,
 like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/),
 [DOAJ](https://doaj.org/), [DRIS](https://dspacecris.eurocris.org/cris/explore/dris) from [euroCRIS](https://www.openaire.eu/openaire-and-eurocris-sign-a-memorandum-of-understanding), and
 various funder databases).
-After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Research Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
+After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final version of OpenAIRE Research Graph.
 The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary.
 The controlled vocabularies used for cleansing are accessible at [api.openaire.eu/vocabularies](https://api.openaire.eu/vocabularies/). Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as field value, the value is updated with the corresponding term.
 Also, the OpenAIRE Research Graph is extended with other relevant scholarly communication sources that do not follow the OpenAIRE Guidelines and/or are too large to be integrated via the “normal” aggregation mechanism: DOIBoost (which merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall).
From a844ac459cd4715cfb90e4b271b7c750e5a50eab Mon Sep 17 00:00:00 2001
From: Serafeim Chatzopoulos
Date: Tue, 29 Nov 2022 14:21:52 +0200
Subject: [PATCH 4/5] Align references in aggregation section with those in relevant pubs

---
 docs/data-provision/aggregation/aggregation.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/data-provision/aggregation/aggregation.md b/docs/data-provision/aggregation/aggregation.md
index 69ab3d0..7e50f3f 100644
--- a/docs/data-provision/aggregation/aggregation.md
+++ b/docs/data-provision/aggregation/aggregation.md
@@ -60,6 +60,6 @@ For additional details about the aggregation workflows, please refer to [2].
 ## References
-[1] Manghi P. et al. (2014) "The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures", Program, Vol. 48 Issue: 4, pp.322-354, [10.1108/PROG-08-2013-0045](https://doi.org/10.1108/PROG-08-2013-0045)
+[1] Manghi, P., Artini, M., Atzori, C., Bardi, A., Mannocci, A., La Bruzzo, S., Candela, L., Castelli, D. and Pagano, P. (2014), “The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures”, Program: electronic library and information systems, Vol. 48 No. 4, pp. 322-354. [doi:10.1108/prog-08-2013-0045](http://doi.org/10.1108/prog-08-2013-0045)
-[2] Atzori, Claudio, Bardi, Alessia, Manghi, Paolo, & Mannocci, Andrea. (2017). The OpenAIRE workflows for data management. Zenodo. [10.5281/zenodo.996006](http://doi.org/10.5281/zenodo.996006)
+[2] Atzori, C., Bardi, A., Manghi, P., & Mannocci, A. (2017, January). "The OpenAIRE workflows for data management". In Italian Research Conference on Digital Libraries (pp. 95-107). Springer, Cham. [doi:10.1007/978-3-319-68130-6_8](https://doi.org/10.1007/978-3-319-68130-6_8)
\ No newline at end of file
From ac7554cb8a598cf1982d05a0d470d2187918f256 Mon Sep 17 00:00:00 2001
From: Serafeim Chatzopoulos
Date: Fri, 2 Dec 2022 13:57:16 +0200
Subject: [PATCH 5/5] Minor rephrasing

---
 docs/data-provision/aggregation/aggregation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/data-provision/aggregation/aggregation.md b/docs/data-provision/aggregation/aggregation.md
index d1711b3..037d98e 100644
--- a/docs/data-provision/aggregation/aggregation.md
+++ b/docs/data-provision/aggregation/aggregation.md
@@ -11,7 +11,7 @@ OpenAIRE materializes an open, participatory research graph (the OpenAIRE Graph)
 OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant to the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
 The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary.
 The controlled vocabularies used for cleansing are accessible at [api.openaire.eu/vocabularies](https://api.openaire.eu/vocabularies/). Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as field value, the value is updated with the corresponding term.
-Also, the OpenAIRE Graph is extended with other relevant scholarly communication sources that do not follow the OpenAIRE Guidelines and/or are too large to be integrated via the “normal” aggregation mechanism: DOIBoost (which merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall).
+In addition, the OpenAIRE Graph is extended with other relevant scholarly communication sources that need special handling, either because they do not strictly follow the OpenAIRE Guidelines or due to the vast amount of data they offer (e.g. DOIBoost, which merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall).
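
The context lines of this series describe a cleaning step: values are harmonised to a common format (dates as YYYY-MM-dd) and synonyms are replaced by the corresponding controlled term. A minimal sketch of that idea follows. It is an illustration only, not OpenAIRE's implementation: the `Term` class, the helpers `build_synonym_index`, `clean_value`, and `normalise_date`, the accepted input date formats, and the sample vocabulary entry are all hypothetical, and returning the term's code (rather than its label) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Term:
    """Hypothetical shape of a controlled term: one code, one label, many synonyms."""
    code: str
    label: str
    synonyms: frozenset = frozenset()


def build_synonym_index(vocabulary: Iterable[Term]) -> Dict[str, Term]:
    """Map every code, label, and synonym (case-insensitively) to its term."""
    index: Dict[str, Term] = {}
    for term in vocabulary:
        for name in (term.code, term.label, *term.synonyms):
            index[name.strip().lower()] = term
    return index


def clean_value(value: str, index: Dict[str, Term]) -> str:
    """Return the code of the matching term; leave unknown values untouched."""
    term = index.get(value.strip().lower())
    return term.code if term is not None else value


# Assumed input formats; a real pipeline would likely accept many more.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d/%m/%Y")


def normalise_date(raw: str) -> Optional[str]:
    """Rewrite a date as YYYY-MM-dd, or return None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    # Invented sample entry, not taken from api.openaire.eu/vocabularies.
    vocabulary = [Term("0001", "Article", frozenset({"journal article", "research article"}))]
    index = build_synonym_index(vocabulary)
    print(clean_value("Journal Article", index))
    print(normalise_date("29/11/2022"))
```

Building the lookup index once keeps each cleaning call to a single dictionary access, which matters when the same vocabulary is applied to many records. Unknown values are passed through unchanged here; whether to keep, flag, or drop them is a policy decision the source text does not specify.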

Aggregation