diff --git a/docs/data-model/entities/community.md b/docs/data-model/entities/community.md
index f9fa80b..dfbf24a 100644
--- a/docs/data-model/entities/community.md
+++ b/docs/data-model/entities/community.md
@@ -2,8 +2,7 @@ sidebar_position: 6
 ---
-
-# Community (Initiative)
+# Community
 Research communities and research initiatives are intended as groups of people with a common research intent and can be of two types: research initiatives or research communities:
diff --git a/docs/data-provision/aggregation.md b/docs/data-provision/aggregation.md
index 4ae6ab4..6b865d6 100644
--- a/docs/data-provision/aggregation.md
+++ b/docs/data-provision/aggregation.md
@@ -4,11 +4,13 @@ sidebar_position: 1
 # Aggregation
-OpenAIRE collects metadata records from a variety of content providers as described in https://www.openaire.eu/aggregation-and-content-provision-workflows.
+OpenAIRE collects metadata records from a variety of content providers as described in the [aggregation and content provision workflows](https://www.openaire.eu/aggregation-and-content-provision-workflows).
 OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant to the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like OpenDOAR, re3data, DOAJ, and funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Research Graph that you can access from the OpenAIRE portal and the APIs. The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary. The controlled vocabularies used for cleansing are accessible at http://api.openaire.eu/vocabularies. Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as field value, the value is updated with the corresponding term. Also, the OpenAIRE Research Graph is extended with other relevant scholarly communication sources that are too big to be integrated via the “normal” aggregation mechanism: DOIBoost (which merges Crossref, ORCID, Microsoft Academic Graph, and Unpaywall), and ScholeXplorer, one of the Scholix hubs offering a large set of links between research literature and data.
-![Aggregation](./assets/aggregation.png)
+
+
+
\ No newline at end of file
diff --git a/docs/data-provision/data-provision.md b/docs/data-provision/data-provision.md
index bc43579..ea103c3 100644
--- a/docs/data-provision/data-provision.md
+++ b/docs/data-provision/data-provision.md
@@ -1,11 +1,7 @@
 # Data provision
-
-source: https://graph.openaire.eu/about#tabs_card
-
 OpenAIRE collects metadata records from more than 70K scholarly communication sources from all over the world, including Open Access institutional repositories, data archives, journals. All the metadata records (i.e. descriptions of research products) are put together in a data lake, together with records from Crossref, Unpaywall, ORCID, Grid.ac, and information about projects provided by national and international funders. Dedicated inference algorithms applied to metadata and to the full-texts of Open Access publications enrich the content of the data lake with links between research results and projects, author affiliations, subject classification, links to entries from domain-specific databases. Duplicated organisations and results are identified and merged together to obtain an open, trusted, public resource enabling explorations of the scholarly communication landscape like never before.
-![Architecture](./assets/architecture.png)
-
-TODO: make this image linkable
-
++ +
diff --git a/docs/data-provision/deduplication/clustering-functions.md b/docs/data-provision/deduplication/clustering-functions.md
index 9fcbc31..2447437 100644
--- a/docs/data-provision/deduplication/clustering-functions.md
+++ b/docs/data-provision/deduplication/clustering-functions.md
@@ -2,7 +2,6 @@ sidebar_position: 3
 ---
 # Clustering functions
-TODO
 ## NgramPairs
 It produces a list of concatenations of a pair of ngrams generated from different words.+ +
#### Creation of representative record TODO
diff --git a/docs/data-provision/enrichment/enrichment.md b/docs/data-provision/enrichment/enrichment.md
index d5996a6..f6938c2 100644
--- a/docs/data-provision/enrichment/enrichment.md
+++ b/docs/data-provision/enrichment/enrichment.md
@@ -1,8 +1,5 @@
 # Enrichment
-
-TODO: intro
-
 ## Mining
 The OpenAIRE Research Graph is enriched by links mined by OpenAIRE’s full-text mining algorithms that scan the plaintexts of publications for funding information, references to datasets, software URIs, accession numbers of bioetities, and EPO patent mentions. Custom mining modules also link research objects to specific research communities, initiatives and infrastructures. In addition, other inference modules provide content-based document classification, document similarity, citation matching, and author affiliation matching.
diff --git a/docs/intro.md b/docs/intro.md
index c18df18..f05ca2b 100644
--- a/docs/intro.md
+++ b/docs/intro.md
@@ -21,6 +21,5 @@ As of today, the OpenAIRE Research Graph aggregates around 450Mi metadata record
 * Microsoft Academic Graph
 * Datacite
-After cleaning, deduplication, enrichment and full-text mining processes, the graph is analysed to produce statistics for the [OpenAIRE MONITOR](https://monitor.openaire.eu), the [Open Science Observatory](https://osobservatory.openaire.eu), made discoverable via the [OpenAIRE EXPLORE](https://explore.openaire.eu) and programmatically accessible as described at
-https://develop.openaire.eu.
-Json dumps are also published on Zenodo.
+After cleaning, deduplication, enrichment and full-text mining processes, the graph is analysed to produce statistics for the [OpenAIRE MONITOR](https://monitor.openaire.eu), the [Open Science Observatory](https://osobservatory.openaire.eu), made discoverable via the [OpenAIRE EXPLORE](https://explore.openaire.eu) and programmatically accessible via [OpenAIRE Public APIs](https://develop.openaire.eu).
+Last but not least, frequently updated [JSON dumps](download) are published on Zenodo.
diff --git a/docs/publications.md b/docs/publications.md
index 27a9174..77fcbd4 100644
--- a/docs/publications.md
+++ b/docs/publications.md
@@ -2,5 +2,56 @@ sidebar_position: 7
 ---
-# Related publications
-TODO
+# How to cite
+
+If you use one of the [OpenAIRE Research Graph dumps](https://zenodo.org/record/6616871), please cite it following the recommendation that you find on the Zenodo page.
+
+## Other relevant publications
+
+### Aggregation system
+Manghi, P., Artini, M., Atzori, C., Bardi, A., Mannocci, A., La Bruzzo, S., Candela, L., Castelli, D. and Pagano, P. (2014), “The D-NET software toolkit: A framework for the realization, maintenance, and operation of aggregative infrastructures”, Program: electronic library and information systems, Vol. 48 No. 4, pp. 322-354.
+Michele Artini, Claudio Atzori, Alessia Bardi, Sandro La Bruzzo, Paolo Manghi, & Andrea Mannocci. (2016, November 24). The D-NET software toolkit: dnet-basic-aggregator (Version 1.3.0). Zenodo.
+
+Atzori, C., Bardi, A., Manghi, P., & Mannocci, A. (2017, January). The OpenAIRE workflows for data management. In Italian Research Conference on Digital Libraries (pp. 95-107). Springer, Cham.
+
+Mannocci, A., & Manghi, P. (2016, September). DataQ: a data flow quality monitoring system for aggregative data infrastructures. In International Conference on Theory and Practice of Digital Libraries (pp. 357-369). Springer, Cham.
+
+### Deduplication
+Claudio Atzori, & Paolo Manghi. (2017, February 17). gdup: a big graph entity deduplication system (Version 4.0.5). Zenodo. https://code-repo.d4science.org/D-Net/dnet-dedup/releases
+
+Manghi, Paolo, Marko Mikulicic, and Claudio Atzori. "De-duplication of aggregation authority files." International Journal of Metadata, Semantics and Ontologies 7.2 (2012): 114-130.
+
+Manghi, P., Atzori, C., De Bonis, M., & Bardi, A. (2020). Entity deduplication in big data graphs for scholarly communication. Data Technologies and Applications.
+Manghi, P., & Mikulicic, M. (2011, October). PACE: A general-purpose tool for authority control. In Research Conference on Metadata and Semantic Research (pp. 80-92). Springer, Berlin, Heidelberg.
+
+Atzori, C., Manghi, P., & Bardi, A. (2018, December). GDup: de-duplication of scholarly communication big graphs. In 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT) (pp. 142-151). IEEE.
+Atzori, Claudio. "GDup: an Integrated, Scalable Big Graph Deduplication System." (2016).
+
+### Mining
+
+M. Kobos, Ł. Bolikowski, M. Horst, P. Manghi, N. Manola, J. Schirrwagen, “Information inference in scholarly communication infrastructures: the OpenAIREplus project experience”, Procedia Computer Science 38, 92-99.
+
+Tkaczyk, D., Szostek, P., Fedoryszak, M. et al. CERMINE: automatic extraction of structured metadata from scientific literature. IJDAR 18, 317–335 (2015).
+Giannakopoulos T., Foufoulas Y., Dimitropoulos H., Manola N. (2019) “Interactive Text Analysis and Information Extraction”. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, Cham.
+
+Foufoulas Y., Stamatogiannakis L., Dimitropoulos H., Ioannidis Y. (2017) “High-Pass Text Filtering for Citation Matching”. In: Kamps J., Tsakonas G., Manolopoulos Y., Iliadis L., Karydis I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham.
+
+T. Giannakopoulos, I. Foufoulas, E. Stamatogiannakis, H. Dimitropoulos, N. Manola, and Y. Ioannidis. 2015. “Visual-Based Classification of Figures from Scientific Literature”. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). Association for Computing Machinery, New York, NY, USA, 1059–1060.
+
+Giannakopoulos, T., Foufoulas, I., Stamatogiannakis, E., Dimitropoulos, H., Manola, N., & Ioannidis, Y. (2014). “Discovering and Visualizing Interdisciplinary Content Classes in Scientific Publications”. D-Lib Mag., Volume 20, Number 11/12.
+
+Giannakopoulos T., Stamatogiannakis E., Foufoulas I., Dimitropoulos H., Manola N., Ioannidis Y. (2014) “Content Visualization of Scientific Corpora Using an Extensible Relational Database Implementation”. In: Bolikowski Ł., Casarosa V., Goodale P., Houssos N., Manghi P., Schirrwagen J. (eds) Theory and Practice of Digital Libraries -- TPDL 2013 Selected Workshops. TPDL 2013. Communications in Computer and Information Science, vol 416. Springer, Cham. Also in: Google Books
+
+Giannakopoulos T., Dimitropoulos H., Metaxas O., Manola N., Ioannidis Y. (2013) “Supervised Content Visualization of Scientific Publications: A Case Study on the ArXiv Dataset”. In: Kłopotek M.A., Koronacki J., Marciniak M., Mykowiecka A., Wierzchoń S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg.
+
+Y. Chronis, Y. Foufoulas, V. Nikolopoulos, A. Papadopoulos, L. Stamatogiannakis, C. Svingos, Y. E. Ioannidis, "A Relational Approach to Complex Dataflows", in Workshop Proceedings of the EDBT/ICDT 2016 (MEDAL 2016) Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-WS.org (ISSN 1613-0073)
+
+### Portals
+Baglioni M. et al. (2019) The OpenAIRE Research Community Dashboard: On Blending Scientific Workflows and Scientific Publishing. In: Doucet A., Isaac A., Golub K., Aalberg T., Jatowt A. (eds) Digital Libraries for Open Knowledge. TPDL 2019. Lecture Notes in Computer Science, vol 11799. Springer, Cham.
+
+### Broker Service
+Artini, M., Atzori, C., Bardi, A., La Bruzzo, S., Manghi, P., & Mannocci, A. (2015). The OpenAIRE literature broker service for institutional repositories. D-Lib Magazine, 21(11/12), 1.
+
+Manghi, P., Atzori, C., Bardi, A., La Bruzzo, S., & Artini, M. (2016, February). Realizing a Scalable and History-Aware Literature Broker Service for OpenAIRE. In Italian Research Conference on Digital Libraries (pp. 92-103). Springer, Cham.
+
+
diff --git a/sidebars.js b/sidebars.js
index fd342f2..8f4a89d 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -13,19 +13,102 @@
 /** @type {import('@docusaurus/plugin-content-docs').SidebarsConfig} */
 const sidebars = {
- // By default, Docusaurus generates a sidebar from the docs folder structure
- tutorialSidebar: [{type: 'autogenerated', dirName: '.'}],
-
- // But you can create a sidebar manually
- /*
- tutorialSidebar: [
+ mySidebar: [
 {
- type: 'category',
- label: 'Tutorial',
- items: ['hello'],
+ type: 'doc',
+ id: 'intro'
 },
- ],
- */
+ {
+ type: 'category',
+ label: "Data model",
+ link: {type: 'doc', id: 'data-model/data-model'},
+ items: [
+ {
+ type: 'category',
+ label: "Entities",
+ link: {
+ type: 'generated-index',
+ description: 'The main entities of the OpenAIRE Research Graph are listed below.'
+ },
+ items: [
+ { type: 'doc', id: 'data-model/entities/result' },
+ { type: 'doc', id: 'data-model/entities/data-source' },
+ { type: 'doc', id: 'data-model/entities/organization' },
+ { type: 'doc', id: 'data-model/entities/project' },
+ { type: 'doc', id: 'data-model/entities/community' },
+ ]
+ },
+ {
+ type: 'doc',
+ id: 'data-model/relationships'
+ }
+ ]
+ },
+ {
+ type: "link",
+ label: "Public API",
+ href: "https://graph.openaire.eu/develop/overview.html"
+ },
+ {
+ type: 'doc',
+ id: 'download'
+ },
+ {
+ type: 'category',
+ label: "Data provision",
+ link: {type: 'doc', id: 'data-provision/data-provision'},
+ items: [
+ { type: 'doc', id: 'data-provision/aggregation' },
+ {
+ type: 'category',
+ label: "Deduplication",
+ link: {type: 'doc', id: 'data-provision/deduplication/deduplication'},
+ items: [
+ { type: 'doc', id: 'data-provision/deduplication/research-products' },
+ { type: 'doc', id: 'data-provision/deduplication/organizations' },
+ ]
+ },
+ {
+ type: 'category',
+ label: "Enrichment",
+ link: {type: 'doc', id: 'data-provision/enrichment/enrichment'},
+ items: [
+ { type: 'doc', id: 'data-provision/enrichment/mining' },
+ { type: 'doc', id: 'data-provision/enrichment/impact-scores' },
+ ]
+ },
+ { type: 'doc', id: 'data-provision/post-cleaning' },
+ { type: 'doc', id: 'data-provision/indexing' },
+ { type: 'doc', id: 'data-provision/stats' },
+ ]
+ },
+ {
+ type: 'doc',
+ id: 'services'
+ },
+ {
+ type: 'category',
+ label: "Learning center",
+ link: { type: 'generated-index' },
+ items: [
+ { type: 'doc', id: 'learning-center/open-plato' },
+ { type: 'doc', id: 'learning-center/tutorials' },
+ ]
+ },
+ {
+ type: 'doc',
+ id: 'publications',
+ label: "Relevant publications"
+ },
+ {
+ type: 'doc',
+ id: 'faq'
+ },
+ {
+ type: 'doc',
+ id: 'license'
+ },
+ ]
 };
 module.exports = sidebars;
diff --git a/docs/data-provision/assets/aggregation.png b/static/img/docs/aggregation.png
similarity index 100%
rename from docs/data-provision/assets/aggregation.png
rename to static/img/docs/aggregation.png
diff --git a/docs/data-provision/assets/architecture.png b/static/img/docs/architecture.png
similarity index 100%
rename from docs/data-provision/assets/architecture.png
rename to static/img/docs/architecture.png
diff --git a/docs/data-provision/assets/dedup-results.png b/static/img/docs/dedup-results.png
similarity index 100%
rename from docs/data-provision/assets/dedup-results.png
rename to static/img/docs/dedup-results.png