---
sidebar_position: 4
---
# Extraction of cited concepts
***Short description:***
Scans the plain text of publications for cited concepts, currently references to datasets and software URIs.

***Algorithmic details:***
The algorithm extracts citations to specific datasets and software. It isolates the citation section of a publication's full text and applies string matching against a target database, which includes an inverted index of dataset and software titles, URLs, and other metadata.
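A simplified Python sketch of this idea follows. It is not the madIS/IIS implementation, and every name in it is hypothetical. It assumes that the citation section starts after the last heading named "References", "Bibliography", or "Works cited"; it uses an in-memory dict as the inverted index over title and URL tokens; and it treats a concept as cited when its full URL, or its complete title as a whole-token phrase, occurs in that section. The record fields (`title`, `url`) and the sample data are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical heading pattern; real publications use many variants.
_REF_HEADING = re.compile(
    r"^\s*(references|bibliography|works cited)\s*$", re.IGNORECASE | re.MULTILINE
)
_TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens of `text`."""
    return _TOKEN.findall(text.lower())


def citation_section(fulltext: str) -> str:
    """Text after the last reference-style heading, or the whole text if none is found."""
    headings = list(_REF_HEADING.finditer(fulltext))
    return fulltext[headings[-1].end():] if headings else fulltext


def build_index(concepts: dict[str, dict]) -> dict[str, set[str]]:
    """Inverted index: token of a title or URL -> ids of the concepts containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for cid, meta in concepts.items():
        for tok in set(tokens(meta["title"]) + tokens(meta.get("url", ""))):
            index[tok].add(cid)
    return index


def find_cited(fulltext: str, concepts: dict[str, dict], index: dict[str, set[str]]) -> set[str]:
    """Ids of concepts whose full URL or complete title appears in the citation section."""
    section = citation_section(fulltext)
    section_tokens = tokens(section)
    section_lower = section.lower()
    section_norm = " " + " ".join(section_tokens) + " "

    # Cheap filter: keep only concepts that share at least one token with the section.
    candidates: set[str] = set()
    for tok in set(section_tokens):
        candidates |= index.get(tok, set())

    # Exact check: URL as a substring, or title as a whole-token phrase.
    cited = set()
    for cid in candidates:
        meta = concepts[cid]
        url = meta.get("url", "").lower()
        title_norm = " ".join(tokens(meta["title"]))
        if (url and url in section_lower) or (title_norm and f" {title_norm} " in section_norm):
            cited.add(cid)
    return cited


if __name__ == "__main__":
    # Hypothetical sample records and publication text.
    concepts = {
        "d1": {"title": "Example Ocean Temperature Dataset", "url": "https://example.org/data/ocean-temp"},
        "s1": {"title": "Toolkit Ten", "url": "https://example.org/toolkit-x"},
        "d2": {"title": "Arctic Ice Cores", "url": "https://example.org/data/ice"},
    }
    paper = (
        "Introduction\nWe also looked at Arctic Ice Cores.\n"
        "References\n"
        "[1] Example Ocean Temperature Dataset, 2020.\n"
        "[2] Software available at https://example.org/toolkit-x\n"
    )
    print(sorted(find_cited(paper, concepts, build_index(concepts))))  # ['d1', 's1']
```

In the sample, `d2` is mentioned only in the introduction, so restricting matching to the citation section excludes it, while `s1` is found through its URL alone. Indexing URL tokens as well as title tokens is what lets such URL-only citations pass the cheap filter before the exact check.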

***Parameters:***
To build the target database: the title, URL, creator names, publisher names, and publication year of each concept. To extract the cited concepts: the identifier and full text of each publication.
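Since madIS builds on SQLite (via APSW), the target database can be pictured as SQLite tables. The following sketch uses Python's standard `sqlite3` module rather than madIS, and its table names, columns, and `Concept` record are hypothetical. One table holds the per-concept parameters listed above; another stores an inverted index of title tokens; a grouped `LEFT JOIN` then returns, for a given publication identifier and full text, the concepts whose title tokens all occur in the text. This is only a candidate filter, not the precise matching step.

```python
import re
import sqlite3
from dataclasses import dataclass

_TOKEN = re.compile(r"[a-z0-9]+")


@dataclass
class Concept:
    """One dataset or software record (hypothetical field names)."""
    concept_id: str
    title: str
    url: str
    creators: list[str]
    publishers: list[str]
    year: int


def build_target_db(concepts: list[Concept]) -> sqlite3.Connection:
    """Store concept metadata and an inverted index of title tokens in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE concept (id TEXT PRIMARY KEY, title TEXT, url TEXT,"
        " creators TEXT, publishers TEXT, year INTEGER)"
    )
    db.execute("CREATE TABLE title_token (token TEXT, concept_id TEXT)")
    db.execute("CREATE INDEX idx_title_token ON title_token (token)")
    db.execute("CREATE TEMP TABLE doc_token (token TEXT PRIMARY KEY)")
    for c in concepts:
        db.execute(
            "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?)",
            (c.concept_id, c.title, c.url, "; ".join(c.creators), "; ".join(c.publishers), c.year),
        )
        db.executemany(
            "INSERT INTO title_token VALUES (?, ?)",
            [(tok, c.concept_id) for tok in set(_TOKEN.findall(c.title.lower()))],
        )
    return db


def candidate_concepts(db: sqlite3.Connection, pub_id: str, fulltext: str) -> list[tuple[str, str]]:
    """(publication id, concept id) pairs where every title token occurs in the full text."""
    db.execute("DELETE FROM doc_token")
    db.executemany(
        "INSERT INTO doc_token VALUES (?)",
        [(tok,) for tok in set(_TOKEN.findall(fulltext.lower()))],
    )
    # A concept qualifies when none of its title tokens is missing from the text.
    rows = db.execute(
        "SELECT t.concept_id FROM title_token AS t"
        " LEFT JOIN doc_token AS d ON d.token = t.token"
        " GROUP BY t.concept_id"
        " HAVING COUNT(d.token) = COUNT(*)"
        " ORDER BY t.concept_id"
    ).fetchall()
    return [(pub_id, concept_id) for (concept_id,) in rows]


if __name__ == "__main__":
    db = build_target_db([
        Concept("d1", "Example Ocean Temperature Dataset", "https://example.org/data/ocean-temp",
                ["A. Author"], ["Example Org"], 2020),
        Concept("s1", "Toolkit Ten", "https://example.org/toolkit-x",
                ["B. Developer"], ["Example Org"], 2021),
    ])
    print(candidate_concepts(db, "pub-1", "We analysed the Example Ocean Temperature Dataset (2020)."))
    # [('pub-1', 'd1')]
```

The creator, publisher, and year columns are stored but not used by this filter; how the actual workflow uses them is not covered by this sketch.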

***Limitations:*** -

***Environment:***
Python, [madIS](https://github.com/madgik/madis), [APSW](https://github.com/rogerbinns/apsw)

***References:***
* Foufoulas Y., Stamatogiannakis L., Dimitropoulos H., Ioannidis Y. (2017) “High-Pass Text Filtering for Citation Matching”. In: Kamps J., Tsakonas G., Manolopoulos Y., Iliadis L., Karydis I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. [doi:10.1007/978-3-319-67008-9_28](https://doi.org/10.1007/978-3-319-67008-9_28)

***Authority:*** ATHENA RC • ***License:*** CC-BY/CC-0 • ***Code:*** [iis/referenceextraction](https://github.com/openaire/iis/tree/master/iis-wf/iis-wf-referenceextraction/src/main/resources/eu/dnetlib/iis/wf/referenceextraction)