From 47e112420e5f64c092e4d8e72d10152774defc55 Mon Sep 17 00:00:00 2001
From: johnfouf
Date: Tue, 15 Nov 2022 15:38:34 +0200
Subject: [PATCH 01/33] edit md file

---
 docs/data-provision/enrichment/mining.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/data-provision/enrichment/mining.md b/docs/data-provision/enrichment/mining.md
index 9c1f8f5..263e165 100644
--- a/docs/data-provision/enrichment/mining.md
+++ b/docs/data-provision/enrichment/mining.md
@@ -3,4 +3,5 @@ sidebar_position: 1
 ---
 
 # Mining algorithms
+TEST
 TODO

From e6b02ffc32d18278a2e90c50fc358c0c1c75052b Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Tue, 15 Nov 2022 14:50:45 +0100
Subject: [PATCH 02/33] Update 'docs/data-provision/enrichment/mining.md'

---
 docs/data-provision/enrichment/mining.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/mining.md b/docs/data-provision/enrichment/mining.md
index 263e165..976b059 100644
--- a/docs/data-provision/enrichment/mining.md
+++ b/docs/data-provision/enrichment/mining.md
@@ -3,5 +3,5 @@ sidebar_position: 1
 ---
 
 # Mining algorithms
-TEST
+TEST HARRY
 TODO

From c5b84be1d37c5714cbd57eb730a6bba842fe469e Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Tue, 15 Nov 2022 15:49:57 +0100
Subject: [PATCH 03/33] Update 'docs/data-provision/enrichment/mining.md'

---
 docs/data-provision/enrichment/mining.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/docs/data-provision/enrichment/mining.md b/docs/data-provision/enrichment/mining.md
index 976b059..bc1153e 100644
--- a/docs/data-provision/enrichment/mining.md
+++ b/docs/data-provision/enrichment/mining.md
@@ -5,3 +5,20 @@ sidebar_position: 1
 # Mining algorithms
 TEST HARRY
 TODO
+
+| Short description | briefly describes the algorithm |
+| ------------- | ------------- |
+| Authority | describes the organisation and/ or the person responsible for the algorithm |
+| Licence | describes the licensing and rights held on the algorithm |
+| Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |
+| Parameters | describes the required algorithm parameters |
+| Limitations | Mentions any limitation of the output |
+| Code repository | the code repository of the algorithm |
+| Environment | Programming Languages and software packages used |
+| References & resources | Cites any related research and possible additional resource (such as datasets etc) |
+
+
+
+
+
+

From b739759e3a03c13f0f954de4f31394331e43f098 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Tue, 15 Nov 2022 15:53:21 +0100
Subject: [PATCH 04/33] Update 'docs/data-provision/enrichment/mining.md'

---
 docs/data-provision/enrichment/mining.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/mining.md b/docs/data-provision/enrichment/mining.md
index bc1153e..bbf385b 100644
--- a/docs/data-provision/enrichment/mining.md
+++ b/docs/data-provision/enrichment/mining.md
@@ -7,7 +7,7 @@ TEST HARRY
 TODO
 
 | Short description | briefly describes the algorithm |
-| ------------- | ------------- |
+| --- | --- |
 | Authority | describes the organisation and/ or the person responsible for the algorithm |
 | Licence | describes the licensing and rights held on the algorithm |
 | Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |

From 1cf79bc30caaefe5c9c919ca44033a00227fab94 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 14:08:56 +0100
Subject: [PATCH 05/33] Add 'docs/data-provision/enrichment/acks.md'

---
 docs/data-provision/enrichment/acks.md | 6 ++++++
 1 file changed, 6 insertions(+)
 create mode 100644 docs/data-provision/enrichment/acks.md

diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md
new file mode 100644
index 0000000..9c1f8f5
--- /dev/null
+++ b/docs/data-provision/enrichment/acks.md
@@ -0,0 +1,6 @@
+---
+sidebar_position: 1
+---
+
+# Mining algorithms
+TODO

From ad4c4f909e0c8d9670c3c36b9c0058488976d022 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 14:16:46 +0100
Subject: [PATCH 06/33] Update 'docs/data-provision/enrichment/acks.md'

---
 docs/data-provision/enrichment/acks.md | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md
index 9c1f8f5..990e05a 100644
--- a/docs/data-provision/enrichment/acks.md
+++ b/docs/data-provision/enrichment/acks.md
@@ -1,6 +1,24 @@
 ---
-sidebar_position: 1
+sidebar_position: 3
 ---
 
 # Mining algorithms
+TEST HARRY
 TODO
+
+| Short description | briefly describes the algorithm |
+| --- | --- |
+| Authority | describes the organisation and/ or the person responsible for the algorithm |
+| Licence | describes the licensing and rights held on the algorithm |
+| Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |
+| Parameters | describes the required algorithm parameters |
+| Limitations | Mentions any limitation of the output |
+| Code repository | the code repository of the algorithm |
+| Environment | Programming Languages and software packages used |
+| References & resources | Cites any related research and possible additional resource (such as datasets etc) |
+
+
+
+
+
+

From 002bfdd851593257334ac6c93ce3d1155d0b09ed Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 14:18:40 +0100
Subject: [PATCH 07/33] Add 'docs/data-provision/enrichment/cites.md'

---
 docs/data-provision/enrichment/cites.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 docs/data-provision/enrichment/cites.md

diff --git a/docs/data-provision/enrichment/cites.md b/docs/data-provision/enrichment/cites.md
new file mode 100644
index 0000000..de458c9
--- /dev/null
+++ b/docs/data-provision/enrichment/cites.md
@@ -0,0 +1,24 @@
+---
+sidebar_position: 4
+---
+
+# Mining algorithms
+TEST HARRY
+TODO
+
+| Short description | briefly describes the algorithm |
+| --- | --- |
+| Authority | describes the organisation and/ or the person responsible for the algorithm |
+| Licence | describes the licensing and rights held on the algorithm |
+| Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |
+| Parameters | describes the required algorithm parameters |
+| Limitations | Mentions any limitation of the output |
+| Code repository | the code repository of the algorithm |
+| Environment | Programming Languages and software packages used |
+| References & resources | Cites any related research and possible additional resource (such as datasets etc) |
+
+
+
+
+
+

From 5fc503253756953f18845e3a62a56c198bc3f04f Mon Sep 17 00:00:00 2001
From: Yannis Foufoulas
Date: Wed, 16 Nov 2022 14:44:46 +0100
Subject: [PATCH 08/33] Update 'docs/data-provision/enrichment/mining.md'

---
 docs/data-provision/enrichment/mining.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/data-provision/enrichment/mining.md b/docs/data-provision/enrichment/mining.md
index bbf385b..75ffa0d 100644
--- a/docs/data-provision/enrichment/mining.md
+++ b/docs/data-provision/enrichment/mining.md
@@ -3,6 +3,10 @@ sidebar_position: 1
 ---
 
 # Mining algorithms
+
+[a relative link](acks.md)
+[a relative link](cites.md)
+
 TEST HARRY
 TODO
 

From 39d3f47fa0bf878391572bad21b602e73ce9c630 Mon Sep 17 00:00:00 2001
From: Yannis Foufoulas
Date: Wed, 16 Nov 2022 14:45:41 +0100
Subject: [PATCH 09/33] Update 'docs/data-provision/enrichment/mining.md'

---
 docs/data-provision/enrichment/mining.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/data-provision/enrichment/mining.md b/docs/data-provision/enrichment/mining.md
index 75ffa0d..2be0b63 100644
--- a/docs/data-provision/enrichment/mining.md
+++ b/docs/data-provision/enrichment/mining.md
@@ -4,8 +4,9 @@ sidebar_position: 1
 
 # Mining algorithms
 
-[a relative link](acks.md)
-[a relative link](cites.md)
+[Extraction of acknowledged concepts](acks.md)
+
+[Extraction of cited concepts](cites.md)
 
 TEST HARRY
 TODO

From f933f541fef515eceb8d9bb34655fd2bc519bca5 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 14:58:12 +0100
Subject: [PATCH 10/33] Update 'docs/data-provision/enrichment/mining.md'

---
 docs/data-provision/enrichment/mining.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/data-provision/enrichment/mining.md b/docs/data-provision/enrichment/mining.md
index 2be0b63..3bf053f 100644
--- a/docs/data-provision/enrichment/mining.md
+++ b/docs/data-provision/enrichment/mining.md
@@ -8,11 +8,11 @@ sidebar_position: 1
 
 [Extraction of cited concepts](cites.md)
 
-TEST HARRY
 TODO
 
-| Short description | briefly describes the algorithm |
+| Property | Description |
 | --- | --- |
+| Short description | briefly describes the algorithm |
 | Authority | describes the organisation and/ or the person responsible for the algorithm |
 | Licence | describes the licensing and rights held on the algorithm |
 | Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |

From c2dbf0536be201d21c1323a3446e98cd158b9d57 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 14:59:02 +0100
Subject: [PATCH 11/33] Update 'docs/data-provision/enrichment/acks.md'

---
 docs/data-provision/enrichment/acks.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md
index 990e05a..8b544f1 100644
--- a/docs/data-provision/enrichment/acks.md
+++ b/docs/data-provision/enrichment/acks.md
@@ -2,11 +2,10 @@
 sidebar_position: 3
 ---
 
-# Mining algorithms
-TEST HARRY
+# Acks algorithms
 TODO
 
-| Short description | briefly describes the algorithm |
+| Property | Description |
 | --- | --- |
 | Authority | describes the organisation and/ or the person responsible for the algorithm |
 | Licence | describes the licensing and rights held on the algorithm |

From 6b48a13bc141792c73450a3dc3c9b0e3e7be1e30 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 14:59:39 +0100
Subject: [PATCH 12/33] Update 'docs/data-provision/enrichment/cites.md'

---
 docs/data-provision/enrichment/cites.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/data-provision/enrichment/cites.md b/docs/data-provision/enrichment/cites.md
index de458c9..33dfa58 100644
--- a/docs/data-provision/enrichment/cites.md
+++ b/docs/data-provision/enrichment/cites.md
@@ -2,11 +2,10 @@
 sidebar_position: 4
 ---
 
-# Mining algorithms
-TEST HARRY
+# Cites algorithms
 TODO
 
-| Short description | briefly describes the algorithm |
+| Property | Description |
 | --- | --- |
 | Authority | describes the organisation and/ or the person responsible for the algorithm |
 | Licence | describes the licensing and rights held on the algorithm |

From ca9a8f75c381c92d67d2bb61e450bdce82d0f8a0 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 15:07:29 +0100
Subject: [PATCH 13/33] Update 'docs/data-provision/enrichment/acks.md'

---
 docs/data-provision/enrichment/acks.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md
index 8b544f1..04e4747 100644
--- a/docs/data-provision/enrichment/acks.md
+++ b/docs/data-provision/enrichment/acks.md
@@ -2,11 +2,12 @@
 sidebar_position: 3
 ---
 
-# Acks algorithms
+# Extraction of Acknowledged Concepts
 TODO
 
 | Property | Description |
 | --- | --- |
+| Short description | briefly describes the algorithm |
 | Authority | describes the organisation and/ or the person responsible for the algorithm |
 | Licence | describes the licensing and rights held on the algorithm |
 | Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |

From c9228633ec96a483434913ee0ec0c8f889f7a392 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 15:17:49 +0100
Subject: [PATCH 14/33] Update 'docs/data-provision/enrichment/acks.md'

Added a brief description
---
 docs/data-provision/enrichment/acks.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md
index 04e4747..5085151 100644
--- a/docs/data-provision/enrichment/acks.md
+++ b/docs/data-provision/enrichment/acks.md
@@ -7,7 +7,7 @@ sidebar_position: 3
 
 | Property | Description |
 | --- | --- |
-| Short description | briefly describes the algorithm |
+| Short description | Scans the plaintexts of publications for acknowledged concepts, including grant identifiers (projects) of funders, accession numbers of bioetities, EPO patent mentions, as well as custom concepts that can link research objects to specific research communities, initiatives and infrastructures in OpenAIRE. |
 | Authority | describes the organisation and/ or the person responsible for the algorithm |
 | Licence | describes the licensing and rights held on the algorithm |
 | Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |

From 5dec33d26f838b16e446099a3451bf5173255181 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 15:22:35 +0100
Subject: [PATCH 15/33] Update 'docs/data-provision/enrichment/cites.md'

added short description
---
 docs/data-provision/enrichment/cites.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/cites.md b/docs/data-provision/enrichment/cites.md
index 33dfa58..57a432e 100644
--- a/docs/data-provision/enrichment/cites.md
+++ b/docs/data-provision/enrichment/cites.md
@@ -2,11 +2,12 @@
 sidebar_position: 4
 ---
 
-# Cites algorithms
+# Extraction of Cited Concepts
 TODO
 
 | Property | Description |
 | --- | --- |
+| Short description | Scans the plaintexts of publications for cited concepts, including references to datasets and software URIs. |
 | Authority | describes the organisation and/ or the person responsible for the algorithm |
 | Licence | describes the licensing and rights held on the algorithm |
 | Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |

From 544808c7cd0cf1b1480861ae24466145c68cac5c Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 15:28:43 +0100
Subject: [PATCH 16/33] Update 'docs/data-provision/enrichment/acks.md'

---
 docs/data-provision/enrichment/acks.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md
index 5085151..2cea4ba 100644
--- a/docs/data-provision/enrichment/acks.md
+++ b/docs/data-provision/enrichment/acks.md
@@ -7,8 +7,8 @@ sidebar_position: 3
 
 | Property | Description |
 | --- | --- |
-| Short description | Scans the plaintexts of publications for acknowledged concepts, including grant identifiers (projects) of funders, accession numbers of bioetities, EPO patent mentions, as well as custom concepts that can link research objects to specific research communities, initiatives and infrastructures in OpenAIRE. |
-| Authority | describes the organisation and/ or the person responsible for the algorithm |
+| Short description | Scans the plaintexts of publications for acknowledged concepts, including grant identifiers (projects) of funders, accession numbers of bioetities, EPO patent mentions, as well as custom concepts that can link research objects to specific research communities and initiatives in OpenAIRE. |
+| Authority | ATHENA Research Center, Greece |
 | Licence | describes the licensing and rights held on the algorithm |
 | Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |
 | Parameters | describes the required algorithm parameters |

From 4458952a2ed7efb0688c2023cddd1e7a24c39935 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 15:37:58 +0100
Subject: [PATCH 17/33] Update 'docs/data-provision/enrichment/cites.md'

Added Reference and link to High-Pass Text Filtering paper
---
 docs/data-provision/enrichment/cites.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/data-provision/enrichment/cites.md b/docs/data-provision/enrichment/cites.md
index 57a432e..5c023a5 100644
--- a/docs/data-provision/enrichment/cites.md
+++ b/docs/data-provision/enrichment/cites.md
@@ -7,15 +7,15 @@ sidebar_position: 4
 
 | Property | Description |
 | --- | --- |
-| Short description | Scans the plaintexts of publications for cited concepts, including references to datasets and software URIs. |
-| Authority | describes the organisation and/ or the person responsible for the algorithm |
+| Short description | Scans the plaintexts of publications for cited concepts, currently for references to datasets and software URIs. |
+| Authority | ATHENA Research Center, Greece |
 | Licence | describes the licensing and rights held on the algorithm |
 | Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |
 | Parameters | describes the required algorithm parameters |
 | Limitations | Mentions any limitation of the output |
 | Code repository | the code repository of the algorithm |
 | Environment | Programming Languages and software packages used |
-| References & resources | Cites any related research and possible additional resource (such as datasets etc) |
+| References & resources | [Foufoulas Y., Stamatogiannakis L., Dimitropoulos H., Ioannidis Y. (2017) “High-Pass Text Filtering for Citation Matching”. In: Kamps J., Tsakonas G., Manolopoulos Y., Iliadis L., Karydis I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham.](https://doi.org/10.1007/978-3-319-67008-9_28) |
 
 
 

From d5dd2f6d0bf6ac1996781278501af631a99ed251 Mon Sep 17 00:00:00 2001
From: Yannis Foufoulas
Date: Wed, 16 Nov 2022 16:13:15 +0100
Subject: [PATCH 18/33] Update 'docs/data-provision/enrichment/cites.md'

---
 docs/data-provision/enrichment/cites.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/cites.md b/docs/data-provision/enrichment/cites.md
index 5c023a5..25d2715 100644
--- a/docs/data-provision/enrichment/cites.md
+++ b/docs/data-provision/enrichment/cites.md
@@ -10,7 +10,7 @@ sidebar_position: 4
 | Short description | Scans the plaintexts of publications for cited concepts, currently for references to datasets and software URIs. |
 | Authority | ATHENA Research Center, Greece |
 | Licence | describes the licensing and rights held on the algorithm |
-| Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |
+| Algorithmic details | The algorithm extracts citations to specific datasets and software. It extracts the citation section of a publication's fulltext and applies string matching against a target database which includes an inverted index with dataset/software titles, urls and other metadata. |
 | Parameters | describes the required algorithm parameters |
 | Limitations | Mentions any limitation of the output |
 | Code repository | the code repository of the algorithm |

From 0732dd5df65e07b69911cf072de3d95730b32459 Mon Sep 17 00:00:00 2001
From: Yannis Foufoulas
Date: Wed, 16 Nov 2022 16:15:51 +0100
Subject: [PATCH 19/33] Update 'docs/data-provision/enrichment/cites.md'

---
 docs/data-provision/enrichment/cites.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/cites.md b/docs/data-provision/enrichment/cites.md
index 25d2715..0c2e43d 100644
--- a/docs/data-provision/enrichment/cites.md
+++ b/docs/data-provision/enrichment/cites.md
@@ -11,7 +11,7 @@ sidebar_position: 4
 | Authority | ATHENA Research Center, Greece |
 | Licence | describes the licensing and rights held on the algorithm |
 | Algorithmic details | The algorithm extracts citations to specific datasets and software. It extracts the citation section of a publication's fulltext and applies string matching against a target database which includes an inverted index with dataset/software titles, urls and other metadata. |
-| Parameters | describes the required algorithm parameters |
+| Parameters | Title, URL, creator names, publisher names and publication year for each concept to create the target database. Identifier and publication's fulltext to extract the cited concepts. |
 | Limitations | Mentions any limitation of the output |
 | Code repository | the code repository of the algorithm |
 | Environment | Programming Languages and software packages used |

From 44815cc8e1fa8a8efcdc798b770e584181065832 Mon Sep 17 00:00:00 2001
From: Yannis Foufoulas
Date: Wed, 16 Nov 2022 16:19:14 +0100
Subject: [PATCH 20/33] Update 'docs/data-provision/enrichment/cites.md'

---
 docs/data-provision/enrichment/cites.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/data-provision/enrichment/cites.md b/docs/data-provision/enrichment/cites.md
index 0c2e43d..a5337c5 100644
--- a/docs/data-provision/enrichment/cites.md
+++ b/docs/data-provision/enrichment/cites.md
@@ -12,9 +12,9 @@ sidebar_position: 4
 | Licence | describes the licensing and rights held on the algorithm |
 | Algorithmic details | The algorithm extracts citations to specific datasets and software. It extracts the citation section of a publication's fulltext and applies string matching against a target database which includes an inverted index with dataset/software titles, urls and other metadata. |
 | Parameters | Title, URL, creator names, publisher names and publication year for each concept to create the target database. Identifier and publication's fulltext to extract the cited concepts. |
-| Limitations | Mentions any limitation of the output |
-| Code repository | the code repository of the algorithm |
-| Environment | Programming Languages and software packages used |
+| Limitations | N/A |
+| Code repository | https://github.com/openaire/iis/tree/master/iis-wf/iis-wf-referenceextraction/src/main/resources/eu/dnetlib/iis/wf/referenceextraction |
+| Environment | Python, madIS (https://github.com/madgik/madis), APSW (https://github.com/rogerbinns/apsw) |
 | References & resources | [Foufoulas Y., Stamatogiannakis L., Dimitropoulos H., Ioannidis Y. (2017) “High-Pass Text Filtering for Citation Matching”. In: Kamps J., Tsakonas G., Manolopoulos Y., Iliadis L., Karydis I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham.](https://doi.org/10.1007/978-3-319-67008-9_28) |
 
 

From 163c5a6bca1fcdb427c3f55e1216ec50955fb161 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 16:20:45 +0100
Subject: [PATCH 21/33] Update 'docs/data-provision/enrichment/cites.md'

---
 docs/data-provision/enrichment/cites.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/cites.md b/docs/data-provision/enrichment/cites.md
index a5337c5..8ef05a7 100644
--- a/docs/data-provision/enrichment/cites.md
+++ b/docs/data-provision/enrichment/cites.md
@@ -9,7 +9,7 @@ sidebar_position: 4
 | --- | --- |
 | Short description | Scans the plaintexts of publications for cited concepts, currently for references to datasets and software URIs. |
 | Authority | ATHENA Research Center, Greece |
-| Licence | describes the licensing and rights held on the algorithm |
+| Licence | CC-BY/CC-0 |
 | Algorithmic details | The algorithm extracts citations to specific datasets and software. It extracts the citation section of a publication's fulltext and applies string matching against a target database which includes an inverted index with dataset/software titles, urls and other metadata. |
 | Parameters | Title, URL, creator names, publisher names and publication year for each concept to create the target database. Identifier and publication's fulltext to extract the cited concepts. |
 | Limitations | N/A |

From 45d3b152dcf9b7db78a7a9b66812308d7052f2ab Mon Sep 17 00:00:00 2001
From: Yannis Foufoulas
Date: Wed, 16 Nov 2022 16:25:05 +0100
Subject: [PATCH 22/33] Update 'docs/data-provision/enrichment/acks.md'

---
 docs/data-provision/enrichment/acks.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md
index 2cea4ba..b859d0a 100644
--- a/docs/data-provision/enrichment/acks.md
+++ b/docs/data-provision/enrichment/acks.md
@@ -10,11 +10,11 @@ sidebar_position: 3
 | Short description | Scans the plaintexts of publications for acknowledged concepts, including grant identifiers (projects) of funders, accession numbers of bioetities, EPO patent mentions, as well as custom concepts that can link research objects to specific research communities and initiatives in OpenAIRE. |
 | Authority | ATHENA Research Center, Greece |
 | Licence | describes the licensing and rights held on the algorithm |
-| Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) |
-| Parameters | describes the required algorithm parameters |
-| Limitations | Mentions any limitation of the output |
-| Code repository | the code repository of the algorithm |
-| Environment | Programming Languages and software packages used |
+| Algorithmic details | The algorithm processes the publication's fulltext and extracts references to acknowledged concepts. It applies pattern matching and string join between the fulltext and a target database which contains the title, the acronym and the identifier of the searched concept |
+| Parameters | Concept titles,acronyms, and identifiers, publication's identifiers and fulltexts |
+| Limitations | N/A |
+| Code repository | https://github.com/openaire/iis/tree/master/iis-wf/iis-wf-referenceextraction/src/main/resources/eu/dnetlib/iis/wf/referenceextraction |
+| Environment | Python, madIS (https://github.com/madgik/madis), APSW (https://github.com/rogerbinns/apsw) |
 | References & resources | Cites any related research and possible additional resource (such as datasets etc) |
 
 

From 1f5856ecf41588d21c96069435859b06e51c247f Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 16:27:43 +0100
Subject: [PATCH 23/33] Add 'docs/data-provision/enrichment/classified.md'

---
 docs/data-provision/enrichment/classified.md | 24 ++++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 docs/data-provision/enrichment/classified.md

diff --git a/docs/data-provision/enrichment/classified.md b/docs/data-provision/enrichment/classified.md
new file mode 100644
index 0000000..134e046
--- /dev/null
+++ b/docs/data-provision/enrichment/classified.md
@@ -0,0 +1,24 @@
+---
+sidebar_position: 5
+---
+
+# Classifiers
+TODO
+
+| Property | Description |
+| --- | --- |
+| Short description | Classifiers |
+| Authority | ATHENA Research Center, Greece |
+| Licence | CC-BY/CC-0 |
+| Algorithmic details | |
+| Parameters | |
+| Limitations | N/A |
+| Code repository | |
+| Environment | Python, madIS (https://github.com/madgik/madis), APSW (https://github.com/rogerbinns/apsw) |
+| |
+
+
+
+
+
+

From fcedfc1d9dfa050dcd151125e3f5621315705b06 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 16:31:37 +0100
Subject: [PATCH 24/33] Update 'docs/data-provision/enrichment/acks.md'

---
 docs/data-provision/enrichment/acks.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md
index b859d0a..94db2f3 100644
--- a/docs/data-provision/enrichment/acks.md
+++ b/docs/data-provision/enrichment/acks.md
@@ -9,9 +9,9 @@ sidebar_position: 3
 | --- | --- |
 | Short description | Scans the plaintexts of publications for acknowledged concepts, including grant identifiers (projects) of funders, accession numbers of bioetities, EPO patent mentions, as well as custom concepts that can link research objects to specific research communities and initiatives in OpenAIRE. |
 | Authority | ATHENA Research Center, Greece |
-| Licence | describes the licensing and rights held on the algorithm |
+| Licence | CC-BY/CC-0 |
 | Algorithmic details | The algorithm processes the publication's fulltext and extracts references to acknowledged concepts. It applies pattern matching and string join between the fulltext and a target database which contains the title, the acronym and the identifier of the searched concept |
-| Parameters | Concept titles,acronyms, and identifiers, publication's identifiers and fulltexts |
+| Parameters | Concept titles, acronyms, and identifiers, publication's identifiers and fulltexts |
 | Limitations | N/A |
 | Code repository | https://github.com/openaire/iis/tree/master/iis-wf/iis-wf-referenceextraction/src/main/resources/eu/dnetlib/iis/wf/referenceextraction |
 | Environment | Python, madIS (https://github.com/madgik/madis), APSW (https://github.com/rogerbinns/apsw) |

From aa35a239f37cd76e41de72e2cad9424ea2635521 Mon Sep 17 00:00:00 2001
From: Yannis Foufoulas
Date: Wed, 16 Nov 2022 16:34:04 +0100
Subject: [PATCH 25/33] Update 'docs/data-provision/enrichment/classified.md'

---
 docs/data-provision/enrichment/classified.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/data-provision/enrichment/classified.md b/docs/data-provision/enrichment/classified.md
index 134e046..ee617bc 100644
--- a/docs/data-provision/enrichment/classified.md
+++ b/docs/data-provision/enrichment/classified.md
@@ -10,10 +10,10 @@ sidebar_position: 5
 | Short description | Classifiers |
 | Authority | ATHENA Research Center, Greece |
 | Licence | CC-BY/CC-0 |
-| Algorithmic details | |
-| Parameters | |
+| Algorithmic details | The algorithm classifies publication's fulltexts using a Bayesian classifier and weighted terms according to an offline training phase. The training has been done using the following taxonomies: arxiv, dcc, acm |
+| Parameters | Publication's identifier and fulltext |
 | Limitations | N/A |
-| Code repository | |
+| Code repository | https://github.com/openaire/iis/tree/master/iis-wf/iis-wf-referenceextraction/src/main/resources/eu/dnetlib/iis/wf/referenceextraction |
 | Environment | Python, madIS (https://github.com/madgik/madis), APSW (https://github.com/rogerbinns/apsw) |
 | |
 

From e40fee8408efa86ced12c294dd7e53a5e2292f21 Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 16:36:15 +0100
Subject: [PATCH 26/33] Update 'docs/data-provision/enrichment/acks.md'

---
 docs/data-provision/enrichment/acks.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md
index 94db2f3..8cbd75d 100644
--- a/docs/data-provision/enrichment/acks.md
+++ b/docs/data-provision/enrichment/acks.md
@@ -8,14 +8,14 @@ sidebar_position: 3
 | Property | Description |
 | --- | --- |
 | Short description | Scans the plaintexts of publications for acknowledged concepts, including grant identifiers (projects) of funders, accession numbers of bioetities, EPO patent mentions, as well as custom concepts that can link research objects to specific research communities and initiatives in OpenAIRE. |
-| Authority | ATHENA Research Center, Greece |
+| Authority | ATHENA Research Center, Greece |
 | Licence | CC-BY/CC-0 |
-| Algorithmic details | The algorithm processes the publication's fulltext and extracts references to acknowledged concepts. It applies pattern matching and string join between the fulltext and a target database which contains the title, the acronym and the identifier of the searched concept |
+| Algorithmic details | The algorithm processes the publication's fulltext and extracts references to acknowledged concepts. It applies pattern matching and string join between the fulltext and a target database which contains the title, the acronym and the identifier of the searched concept. |
 | Parameters | Concept titles, acronyms, and identifiers, publication's identifiers and fulltexts |
 | Limitations | N/A |
 | Code repository | https://github.com/openaire/iis/tree/master/iis-wf/iis-wf-referenceextraction/src/main/resources/eu/dnetlib/iis/wf/referenceextraction |
 | Environment | Python, madIS (https://github.com/madgik/madis), APSW (https://github.com/rogerbinns/apsw) |
-| References & resources | Cites any related research and possible additional resource (such as datasets etc) |
+| References & resources | [Foufoulas, Y., Zacharia, E., Dimitropoulos, H., Manola, N., Ioannidis, Y. (2022). DETEXA: Declarative Extensible Text Exploration and Analysis. In: , et al. Linking Theory and Practice of Digital Libraries. TPDL 2022. Lecture Notes in Computer Science, vol 13541. Springer, Cham.](https://doi.org/10.1007/978-3-031-16802-4_9) |
 
 
 

From 8fda5c81cfc54e5231f5712b7f25edd6d2aeb1ed Mon Sep 17 00:00:00 2001
From: Yannis Foufoulas
Date: Wed, 16 Nov 2022 16:36:42 +0100
Subject: [PATCH 27/33] Update 'docs/data-provision/enrichment/mining.md'

---
 docs/data-provision/enrichment/mining.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/data-provision/enrichment/mining.md b/docs/data-provision/enrichment/mining.md
index 3bf053f..9a94005 100644
--- a/docs/data-provision/enrichment/mining.md
+++ b/docs/data-provision/enrichment/mining.md
@@ -8,6 +8,8 @@ sidebar_position: 1
 
 [Extraction of cited concepts](cites.md)
 
+[Document Classification](classified.md)
+
 TODO
 
 | Property | Description |

From 8f9184146c4210664fe2131bf14b4b61c7efdf0d Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 16:41:42 +0100
Subject: [PATCH 28/33] Update 'docs/data-provision/enrichment/classified.md'

---
 docs/data-provision/enrichment/classified.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/data-provision/enrichment/classified.md b/docs/data-provision/enrichment/classified.md
index ee617bc..ae78e17 100644
--- a/docs/data-provision/enrichment/classified.md
+++ b/docs/data-provision/enrichment/classified.md
@@ -7,10 +7,10 @@ sidebar_position: 5
 
 | Property | Description |
 | --- | --- |
-| Short description | Classifiers |
+| Short description | A document classification algorithm that employs analysis of free text stemming from the abstracts of the publications. The purpose of applying a document classification module is to assign a scientific text to one or more predefined content classes. |
 | Authority | ATHENA Research Center, Greece |
 | Licence | CC-BY/CC-0 |
-| Algorithmic details | The algorithm classifies publication's fulltexts using a Bayesian classifier and weighted terms according to an offline training phase. The training has been done using the following taxonomies: arxiv, dcc, acm |
+| Algorithmic details | The algorithm classifies publication's fulltexts using a Bayesian classifier and weighted terms according to an offline training phase. The training has been done using the following taxonomies: arXiv, MeSH (Medical Subject Headings), ACM, and DDC (Dewey Decimal Classification, or Dewey Decimal System). |
 | Parameters | Publication's identifier and fulltext |
 | Limitations | N/A |
 | Code repository | https://github.com/openaire/iis/tree/master/iis-wf/iis-wf-referenceextraction/src/main/resources/eu/dnetlib/iis/wf/referenceextraction |

From 2d75ea529ff0fc5eb677ebf62bc6df312355008f Mon Sep 17 00:00:00 2001
From: Harry Dimitropoulos
Date: Wed, 16 Nov 2022 16:48:47 +0100
Subject: [PATCH 29/33] Update 'docs/data-provision/enrichment/classified.md'

---
 docs/data-provision/enrichment/classified.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/data-provision/enrichment/classified.md b/docs/data-provision/enrichment/classified.md
index ae78e17..4b1608a 100644
--- a/docs/data-provision/enrichment/classified.md
+++ b/docs/data-provision/enrichment/classified.md
@@ -15,7 +15,7 @@ sidebar_position: 5
 | Limitations | N/A |
 | Code repository | https://github.com/openaire/iis/tree/master/iis-wf/iis-wf-referenceextraction/src/main/resources/eu/dnetlib/iis/wf/referenceextraction |
 | Environment | Python, madIS (https://github.com/madgik/madis), APSW (https://github.com/rogerbinns/apsw) |
-| |
+| References & resources | [Giannakopoulos, T., Stamatogiannakis, E., Foufoulas, I., Dimitropoulos, H., Manola, N., Ioannidis, Y. (2014). Content Visualization of Scientific Corpora Using an Extensible Relational Database Implementation. In: Bolikowski, Ł., Casarosa, V., Goodale, P., Houssos, N., Manghi, P., Schirrwagen, J. (eds) Theory and Practice of Digital Libraries -- TPDL 2013 Selected Workshops. TPDL 2013.
Communications in Computer and Information Science, vol 416. Springer, Cham.](https://doi.org/10.1007/978-3-319-08425-1_10) | From a48f5a263d1a3a72fb95d92bb406e8e4512c6e76 Mon Sep 17 00:00:00 2001 From: Harry Dimitropoulos Date: Wed, 16 Nov 2022 16:54:28 +0100 Subject: [PATCH 30/33] Update 'docs/data-provision/enrichment/mining.md' --- docs/data-provision/enrichment/mining.md | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/docs/data-provision/enrichment/mining.md b/docs/data-provision/enrichment/mining.md index 9a94005..efc75c1 100644 --- a/docs/data-provision/enrichment/mining.md +++ b/docs/data-provision/enrichment/mining.md @@ -4,6 +4,8 @@ sidebar_position: 1 # Mining algorithms +The Text and Data Mining (TDM) algorithms used for enriching the OpenAIRE Graph are grouped in the following main categories: + [Extraction of acknowledged concepts](acks.md) [Extraction of cited concepts](cites.md) @@ -12,18 +14,6 @@ sidebar_position: 1 TODO -| Property | Description | -| --- | --- | -| Short description | briefly describes the algorithm | -| Authority | describes the organisation and/ or the person responsible for the algorithm | -| Licence | describes the licensing and rights held on the algorithm | -| Algorithmic details | describes the algorithmic solution in more detail (i.e., the various concepts used in the algorithm, its iterations, etc.) 
| -| Parameters | describes the required algorithm parameters | -| Limitations | Mentions any limitation of the output | -| Code repository | the code repository of the algorithm | -| Environment | Programming Languages and software packages used | -| References & resources | Cites any related research and possible additional resource (such as datasets etc) | - From 96c7a6d87cc7df492dfe1fa59c7aa1066bdbd5c3 Mon Sep 17 00:00:00 2001 From: Harry Dimitropoulos Date: Wed, 16 Nov 2022 16:56:02 +0100 Subject: [PATCH 31/33] Update 'docs/data-provision/enrichment/acks.md' --- docs/data-provision/enrichment/acks.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/data-provision/enrichment/acks.md b/docs/data-provision/enrichment/acks.md index 8cbd75d..80c67a3 100644 --- a/docs/data-provision/enrichment/acks.md +++ b/docs/data-provision/enrichment/acks.md @@ -3,7 +3,6 @@ sidebar_position: 3 --- # Extraction of Acknowledged Concepts -TODO | Property | Description | | --- | --- | From e562936a1837ee5feaf6e939d8269be5fd43cb81 Mon Sep 17 00:00:00 2001 From: Harry Dimitropoulos Date: Wed, 16 Nov 2022 16:56:16 +0100 Subject: [PATCH 32/33] Update 'docs/data-provision/enrichment/cites.md' --- docs/data-provision/enrichment/cites.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/data-provision/enrichment/cites.md b/docs/data-provision/enrichment/cites.md index 8ef05a7..f3c4734 100644 --- a/docs/data-provision/enrichment/cites.md +++ b/docs/data-provision/enrichment/cites.md @@ -3,7 +3,6 @@ sidebar_position: 4 --- # Extraction of Cited Concepts -TODO | Property | Description | | --- | --- | From 8cddb71098f4c2a6b31230199b799f6c86c32fa3 Mon Sep 17 00:00:00 2001 From: Harry Dimitropoulos Date: Wed, 16 Nov 2022 16:56:42 +0100 Subject: [PATCH 33/33] Update 'docs/data-provision/enrichment/classifies.md' --- docs/data-provision/enrichment/{classified.md => classifies.md} | 1 - 1 file changed, 1 deletion(-) rename docs/data-provision/enrichment/{classified.md => classifies.md} 
(97%) diff --git a/docs/data-provision/enrichment/classified.md b/docs/data-provision/enrichment/classifies.md similarity index 97% rename from docs/data-provision/enrichment/classified.md rename to docs/data-provision/enrichment/classifies.md index 4b1608a..e97ebc8 100644 --- a/docs/data-provision/enrichment/classified.md +++ b/docs/data-provision/enrichment/classifies.md @@ -3,7 +3,6 @@ sidebar_position: 5 --- # Classifiers -TODO | Property | Description | | --- | --- |