forked from D-Net/openaire-graph-docs
Merge pull request 'Add formating to impact indicators page' (#9) from impact_indicators into main
Reviewed-on: D-Net/openaire-graph-docs#9
commit
96912ea7ec
|
@ -646,7 +646,12 @@ A measure computed for this instance (e.g. those provided by [BIP! Finder](https
|
||||||
### key
|
### key
|
||||||
_Type: String • Cardinality: ONE_
|
_Type: String • Cardinality: ONE_
|
||||||
|
|
||||||
The specified measure. Currently supported one of: `{ influence, influence_alt, popularity, popularity_alt, impulse, cc }` (see [the dedicated page](../../data-provision/enrichment/impact-scores) for more details).
|
The specified measure. Currently, one of the following is supported:
|
||||||
|
* `influence` (see [PageRank](/data-provision/enrichment/impact-scores#pagerank-pr))
|
||||||
|
* `influence_alt` (see [Citation Count](/data-provision/enrichment/impact-scores#citation-count-cc))
|
||||||
|
* `popularity` (see [AttRank](/data-provision/enrichment/impact-scores#attrank))
|
||||||
|
* `popularity_alt` (see [RAM](/data-provision/enrichment/impact-scores#ram))
|
||||||
|
* `impulse` (see ["Incubation" Citation Count](/data-provision/enrichment/impact-scores#incubation-citation-count-icc))
|
||||||
|
|
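For illustration, the key-to-indicator mapping listed above can be written as a small lookup table. This is only a sketch: the names `MEASURE_KEYS` and `indicator_for` are hypothetical and not part of any OpenAIRE tooling.

```python
# Hypothetical lookup table mirroring the list above: measure key -> indicator.
MEASURE_KEYS = {
    "influence": "PageRank",
    "influence_alt": "Citation Count",
    "popularity": "AttRank",
    "popularity_alt": "RAM",
    "impulse": '"Incubation" Citation Count',
}


def indicator_for(key: str) -> str:
    """Return the indicator behind a measure key, or raise for unknown keys."""
    try:
        return MEASURE_KEYS[key]
    except KeyError:
        raise ValueError(f"unsupported measure key: {key!r}") from None
```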
||||||
```json
|
```json
|
||||||
"key": "influence"
|
"key": "influence"
|
||||||
|
|
|
@ -2,30 +2,74 @@
|
||||||
sidebar_position: 2
|
sidebar_position: 2
|
||||||
---
|
---
|
||||||
|
|
||||||
# Impact scores
|
# Impact indicators
|
||||||
<span className="todo">TODO - add intro</span>
|
|
||||||
|
This page summarises all calculated impact indicators, which are included in the [measure](/data-model/entities/other#measure) property.
|
||||||
|
Note that the impact indicators are calculated both at the level of the research output and at the level of distinct DOIs.
|
||||||
|
Below we explain their main intuition, the way they are calculated, and their most important limitations, in an attempt to help avoid common pitfalls and misuse.
|
||||||
|
|
||||||
|
|
||||||
## Citation Count (CC)
|
## Citation Count (CC)
|
||||||
|
|
||||||
This is the most widely used scientific impact indicator, which sums all citations received by each article. The citation count of a
|
***Short description:***
|
||||||
publication $i$ corresponds to the in-degree of the corresponding node in the underlying citation network: $s_i = \sum_{j} A_{i,j}$,
|
This is the most widely used scientific impact indicator, which sums all citations received by each article.
|
||||||
where $A$ is the adjacency matrix of the network (i.e., $A_{i,j}=1$ when paper $j$ cites paper $i$, while $A_{i,j}=0$ otherwise).
|
|
||||||
Citation count can be viewed as a measure of a publication's overall impact, since it conveys the number of other works that directly
|
Citation count can be viewed as a measure of a publication's overall impact, since it conveys the number of other works that directly
|
||||||
drew on it.
|
drew on it.
|
||||||
|
|
||||||
|
***Algorithmic details:***
|
||||||
|
The citation count of a
|
||||||
|
publication $i$ corresponds to the in-degree of the corresponding node in the underlying citation network: $s_i = \sum_{j} A_{i,j}$,
|
||||||
|
where $A$ is the adjacency matrix of the network (i.e., $A_{i,j}=1$ when paper $j$ cites paper $i$, while $A_{i,j}=0$ otherwise).
|
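The in-degree definition above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the BIP! Ranker implementation (which runs on PySpark); the function name `citation_count` and the `(citing, cited)` edge-list input are assumptions made for the example.

```python
from collections import Counter


def citation_count(citations):
    """Count citations per cited paper (in-degree in the citation network).

    `citations` is an iterable of (citing, cited) pairs. Duplicate pairs are
    collapsed first, because A[i][j] is either 0 or 1.
    """
    return Counter(cited for _citing, cited in set(citations))


# Toy network: p2 and p3 cite p1; p3 cites p2 (listed twice on purpose).
edges = [("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p3", "p2")]
scores = citation_count(edges)
```

Here `scores["p1"]` is 2 and `scores["p2"]` is 1; papers that are never cited simply have a count of 0.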
||||||
|
|
||||||
|
***Parameters:*** -
|
||||||
|
|
||||||
|
***Limitations:***
|
||||||
|
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||||
|
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||||
|
|
||||||
|
***Environment:*** PySpark
|
||||||
|
|
||||||
|
***References:*** -
|
||||||
|
|
||||||
|
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||||
|
|
||||||
|
|
||||||
## "Incubation" Citation Count (iCC)
|
## "Incubation" Citation Count (iCC)
|
||||||
|
|
||||||
|
***Short description:***
|
||||||
This measure is essentially a time-restricted version of the citation count, where the time window is distinct for each paper, i.e.,
|
This measure is essentially a time-restricted version of the citation count, where the time window is distinct for each paper, i.e.,
|
||||||
only citations $y$ years after its publication are counted (usually, $y=3$). The "incubation" citation count of a paper $i$ is
|
only citations received within $y$ years of its publication are counted.
|
||||||
calculated as: $s_i = \sum_{j,t_j \leq t_i+3} A_{i,j}$, where $A$ is the adjacency matrix and $t_j, t_i$ are the citing and cited paper's
|
|
||||||
|
***Algorithmic details:***
|
||||||
|
The "incubation" citation count of a paper $i$ is
|
||||||
|
calculated as: $s_i = \sum_{j,t_j \leq t_i+y} A_{i,j}$, where $A$ is the adjacency matrix and $t_j, t_i$ are the citing and cited paper's
|
||||||
publication years, respectively. iCC can be seen as an indicator of a paper's initial momentum
|
publication years, respectively. iCC can be seen as an indicator of a paper's initial momentum
|
||||||
(impulse) directly after its publication.
|
(impulse) directly after its publication.
|
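As a rough illustration of the time-restricted sum above, the following sketch counts only citations whose citing paper was published at most $y$ years after the cited one. The function name, the edge-list input and the `pub_year` dictionary are assumptions made for the example; they do not reflect the actual PySpark code.

```python
def incubation_citation_count(citations, pub_year, y=3):
    """Count citations received within `y` years of the cited paper's publication.

    `citations` is an iterable of (citing, cited) pairs and `pub_year` maps
    each paper to its publication year. Pairs involving a paper without a
    known publication year are skipped.
    """
    counts = {paper: 0 for paper in pub_year}
    for citing, cited in set(citations):
        if citing not in pub_year or cited not in pub_year:
            continue
        # Condition from the formula: t_j <= t_i + y.
        if pub_year[citing] <= pub_year[cited] + y:
            counts[cited] += 1
    return counts


pub_year = {"a": 2015, "b": 2016, "c": 2018, "d": 2020}
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "c")]
icc = incubation_citation_count(edges, pub_year)
```

In this toy network, paper `a` (2015) keeps the citations from 2016 and 2018 but not the one from 2020, so `icc["a"]` is 2, while its plain citation count would be 3.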
||||||
|
|
||||||
## PageRank (PR)
|
***Parameters:***
|
||||||
|
$y=3$
|
||||||
|
|
||||||
|
***Limitations:***
|
||||||
|
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||||
|
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||||
|
|
||||||
|
***Environment:*** PySpark
|
||||||
|
|
||||||
|
***References:***
|
||||||
|
* Vergoulis, T., Kanellos, I., Atzori, C., Mannocci, A., Chatzopoulos, S., Bruzzo, S. L., Manola, N., & Manghi, P. (2021, April). BIP! DB: A dataset of impact measures for scientific publications. In Companion Proceedings of the Web Conference 2021 (pp. 456-460).
|
||||||
|
|
||||||
|
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||||
|
|
||||||
|
|
||||||
|
## PageRank (PR)
|
||||||
|
|
||||||
|
***Short description:***
|
||||||
Originally developed to rank Web pages, PageRank has also been widely used to rank publications in citation
|
Originally developed to rank Web pages, PageRank has also been widely used to rank publications in citation
|
||||||
networks. In this latter context, a publication's PageRank
|
networks. In this latter context, a publication's PageRank
|
||||||
score also serves as a measure of its influence. In particular, the PageRank score of a publication is calculated
|
score also serves as a measure of its influence.
|
||||||
|
|
||||||
|
***Algorithmic details:***
|
||||||
|
The PageRank score of a publication is calculated
|
||||||
as its probability of being read by a researcher that either randomly selects publications to read or selects
|
as its probability of being read by a researcher that either randomly selects publications to read or selects
|
||||||
publications based on the references of her latest read. Formally, the score of a publication $i$ is given by:
|
publications based on the references of her latest read. Formally, the score of a publication $i$ is given by:
|
||||||
|
|
||||||
|
@ -41,12 +85,31 @@ score of each publication relies of the score of publications citing it (the alg
|
||||||
until all scores converge). As a result, PageRank differentiates citations based on the importance of citing
|
until all scores converge). As a result, PageRank differentiates citations based on the importance of citing
|
||||||
articles, thus alleviating the corresponding issue of the Citation Count.
|
articles, thus alleviating the corresponding issue of the Citation Count.
|
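The iterative computation described above can be sketched with a plain power iteration. The exact formula is not visible in this diff, so the sketch assumes the standard PageRank formulation, with $\alpha$ as the probability of following a reference and $1-\alpha$ as the probability of picking a publication uniformly at random. The function name, the input format and the handling of papers without references are assumptions as well, not the BIP! Ranker implementation.

```python
def pagerank(references, alpha=0.5, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a citation network.

    `references` maps each paper to the list of papers it cites.
    """
    papers = sorted(set(references) | {c for refs in references.values() for c in refs})
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(max_iter):
        # Random selection: every paper gets an equal share of (1 - alpha).
        new = {p: (1.0 - alpha) / n for p in papers}
        for p in papers:
            refs = references.get(p, [])
            if refs:
                # Following references: p passes its score on to the papers it cites.
                share = alpha * score[p] / len(refs)
                for cited in refs:
                    new[cited] += share
            else:
                # Assumption: a paper without references spreads its score uniformly.
                for q in papers:
                    new[q] += alpha * score[p] / n
        error = sum(abs(new[p] - score[p]) for p in papers)
        score = new
        if error < tol:
            break
    return score


# Toy network: b cites a; c cites a and b; a cites nothing.
pr = pagerank({"a": [], "b": ["a"], "c": ["a", "b"]})
```

The scores sum to 1, and `a` ranks above `b`, which ranks above `c`, because a citation from a higher-scored paper contributes more than one from a lower-scored paper.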
||||||
|
|
||||||
|
***Parameters:***
|
||||||
|
$\alpha = 0.5, convergence\_error = 10^{-12}$
|
||||||
|
|
||||||
|
***Limitations:***
|
||||||
|
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||||
|
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||||
|
|
||||||
|
***Environment:*** PySpark
|
||||||
|
|
||||||
|
***References:***
|
||||||
|
* Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
|
||||||
|
|
||||||
|
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||||
|
|
||||||
|
|
||||||
## RAM
|
## RAM
|
||||||
|
|
||||||
RAM is essentially a modified Citation Count, where recent citations are considered of higher importance compared
|
***Short description:***
|
||||||
to older ones. Hence, it better captures the popularity of publications. This "time-awareness" of citations
|
RAM is essentially a modified Citation Count, where recent citations are considered of higher importance compared to older ones.
|
||||||
|
Hence, it better captures the popularity of publications. This "time-awareness" of citations
|
||||||
alleviates the bias of methods like Citation Count and PageRank against recently published articles, which have
|
alleviates the bias of methods like Citation Count and PageRank against recently published articles, which have
|
||||||
not had "enough" time to gather as many citations. The RAM score of each paper $i$ is calculated as follows:
|
not had "enough" time to gather as many citations.
|
||||||
|
|
||||||
|
***Algorithmic details:***
|
||||||
|
The RAM score of each paper $i$ is calculated as follows:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
s_i = \sum_j{R_{i,j}}
|
s_i = \sum_j{R_{i,j}}
|
||||||
|
@ -56,11 +119,30 @@ where $R$ is the so-called Retained Adjacency Matrix (RAM) and $R_{i,j}=\gamma^{
|
||||||
$i$, and $R_{i,j}=0$ otherwise. Parameter $\gamma \in (0,1)$, $t_c$ corresponds to the current year and $t_j$ corresponds to the
|
$i$, and $R_{i,j}=0$ otherwise. Parameter $\gamma \in (0,1)$, $t_c$ corresponds to the current year and $t_j$ corresponds to the
|
||||||
publication year of citing article $j$.
|
publication year of citing article $j$.
|
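A minimal sketch of the age-weighted sum is given below. The definition of $R_{i,j}$ is truncated in this diff, so the sketch assumes that each citation is weighted by $\gamma$ raised to the age of the citing article, $t_c - t_j$, which matches the description that recent citations count more. The function name and input format are likewise assumptions.

```python
def ram_scores(citations, pub_year, current_year, gamma=0.6):
    """Sum age-discounted citations per cited paper.

    Assumption: a citation from paper j contributes gamma ** (t_c - t_j),
    where t_c is `current_year` and t_j is the citing paper's publication year.
    """
    scores = {paper: 0.0 for paper in pub_year}
    for citing, cited in set(citations):
        if citing not in pub_year or cited not in pub_year:
            continue
        age = current_year - pub_year[citing]
        scores[cited] += gamma ** age
    return scores


pub_year = {"a": 2015, "b": 2022, "c": 2020}
ram = ram_scores([("b", "a"), ("c", "a")], pub_year, current_year=2022)
```

With $\gamma = 0.6$ and 2022 as the current year, the citation from 2022 contributes 1 and the one from 2020 contributes $0.6^2 = 0.36$, so `ram["a"]` is approximately 1.36.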
||||||
|
|
||||||
|
***Parameters:***
|
||||||
|
$\gamma = 0.6$
|
||||||
|
|
||||||
|
***Limitations:***
|
||||||
|
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||||
|
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||||
|
|
||||||
|
***Environment:*** PySpark
|
||||||
|
|
||||||
|
***References:***
|
||||||
|
* Ghosh, R., Kuo, T. T., Hsu, C. N., Lin, S. D., & Lerman, K. (2011, December). Time-aware ranking in dynamic citation networks. In 2011 IEEE 11th International Conference on Data Mining Workshops (pp. 373-380). IEEE.
|
||||||
|
|
||||||
|
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||||
|
|
||||||
|
|
||||||
## AttRank
|
## AttRank
|
||||||
|
|
||||||
|
***Short description:***
|
||||||
AttRank is a PageRank variant that alleviates its bias against recent publications (i.e., it is tailored to capture popularity).
|
AttRank is a PageRank variant that alleviates its bias against recent publications (i.e., it is tailored to capture popularity).
|
||||||
AttRank achieves this by modifying PageRank's probability of randomly selecting a publication. Instead of using a uniform probability,
|
AttRank achieves this by modifying PageRank's probability of randomly selecting a publication. Instead of using a uniform probability,
|
||||||
AttRank defines it based on a combination of the publication's age and the citations it received in recent years. The AttRank score
|
AttRank defines it based on a combination of the publication's age and the citations it received in recent years.
|
||||||
|
|
||||||
|
***Algorithmic details:***
|
||||||
|
The AttRank score
|
||||||
of each publication $i$ is calculated based on:
|
of each publication $i$ is calculated based on:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
|
@ -70,4 +152,22 @@ $$
|
||||||
|
|
||||||
where $\alpha + \beta + \gamma =1$ and $\alpha,\beta,\gamma \in [0,1]$. $Att(i)$ denotes a recent attention-based score for publication $i$,
|
where $\alpha + \beta + \gamma =1$ and $\alpha,\beta,\gamma \in [0,1]$. $Att(i)$ denotes a recent attention-based score for publication $i$,
|
||||||
which reflects its share of citations in the $y$ most recent years, $t_i$ is the publication year of article $i$, $t_c$ denotes the current
|
which reflects its share of citations in the $y$ most recent years, $t_i$ is the publication year of article $i$, $t_c$ denotes the current
|
||||||
year, and $c$ is a normalisation constant. Finally, $P$ is the stochastic transition matrix.
|
year, and $c$ is a normalisation constant. Finally, $P$ is the stochastic transition matrix.
|
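The full AttRank formula is not visible in this diff, so the sketch below covers only the attention term $Att(i)$. It assumes that "share of citations" means the fraction of all citations made in the window that point to publication $i$, and that the window of $y$ years includes the current year. The function name and input format are hypothetical.

```python
def recent_attention(citations, pub_year, current_year, y=3):
    """Share of recent citations per paper.

    Assumption: only citations made by papers published in the last `y`
    years (current year included) count, and each paper's score is its
    fraction of all such citations.
    """
    first_year = current_year - y + 1  # the window includes the current year
    recent = {paper: 0 for paper in pub_year}
    for citing, cited in set(citations):
        if citing not in pub_year or cited not in pub_year:
            continue
        if first_year <= pub_year[citing] <= current_year:
            recent[cited] += 1
    total = sum(recent.values())
    return {p: (c / total if total else 0.0) for p, c in recent.items()}


pub_year = {"a": 2010, "b": 2019, "c": 2020, "d": 2022}
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "c")]
att = recent_attention(edges, pub_year, current_year=2022)
```

With 2022 as the current year the window is 2020 to 2022, so the citation made in 2019 is ignored; `a` holds two of the three recent citations and `c` holds one.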
||||||
|
|
||||||
|
***Parameters:***
|
||||||
|
$\alpha = 0.2, \beta = 0.5, \gamma = 0.3, \rho = -0.16, convergence\_error = 10^{-12}$
|
||||||
|
|
||||||
|
Note that recent attention is based on the 3 most recent years (including the current one).
|
||||||
|
|
||||||
|
***Limitations:***
|
||||||
|
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||||
|
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||||
|
|
||||||
|
***Environment:*** PySpark
|
||||||
|
|
||||||
|
***References:***
|
||||||
|
* Kanellos, I., Vergoulis, T., Sacharidis, D., Dalamagas, T., & Vassiliou, Y. (2021, April). Ranking papers by their short-term scientific impact. In 2021 IEEE 37th International Conference on Data Engineering (ICDE) (pp. 1997-2002). IEEE.
|
||||||
|
|
||||||
|
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||||
|
|
||||||
|
|