forked from D-Net/openaire-graph-docs
Merge pull request 'Add formating to impact indicators page' (#9) from impact_indicators into main
Reviewed-on: D-Net/openaire-graph-docs#9
commit
96912ea7ec
|
@ -646,7 +646,12 @@ A measure computed for this instance (e.g. those provided by [BIP! Finder](https
|
|||
### key
|
||||
_Type: String • Cardinality: ONE_
|
||||
|
||||
The specified measure. Currently supported values:
|
||||
* `influence` (see [PageRank](/data-provision/enrichment/impact-scores#pagerank-pr))
|
||||
* `influence_alt` (see [Citation Count](/data-provision/enrichment/impact-scores#citation-count-cc))
|
||||
* `popularity` (see [AttRank](/data-provision/enrichment/impact-scores#attrank))
|
||||
* `popularity_alt` (see [RAM](/data-provision/enrichment/impact-scores#ram))
|
||||
* `impulse` (see ["Incubation" Citation Count](/data-provision/enrichment/impact-scores#incubation-citation-count-icc))
|
||||
|
||||
```json
|
||||
"key": "influence"
```
|
|
|
@ -2,30 +2,74 @@
|
|||
sidebar_position: 2
|
||||
---
|
||||
|
||||
# Impact indicators
|
||||
|
||||
This page summarises all calculated impact indicators, which are included in the [measure](/data-model/entities/other#measure) property.
|
||||
Note that the impact indicators are calculated both at the level of the research output and at the level of distinct DOIs.
|
||||
Below we explain their main intuition, the way they are calculated, and their most important limitations, in an attempt to help avoid common pitfalls and misuses.
|
||||
|
||||
|
||||
## Citation Count (CC)
|
||||
|
||||
***Short description:***
|
||||
This is the most widely used scientific impact indicator, which sums all citations received by each article.
|
||||
Citation count can be viewed as a measure of a publication's overall impact, since it conveys the number of other works that directly
|
||||
drew on it.
|
||||
|
||||
***Algorithmic details:***
|
||||
The citation count of a
|
||||
publication $i$ corresponds to the in-degree of the corresponding node in the underlying citation network: $s_i = \sum_{j} A_{i,j}$,
|
||||
where $A$ is the adjacency matrix of the network (i.e., $A_{i,j}=1$ when paper $j$ cites paper $i$, while $A_{i,j}=0$ otherwise).
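
As a minimal sketch of the definition above (not the BIP! Ranker implementation, which runs on PySpark), the citation count is a row sum of the adjacency matrix. The function name `citation_count` and the toy network are illustrative assumptions.

```python
import numpy as np


def citation_count(A):
    """Return s_i = sum_j A[i, j], i.e. the in-degree of each paper.

    A[i, j] = 1 when paper j cites paper i, and 0 otherwise.
    """
    return np.asarray(A).sum(axis=1)


# Hypothetical toy network: paper 0 is cited by papers 1 and 2,
# paper 1 is cited by paper 2, paper 2 is not cited at all.
A = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
])
cc_scores = citation_count(A)
```

Here `cc_scores` holds one count per paper, in the same order as the rows of `A`.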
|
||||
|
||||
***Parameters:*** -
|
||||
|
||||
***Limitations:***
|
||||
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||
|
||||
***Environment:*** PySpark
|
||||
|
||||
***References:*** -
|
||||
|
||||
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||
|
||||
|
||||
## "Incubation" Citation Count (iCC)
|
||||
|
||||
***Short description:***
|
||||
This measure is essentially a time-restricted version of the citation count, where the time window is distinct for each paper, i.e.,
|
||||
only citations received within $y$ years after its publication are counted.
|
||||
|
||||
***Algorithmic details:***
|
||||
The "incubation" citation count of a paper $i$ is
|
||||
calculated as: $s_i = \sum_{j,t_j \leq t_i+y} A_{i,j}$, where $A$ is the adjacency matrix and $t_j, t_i$ are the citing and cited paper's
|
||||
publication years, respectively. iCC can be seen as an indicator of a paper's initial momentum
|
||||
(impulse) directly after its publication.
|
||||
|
||||
***Parameters:***
|
||||
$y=3$
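
The time-window condition $t_j \leq t_i + y$ can be sketched as a boolean mask applied to the adjacency matrix. This is an illustrative NumPy version under the definitions above, not the PySpark code of BIP! Ranker; the function name and the toy data are assumptions.

```python
import numpy as np


def incubation_citation_count(A, years, y=3):
    """Return s_i = sum over j with t_j <= t_i + y of A[i, j].

    A[i, j] = 1 when paper j cites paper i; years[k] is the
    publication year of paper k.
    """
    A = np.asarray(A)
    years = np.asarray(years)
    # within_window[i, j] is True when citing paper j was published
    # no later than y years after cited paper i.
    within_window = years[np.newaxis, :] <= years[:, np.newaxis] + y
    return (A * within_window).sum(axis=1)


# Hypothetical toy network: paper 0 (2010) is cited by papers 1, 2 and 3,
# paper 1 (2012) is cited by paper 2.
A = [
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
years = [2010, 2012, 2013, 2016]
icc_scores = incubation_citation_count(A, years, y=3)
```

In this toy data the 2016 citation to paper 0 falls outside its three-year window, so it is not counted.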
|
||||
|
||||
***Limitations:***
|
||||
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||
|
||||
***Environment:*** PySpark
|
||||
|
||||
***References:***
|
||||
* Vergoulis, T., Kanellos, I., Atzori, C., Mannocci, A., Chatzopoulos, S., Bruzzo, S. L., Manola, N., & Manghi, P. (2021, April). BIP! DB: A dataset of impact measures for scientific publications. In Companion Proceedings of the Web Conference 2021 (pp. 456-460).
|
||||
|
||||
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||
|
||||
|
||||
## PageRank (PR)
|
||||
|
||||
***Short description:***
|
||||
Originally developed to rank Web pages, PageRank has also been widely used to rank publications in citation
|
||||
networks. In this latter context, a publication's PageRank
|
||||
score also serves as a measure of its influence.
|
||||
|
||||
***Algorithmic details:***
|
||||
The PageRank score of a publication is calculated
|
||||
as its probability of being read by a researcher who either randomly selects publications to read or selects
|
||||
publications based on the references of her latest read. Formally, the score of a publication $i$ is given by:
|
||||
|
||||
|
@ -41,12 +85,31 @@ score of each publication relies of the score of publications citing it (the alg
|
|||
until all scores converge). As a result, PageRank differentiates citations based on the importance of citing
|
||||
articles, thus alleviating the corresponding issue of the Citation Count.
|
||||
|
||||
***Parameters:***
|
||||
$\alpha = 0.5, convergence\_error = 10^{-12}$
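
The iterative computation can be sketched as a power iteration. The exact formula is not visible in this excerpt, so the sketch assumes the textbook formulation $s = \alpha P s + (1-\alpha)/N$, with $\alpha$ read as the probability of following a reference. The uniform handling of papers without references and the L1 stopping rule are also assumptions, and this is not the BIP! Ranker code.

```python
import numpy as np


def pagerank(A, alpha=0.5, convergence_error=1e-12, max_iter=1000):
    """Power iteration for an assumed textbook PageRank on a citation network.

    A[i, j] = 1 when paper j cites paper i.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=0)  # number of references made by each paper j

    # Column-stochastic transition matrix. Papers without references
    # (assumption) jump to any paper uniformly at random.
    P = np.full((n, n), 1.0 / n)
    has_refs = out_degree > 0
    P[:, has_refs] = A[:, has_refs] / out_degree[has_refs]

    s = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        s_next = alpha * (P @ s) + (1.0 - alpha) / n
        # Stop once the total change in scores drops below the threshold.
        if np.abs(s_next - s).sum() < convergence_error:
            return s_next
        s = s_next
    return s


# Hypothetical toy network: paper 0 is cited by papers 1 and 2,
# paper 1 is cited by paper 2.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
pr_scores = pagerank(A)
```

Because the transition matrix is column-stochastic, the scores stay normalised to sum to one across iterations.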
|
||||
|
||||
***Limitations:***
|
||||
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||
|
||||
***Environment:*** PySpark
|
||||
|
||||
***References:***
|
||||
* Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.
|
||||
|
||||
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||
|
||||
|
||||
## RAM
|
||||
|
||||
***Short description:***
|
||||
RAM is essentially a modified Citation Count, where recent citations are considered of higher importance compared to older ones.
|
||||
Hence, it better captures the popularity of publications. This "time-awareness" of citations
|
||||
alleviates the bias of methods like Citation Count and PageRank against recently published articles, which have
|
||||
not had "enough" time to gather as many citations.
|
||||
|
||||
***Algorithmic details:***
|
||||
The RAM score of each paper $i$ is calculated as follows:
|
||||
|
||||
$$
|
||||
s_i = \sum_j{R_{i,j}}
|
||||
|
@ -56,11 +119,30 @@ where $R$ is the so-called Retained Adjacency Matrix (RAM) and $R_{i,j}=\gamma^{
|
|||
$i$, and $R_{i,j}=0$ otherwise. Parameter $\gamma \in (0,1)$, $t_c$ corresponds to the current year and $t_j$ corresponds to the
|
||||
publication year of citing article $j$.
|
||||
|
||||
***Parameters:***
|
||||
$\gamma = 0.6$
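
A minimal sketch of the RAM score follows. The exponent of $\gamma$ is cut off in the text above; the sketch assumes $R_{i,j}=\gamma^{t_c-t_j}$ for each citation, as in the cited RAM paper. The function name and the toy data are illustrative, and this is a NumPy sketch rather than the PySpark implementation.

```python
import numpy as np


def ram_scores(A, years, current_year, gamma=0.6):
    """Return s_i = sum_j R[i, j], assuming R[i, j] = gamma ** (t_c - t_j)
    when paper j cites paper i and R[i, j] = 0 otherwise.
    """
    A = np.asarray(A, dtype=float)
    citation_age = current_year - np.asarray(years)  # t_c - t_j per citing paper
    weights = gamma ** citation_age  # older citing papers weigh less
    return A @ weights


# Hypothetical toy network: paper 0 is cited by papers 1 and 2,
# paper 1 is cited by paper 2.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
years = [2010, 2020, 2022]
ram = ram_scores(A, years, current_year=2022, gamma=0.6)
```

With these numbers the citation from 2020 contributes $0.6^2 = 0.36$, while the citation from the current year contributes a full $1$.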
|
||||
|
||||
***Limitations:***
|
||||
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||
|
||||
***Environment:*** PySpark
|
||||
|
||||
***References:***
|
||||
* Ghosh, R., Kuo, T. T., Hsu, C. N., Lin, S. D., & Lerman, K. (2011, December). Time-aware ranking in dynamic citation networks. In 2011 IEEE 11th International Conference on Data Mining Workshops (pp. 373-380). IEEE.
|
||||
|
||||
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||
|
||||
|
||||
## AttRank
|
||||
|
||||
***Short description:***
|
||||
AttRank is a PageRank variant that alleviates its bias against recent publications (i.e., it is tailored to capture popularity).
|
||||
AttRank achieves this by modifying PageRank's probability of randomly selecting a publication. Instead of using a uniform probability,
|
||||
AttRank defines it based on a combination of the publication's age and the citations it received in recent years.
|
||||
|
||||
***Algorithmic details:***
|
||||
The AttRank score
|
||||
of each publication $i$ is calculated based on:
|
||||
|
||||
$$
|
||||
|
@ -70,4 +152,22 @@ $$
|
|||
|
||||
where $\alpha + \beta + \gamma =1$ and $\alpha,\beta,\gamma \in [0,1]$. $Att(i)$ denotes a recent attention-based score for publication $i$,
|
||||
which reflects its share of citations in the $y$ most recent years, $t_i$ is the publication year of article $i$, $t_c$ denotes the current
|
||||
year, and $c$ is a normalisation constant. Finally, $P$ is the stochastic transition matrix.
|
||||
|
||||
***Parameters:***
|
||||
$\alpha = 0.2, \beta = 0.5, \gamma = 0.3, \rho = -0.16, convergence\_error = 10^{-12}$
|
||||
|
||||
Note that recent attention is based on the 3 most recent years (including the current one).
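
The recent attention term $Att(i)$ alone can be sketched as a share of recent citations. The sketch assumes a citation is dated by the publication year of the citing paper and that the shares are normalised over all recent citations in the network. The full AttRank update rule is not reproduced here, and the function name and toy data are illustrative.

```python
import numpy as np


def recent_attention(A, years, current_year, y=3):
    """Share of citations each paper received in the y most recent years
    (including the current one).

    A[i, j] = 1 when paper j cites paper i; years[k] is the publication
    year of paper k.
    """
    A = np.asarray(A, dtype=float)
    years = np.asarray(years)
    # Citing papers published in the window [current_year - y + 1, current_year].
    is_recent = (years > current_year - y) & (years <= current_year)
    recent_citations = A[:, is_recent].sum(axis=1)
    total = recent_citations.sum()
    if total == 0:
        # No recent citations anywhere: every paper gets zero attention.
        return np.zeros(A.shape[0])
    return recent_citations / total


# Hypothetical toy network: paper 0 is cited by papers 1 and 2,
# paper 1 is cited by papers 2 and 3.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
years = [2010, 2015, 2021, 2022]
att = recent_attention(A, years, current_year=2022, y=3)
```

In this toy data only papers 2 and 3 fall in the window 2020 to 2022, so paper 0 keeps one recent citation and paper 1 keeps two.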
|
||||
|
||||
***Limitations:***
|
||||
OpenAIRE collects data from specific data sources, which means that part of the existing literature may not be considered when computing this indicator.
|
||||
Also, since some indicators require the publication year for their calculation, we consider only research products for which we can gather this information from at least one data source.
|
||||
|
||||
***Environment:*** PySpark
|
||||
|
||||
***References:***
|
||||
* Kanellos, I., Vergoulis, T., Sacharidis, D., Dalamagas, T., & Vassiliou, Y. (2021, April). Ranking papers by their short-term scientific impact. In 2021 IEEE 37th International Conference on Data Engineering (ICDE) (pp. 1997-2002). IEEE.
|
||||
|
||||
***Authority:*** ATHENA RC • ***License:*** GPL-2.0 • ***Code:*** [BIP! Ranker](https://github.com/athenarc/Bip-Ranker)
|
||||
|
||||
|