Microsoft Academic Graph
Data acquisition
The Microsoft Academic Graph dataset is generated from the latest released version of the graph, 06-12-2021.
Changes from the previous version
- New workflow: MAG is no longer created within the DOIBoost process. Now, a new workflow normalizes the various MAG tables into a single table, from which the action set is generated.
- MAG discontinued: MAG is no longer maintained. Therefore, normalization is performed only once, on data imported from a complete MAG dump.
Process
The Microsoft Academic Graph (MAG) is a heterogeneous graph that contains scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study. The MAG schema is designed to capture the rich and complex relationships between these entities.
The main node types in the MAG schema are:
Paper
: Publications represent works of scientific research, such as articles, books, and book chapters.
PaperAbstractsInvertedIndex
: Stores the paper abstracts in inverted-index form.
Authors
: Authors represent the people who wrote the publications.
Institutions
: Institutions represent the organizations with which the authors are affiliated.
Journals
: Journals represent the periodical series in which the publications are published.
Conferences
: Conferences represent the academic meetings at which the publications are presented.
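Because abstracts are stored as an inverted index rather than as plain text, they have to be rebuilt before they can fill the `abstract` field. The following is a minimal sketch, assuming each indexed abstract is a JSON string of the form `{"IndexLength": n, "InvertedIndex": {word: [positions]}}`; the function name `rebuild_abstract` is hypothetical and the sketch reads only the `InvertedIndex` part.

```python
import json


def rebuild_abstract(indexed_abstract: str) -> str:
    """Rebuild a plain-text abstract from an inverted index (assumed JSON layout)."""
    data = json.loads(indexed_abstract)
    position_to_word = {}
    # Each word maps to every position at which it occurs in the abstract.
    for word, positions in data["InvertedIndex"].items():
        for pos in positions:
            position_to_word[pos] = word
    # Emit the words in position order; gaps in the positions are simply skipped.
    return " ".join(position_to_word[p] for p in sorted(position_to_word))


if __name__ == "__main__":
    sample = json.dumps({
        "IndexLength": 5,
        "InvertedIndex": {"graph": [1, 4], "A": [0], "of": [2], "the": [3]},
    })
    print(rebuild_abstract(sample))  # A graph of the graph
```

Keying on positions instead of preallocating `IndexLength` slots keeps the sketch from failing if the declared length and the actual positions disagree.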
The main edge types in the MAG schema are:
Citation relationships
: Citation relationships connect citing publications to cited publications.
Affiliation relationships
: Affiliation relationships connect authors to the institutions with which they are affiliated.
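The two edge types can be pictured as typed adjacency sets. This is a minimal sketch under stated assumptions: the input rows are assumed to follow MAG's `PaperReferences` (`PaperId`, `PaperReferenceId`) and `PaperAuthorAffiliations` (`PaperId`, `AuthorId`, `AffiliationId`) tables, and the sample values and the `build_edges` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows (column order assumed from the MAG tables named above).
paper_references = [(1, 2), (1, 3), (2, 3)]                        # (citing, cited)
paper_author_affiliations = [(1, 10, 100), (2, 11, 101), (3, 10, None)]


def build_edges(references, author_affiliations):
    """Build citation and affiliation edges as adjacency sets."""
    cites = defaultdict(set)            # citing paper -> cited papers
    affiliated_with = defaultdict(set)  # author -> institutions
    for citing, cited in references:
        cites[citing].add(cited)
    for _paper_id, author_id, affiliation_id in author_affiliations:
        # Some authorships carry no affiliation; they produce no edge.
        if affiliation_id is not None:
            affiliated_with[author_id].add(affiliation_id)
    return cites, affiliated_with


cites, affiliated_with = build_edges(paper_references, paper_author_affiliations)
print(sorted(cites[1]))             # [2, 3]
print(sorted(affiliated_with[10]))  # [100]
```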
Preprocess
In the first phase, a single normalized table is built that contains all papers and their associated relationships.
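The normalization step can be illustrated in memory. This is a hypothetical sketch, not the actual workflow: it assumes author and affiliation names are available as id-to-name lookups and that authorship rows have the shape `(PaperId, AuthorId, AffiliationId, AuthorSequenceNumber)`; the function `normalize` and the nested `Authors` field are illustrative names.

```python
from collections import defaultdict


def normalize(papers, author_names, affiliation_names, paper_author_affiliations):
    """Collapse several MAG tables into one record per paper (hypothetical sketch).

    papers: list of dicts with at least "PaperId".
    author_names / affiliation_names: id -> name lookups.
    paper_author_affiliations: (PaperId, AuthorId, AffiliationId, AuthorSequenceNumber) rows.
    """
    # Group authorship rows by paper, resolving names along the way.
    authorships = defaultdict(list)
    for paper_id, author_id, affiliation_id, seq in paper_author_affiliations:
        authorships[paper_id].append({
            "AuthorId": author_id,
            "AuthorName": author_names.get(author_id),
            "AffiliationId": affiliation_id,
            "AffiliationName": affiliation_names.get(affiliation_id),
            "AuthorSequenceNumber": seq,
        })

    normalized = []
    for paper in papers:
        record = dict(paper)  # copy, so the input rows are left untouched
        # Authors are kept in their MAG sequence order.
        record["Authors"] = sorted(
            authorships.get(paper["PaperId"], []),
            key=lambda a: a["AuthorSequenceNumber"],
        )
        normalized.append(record)
    return normalized


if __name__ == "__main__":
    rows = normalize(
        [{"PaperId": 1, "OriginalTitle": "A paper"}],
        {10: "Ada", 11: "Grace"},
        {100: "Univ X"},
        [(1, 11, None, 2), (1, 10, 100, 1)],
    )
    print([a["AuthorName"] for a in rows[0]["Authors"]])  # ['Ada', 'Grace']
```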
Mapping MAG properties into the OpenAIRE Graph
Properties in OpenAIRE research products are set based on the logic described in the following table:
OpenAIRE Research Product field path | MAG path(s) | Notes |
---|---|---|
id | PaperId | id in the form `mag_________::md5(PaperId)` |
instance.alternateIdentifier[@type = DOI] | Doi | DOI intersected with Crossref: only MAG papers whose DOI is present in Crossref are kept |
instance.instancetype | DocType | Using the dnet:result_typologies vocabulary, we look up the DocType synonym to generate one of the following main entities: |
maintitle | OriginalTitle | |
publicationdate | Year | Used as the publication date if Date is not available |
publicationdate | Date | |
publicationdate | OnlineDate | Date the article was put online |
publisher | Publisher | |
journal.name | ConferenceName | |
journal.issnPrinted | JournalISSN | |
journal.edition | JournalPublisher | |
journal.ConferencePlace | ConferenceLocation | |
journal.conferencedate | ConferenceStartDate, ConferenceEndDate | Conference date built by concatenating ConferenceStartDate-ConferenceEndDate |
journal.vol | Volume | |
journal.iss | Issue | |
journal.sp | FirstPage | |
journal.ep | LastPage | |
abstract | Paper abstract | |
Author Mapping | | |
author.fullname | AuthorName | |
organization.legalname | AffiliationName | |
organization.id | AffiliationId | id in the form `mag_________::md5(AffiliationId)` |
organization.id | AffiliationId | For each affiliation, we generate an affiliation relation between the paper and the organization |
author.pid[@type = mag] | AuthorId | |
author.rank | AuthorSequenceNumber | |
organization.pid | GridId | |
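Several rows of the mapping table combine into simple rules: the md5-based ids, the Crossref DOI filter, the Date/Year fallback for the publication date, the concatenated conference date, author ranks and one paper-organization relation per affiliation. The sketch below applies them to one normalized record. It is a hedged illustration: the output keys are simplified, hypothetical stand-ins for the OpenAIRE fields; hashing the decimal string of the MAG id, lowercasing DOIs and the set `crossref_dois` are assumptions of this sketch; and the DocType vocabulary lookup is left out.

```python
import hashlib

# Namespace prefix as written in the mapping table ("mag" padded with underscores).
MAG_PREFIX = "mag_________"


def mag_id(mag_identifier) -> str:
    """Build an id of the form mag_________::md5(<MAG id>).

    Assumption: the md5 is computed over the decimal string form of the
    MAG identifier, UTF-8 encoded.
    """
    digest = hashlib.md5(str(mag_identifier).encode("utf-8")).hexdigest()
    return f"{MAG_PREFIX}::{digest}"


def publication_date(record):
    """Prefer Date; fall back to Year when Date is not available."""
    if record.get("Date"):
        return record["Date"]
    if record.get("Year"):
        return str(record["Year"])
    return None


def conference_date(record):
    """Join ConferenceStartDate and ConferenceEndDate with '-' (None if both are missing)."""
    parts = [record.get("ConferenceStartDate"), record.get("ConferenceEndDate")]
    return "-".join(p for p in parts if p) or None


def to_result(record, crossref_dois):
    """Map one normalized MAG record to a simplified, hypothetical result dict.

    Returns None when the paper's DOI is not in crossref_dois (assumed lowercase).
    """
    doi = (record.get("Doi") or "").lower()  # DOIs are case-insensitive
    if not doi or doi not in crossref_dois:
        return None
    paper_id = mag_id(record["PaperId"])
    authors = sorted(record.get("Authors", []), key=lambda a: a["AuthorSequenceNumber"])
    return {
        "id": paper_id,
        "doi": doi,
        "maintitle": record.get("OriginalTitle"),
        "publicationdate": publication_date(record),
        "conferencedate": conference_date(record),
        "authors": [
            {"fullname": a.get("AuthorName"), "rank": a["AuthorSequenceNumber"], "mag_pid": a["AuthorId"]}
            for a in authors
        ],
        # One paper-organization relation per affiliation.
        "affiliation_relations": [
            (paper_id, mag_id(a["AffiliationId"]))
            for a in authors
            if a.get("AffiliationId") is not None
        ],
    }


if __name__ == "__main__":
    sample = {
        "PaperId": 1, "Doi": "10.1234/ABC", "OriginalTitle": "A paper", "Year": 2020,
        "Authors": [{"AuthorId": 10, "AuthorName": "Ada", "AffiliationId": 100, "AuthorSequenceNumber": 1}],
    }
    print(to_result(sample, {"10.1234/abc"})["publicationdate"])  # 2020
```

Returning `None` for papers without a Crossref match mirrors the filtering described in the DOI row; a real pipeline would more likely drop such rows with a join than test them one by one.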