EMBL-EBI's Protein Data Bank in Europe

This section describes the mapping implemented for EMBL-EBI's Protein Data Bank in Europe.

The Europe PMC RESTful Web Service provides the datalinks API, which returns data-literature links in Scholix format.

How the data is collected

Starting from the PubMed collection, the API below is queried with each PubMed identifier to obtain the bioentities related to that publication.

Example:

curl -s "https://www.ebi.ac.uk/europepmc/webservices/rest/MED/33024307/datalinks?format=json" | jq '.'
{
  "version": "6.8",
  "hitCount": 9,
  "request": {
    "id": "33024307",
    "source": "MED"
  },
  "dataLinkList": {
    "Category": [
      {
        "Name": "Nucleotide Sequences",
        "CategoryLinkCount": 5,
        "Section": [
          {
            "ObtainedBy": "tm_accession",
            "Tags": [
              "supporting_data"
            ],
            "SectionLinkCount": 5,
            "Linklist": {
              "Link": [
                {
                  "ObtainedBy": "tm_accession",
                  "PublicationDate": "04-11-2022",
                  "LinkProvider": {
                    "Name": "Europe PMC"
                  },
                  "RelationshipType": {
                    "Name": "References"
                  },
                  "Source": {
                    "Type": {
                      "Name": "literature"
                    },
                    "Identifier": {
                      "ID": "33024307",
                      "IDScheme": "MED"
                    }
                  },
                  "Target": {
                    "Type": {
                      "Name": "dataset"
                    },
                    "Identifier": {
                      "ID": "AY278488",
                      "IDScheme": "ENA",
                      "IDURL": "http://identifiers.org/ebi/ena.embl:AY278488"
                    },
                    "Title": "AY278488",
                    "Publisher": {
                      "Name": "Europe PMC"
                    }
                  },
            [...]
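
The same request can be issued programmatically. The following is a minimal Python sketch, not the code used by the aggregation workflow: the function names `datalinks_url`, `fetch_datalinks` and `iter_links` are hypothetical, and the traversal assumes the response always follows the Category / Section / Linklist / Link nesting shown in the example above.

```python
import json
import urllib.request

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest"


def datalinks_url(pmid: str, source: str = "MED") -> str:
    """Build the datalinks URL for one publication identifier."""
    return f"{BASE_URL}/{source}/{pmid}/datalinks?format=json"


def fetch_datalinks(pmid: str, timeout: float = 30.0) -> dict:
    """Download and parse the datalinks response for one PubMed identifier."""
    with urllib.request.urlopen(datalinks_url(pmid), timeout=timeout) as resp:
        return json.load(resp)


def iter_links(response: dict):
    """Yield every Link object, walking Category -> Section -> Linklist."""
    for category in response.get("dataLinkList", {}).get("Category", []):
        for section in category.get("Section", []):
            yield from section.get("Linklist", {}).get("Link", [])
```

Calling `list(iter_links(fetch_datalinks("33024307")))` would return the flat list of links for the publication used in the example.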

Mapping

The table below describes the mapping from the EBI link records to the OpenAIRE Graph dump format. We keep only the target links whose pid type is ena, pdb or uniprot. For each such target we construct a Bioentity with the following mapping:

| OpenAIRE Result field path | EBI record field xpath | Notes |
|---|---|---|
| id | target/identifier/ID and target/identifier/IDScheme | id in the form SCHEMA_________::md5(pid) |
| pid | target/identifier/ID and target/identifier/IDScheme | classid = classname = schema |
| publicationdate | target/PublicationDate | clean and normalize the format of the date to be YYYY-mm-dd |
| maintitle | target/Title | |
| Instance Mapping | | |
| instance.type | | Bioentity |
| type | | Dataset |
| instance.pid | target/identifier/ID and target/identifier/IDScheme | classid = classname = schema |
| instance.url | target/identifier/IDURL | Copy the value as it is |
| instance.publicationdate | //PubmedPubDate | clean and normalize the format of the date to be YYYY-mm-dd |
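
The filtering step and the field mapping above can be sketched in Python as follows. This is an illustration under several assumptions, not the actual implementation: the function names and the shape of the output dictionary are hypothetical, the 12-character underscore padding of the id prefix is assumed, the input date is assumed to be in DD-MM-YYYY form (as in the example value "04-11-2022"), and the date is read from the target with a fallback to the link-level field, where the example response shows it.

```python
import hashlib
from datetime import datetime

# pid types kept by the filter described above
ACCEPTED_SCHEMES = {"ena", "pdb", "uniprot"}


def normalize_date(raw, input_format="%d-%m-%Y"):
    """Return the date as YYYY-mm-dd, or None if it cannot be parsed."""
    try:
        return datetime.strptime(raw, input_format).strftime("%Y-%m-%d")
    except (TypeError, ValueError):
        return None


def make_id(scheme, pid, width=12):
    """Build an id shaped like SCHEMA_________::md5(pid).

    The padding width is an assumption, not taken from the table.
    """
    prefix = scheme.lower().ljust(width, "_")
    digest = hashlib.md5(pid.encode("utf-8")).hexdigest()
    return f"{prefix}::{digest}"


def link_to_bioentity(link):
    """Map one Link object to a Bioentity dict, or None if filtered out."""
    target = link.get("Target", {})
    identifier = target.get("Identifier", {})
    scheme = identifier.get("IDScheme", "").lower()
    pid = identifier.get("ID")
    if scheme not in ACCEPTED_SCHEMES or not pid:
        return None
    # classid = classname = schema, as stated in the Notes column
    pid_entry = {"classid": scheme, "classname": scheme, "value": pid}
    raw_date = target.get("PublicationDate") or link.get("PublicationDate")
    return {
        "id": make_id(scheme, pid),
        "pid": [pid_entry],
        "publicationdate": normalize_date(raw_date),
        "maintitle": target.get("Title"),
        "type": "Dataset",
        # instance.publicationdate comes from the PubMed record
        # (//PubmedPubDate) and is therefore not filled in here.
        "instance": {
            "type": "Bioentity",
            "pid": [pid_entry],
            "url": identifier.get("IDURL"),
        },
    }
```

Returning `None` for rejected links keeps the filter and the mapping in one place, so a caller can simply drop the `None` values while iterating over the links of a response.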

Relation Mapping

| OpenAIRE Relation Semantic and inverse | Source/Target type | Notes |
|---|---|---|
| IsRelatedTo | result/result | we create relationships between the BioEntity and the PubMed publication |
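
As a rough illustration of the relation mapping, the sketch below emits the IsRelatedTo relation in both directions between a Bioentity and its PubMed publication. The function name `make_relations` and the dictionary keys are hypothetical and do not reflect the actual dump schema; only the semantic and the result/result typing come from the table above.

```python
def make_relations(bioentity_id, publication_id):
    """Return the IsRelatedTo relation and its inverse as plain dicts."""
    def relation(source, target):
        return {
            "source": source,
            "sourceType": "result",
            "target": target,
            "targetType": "result",
            "relType": "IsRelatedTo",
        }

    # IsRelatedTo is used for both directions, as listed in the table.
    return [
        relation(bioentity_id, publication_id),
        relation(publication_id, bioentity_id),
    ]
```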