# EMBL-EBIs Protein Data Bank in Europe

This section describes the mapping implemented for [EMBL-EBIs Protein Data Bank in Europe](https://www.ebi.ac.uk/).

The Europe PMC RESTful Web Service provides the [datalinks API](https://europepmc.org/RestfulWebService#!/Europe32PMC32Articles32RESTful32API) to retrieve data-literature links in Scholix format.

## How the data is collected

Starting from the PubMed collection, the datalinks API is queried once per PubMed identifier to obtain the bioentities related to each publication.

Example:
```commandline
curl -s "https://www.ebi.ac.uk/europepmc/webservices/rest/MED/33024307/datalinks?format=json" | jq '.'
{
  "version": "6.8",
  "hitCount": 9,
  "request": {
    "id": "33024307",
    "source": "MED"
  },
  "dataLinkList": {
    "Category": [
      {
        "Name": "Nucleotide Sequences",
        "CategoryLinkCount": 5,
        "Section": [
          {
            "ObtainedBy": "tm_accession",
            "Tags": [
              "supporting_data"
            ],
            "SectionLinkCount": 5,
            "Linklist": {
              "Link": [
                {
                  "ObtainedBy": "tm_accession",
                  "PublicationDate": "04-11-2022",
                  "LinkProvider": {
                    "Name": "Europe PMC"
                  },
                  "RelationshipType": {
                    "Name": "References"
                  },
                  "Source": {
                    "Type": {
                      "Name": "literature"
                    },
                    "Identifier": {
                      "ID": "33024307",
                      "IDScheme": "MED"
                    }
                  },
                  "Target": {
                    "Type": {
                      "Name": "dataset"
                    },
                    "Identifier": {
                      "ID": "AY278488",
                      "IDScheme": "ENA",
                      "IDURL": "http://identifiers.org/ebi/ena.embl:AY278488"
                    },
                    "Title": "AY278488",
                    "Publisher": {
                      "Name": "Europe PMC"
                    }
                  },
                  [...]
```
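The links are nested several levels deep (`dataLinkList` → `Category` → `Section` → `Linklist` → `Link`). As a minimal sketch, assuming the response has already been decoded into a Python dictionary, the links could be flattened as follows. The function name `iter_links` and the trimmed `sample` record are illustrative only and are not part of the OpenAIRE implementation.

```python
from typing import Any, Dict, Iterator


def iter_links(response: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every Link object nested under dataLinkList/Category/Section/Linklist.

    Hypothetical helper: missing levels are treated as empty rather than as errors.
    """
    categories = response.get("dataLinkList", {}).get("Category", [])
    for category in categories:
        for section in category.get("Section", []):
            for link in section.get("Linklist", {}).get("Link", []):
                yield link


# Trimmed copy of the response shown above, keeping a single link.
sample = {
    "request": {"id": "33024307", "source": "MED"},
    "dataLinkList": {
        "Category": [
            {
                "Name": "Nucleotide Sequences",
                "Section": [
                    {
                        "Linklist": {
                            "Link": [
                                {
                                    "PublicationDate": "04-11-2022",
                                    "Source": {
                                        "Identifier": {"ID": "33024307", "IDScheme": "MED"}
                                    },
                                    "Target": {
                                        "Identifier": {"ID": "AY278488", "IDScheme": "ENA"},
                                        "Title": "AY278488",
                                    },
                                }
                            ]
                        }
                    }
                ],
            }
        ]
    },
}

links = list(iter_links(sample))
for link in links:
    target = link["Target"]["Identifier"]
    print(target["IDScheme"], target["ID"])
```

Treating absent levels as empty keeps the helper usable for publications that have no data links at all.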
## Mapping

The table below describes the mapping from the EBI links records to the OpenAIRE Research Graph dump format.

We keep only the target links whose pid type is **ena**, **pdb** or **uniprot**.
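The filtering step can be sketched as a simple predicate over the link records. This is an illustration under stated assumptions: the pid type is read from `Target.Identifier.IDScheme`, and the comparison is case-insensitive because the sample response reports `ENA` in upper case. The names `ALLOWED_PID_TYPES` and `is_supported_target` are hypothetical.

```python
from typing import Any, Dict, List

# Pid types retained by the mapping, as listed in the text above.
ALLOWED_PID_TYPES = {"ena", "pdb", "uniprot"}


def is_supported_target(link: Dict[str, Any]) -> bool:
    """Return True when the link target uses one of the retained pid types.

    Assumption: the pid type is the target's IDScheme, compared case-insensitively.
    """
    scheme = link.get("Target", {}).get("Identifier", {}).get("IDScheme", "")
    return scheme.lower() in ALLOWED_PID_TYPES


# Illustrative records: only the first one uses a retained pid type.
candidate_links: List[Dict[str, Any]] = [
    {"Target": {"Identifier": {"ID": "AY278488", "IDScheme": "ENA"}}},
    {"Target": {"Identifier": {"ID": "example-id", "IDScheme": "other"}}},
]

kept = [link for link in candidate_links if is_supported_target(link)]
print(len(kept))
```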
For each target, we construct a Bioentity with the following mapping:

| OpenAIRE Result field path | EBI record field xpath | Notes |
|-----------------------------|----------------------------------------------------------|---------------------------------------------------------------|
| `id` | `target/identifier/ID` and `target/identifier/IDScheme` | id in the form `SCHEMA_________::md5(pid)` |
| `pid` | `target/identifier/ID` and `target/identifier/IDScheme` | `classid = classname = schema` |
| `publicationdate` | `target/PublicationDate` | Clean and normalize the format of the date to be `YYYY-mm-dd` |
| `maintitle` | `target/Title` | |
| **Instance Mapping** | | |
| `instance.type` | | `Bioentity` |
| `type` | | `Dataset` |
| `instance.pid` | `target/identifier/ID` and `target/identifier/IDScheme` | `classid = classname = schema` |
| `instance.url` | `target/identifier/IDURL` | Copy the value as it is |
| `instance.publicationdate` | `//PubmedPubDate` | Clean and normalize the format of the date to be `YYYY-mm-dd` |
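The mapping above can be illustrated with a short sketch. Several details are assumptions rather than documented behaviour: the schema is lower-cased and padded with underscores to 12 characters to form the id prefix, the md5 is computed over the raw pid value, the source date is read as `DD-MM-YYYY`, and `PublicationDate` is taken from the link level, where it appears in the sample response. The output dictionary layout and the function names are hypothetical, and `instance.publicationdate` is omitted because `//PubmedPubDate` is not part of the link record.

```python
import hashlib
from datetime import datetime
from typing import Any, Dict


def make_id(schema: str, pid: str) -> str:
    """Build an id of the form SCHEMA_________::md5(pid).

    Assumption: lower-cased schema, underscore-padded to 12 characters.
    """
    prefix = schema.lower().ljust(12, "_")
    digest = hashlib.md5(pid.encode("utf-8")).hexdigest()
    return f"{prefix}::{digest}"


def normalize_date(value: str) -> str:
    """Convert a date to YYYY-mm-dd. Assumption: the input is DD-MM-YYYY."""
    return datetime.strptime(value, "%d-%m-%Y").strftime("%Y-%m-%d")


def to_bioentity(link: Dict[str, Any]) -> Dict[str, Any]:
    """Map one link record to a Bioentity-like dictionary (hypothetical layout)."""
    target = link["Target"]
    identifier = target["Identifier"]
    schema = identifier["IDScheme"].lower()
    # classid = classname = schema, as stated in the table.
    pid = {"value": identifier["ID"], "classid": schema, "classname": schema}
    return {
        "id": make_id(schema, identifier["ID"]),
        "pid": [pid],
        "publicationdate": normalize_date(link["PublicationDate"]),
        "maintitle": target.get("Title"),
        "type": "Dataset",
        "instance": [
            {"type": "Bioentity", "pid": [pid], "url": identifier.get("IDURL")}
        ],
    }


link = {
    "PublicationDate": "04-11-2022",
    "Target": {
        "Identifier": {
            "ID": "AY278488",
            "IDScheme": "ENA",
            "IDURL": "http://identifiers.org/ebi/ena.embl:AY278488",
        },
        "Title": "AY278488",
    },
}

entity = to_bioentity(link)
print(entity["id"], entity["publicationdate"])
```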

### Relation Mapping

| OpenAIRE Relation Semantic and inverse | Source/Target type | Notes |
|----------------------------------------|---------------------|--------------------------------------------------------------------------|
| `IsRelatedTo` | `result/result` | We create relationships between the Bioentity and the PubMed publication |
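As a rough illustration of the relation step, the sketch below emits one `IsRelatedTo` record in each direction between a Bioentity and its publication. Emitting both directions is an assumption based on the "Semantic and inverse" column, and the field names `source`, `target` and `relClass`, like the placeholder identifiers, are hypothetical.

```python
from typing import Dict, List


def make_relations(bioentity_id: str, publication_id: str) -> List[Dict[str, str]]:
    """Return the IsRelatedTo relation and its inverse (hypothetical record layout)."""
    return [
        {"source": bioentity_id, "target": publication_id, "relClass": "IsRelatedTo"},
        {"source": publication_id, "target": bioentity_id, "relClass": "IsRelatedTo"},
    ]


# Placeholder identifiers, used only to show the shape of the output.
relations = make_relations("bioentity-id", "publication-id")
for relation in relations:
    print(relation["source"], relation["relClass"], relation["target"])
```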