integrating Scholexplorer Bio Entity Datasource documentation (PR#48)
parent 49fc73d09d
commit fb5e6cd814
|
@ -60,3 +60,6 @@ When tagging a new version, the document versioning mechanism will:
|
||||||
* Copy the full `docs/` folder contents into a new `versioned_docs/version-<versionName>/` folder.
|
* Copy the full `docs/` folder contents into a new `versioned_docs/version-<versionName>/` folder.
|
||||||
* Create a versioned sidebars file based on your current sidebar configuration, saved as `versioned_sidebars/version-<versionName>-sidebars.json`.
|
* Create a versioned sidebars file based on your current sidebar configuration, saved as `versioned_sidebars/version-<versionName>-sidebars.json`.
|
||||||
* Append the new version number to `versions.json`.
|
* Append the new version number to `versions.json`.
|
||||||
|
|
||||||
|
Therefore, when previewing the compiled site locally with `npm run start`, make sure to view the `Next` version in the browser, as it shows the changes under `/docs`.
|
||||||
|
To change content that has already been versioned, modify the source files in the `versioned_docs/version-<versionName>/` folder.
|
||||||
|
|
|
@ -26,6 +26,9 @@ _Start Date: 2023-02-13 • Release Date: 2023-03-01 • Dump release: **n
|
||||||
|
|
||||||
- Revised SDG classification: improved coverage (+600K classified DOIs)
|
- Revised SDG classification: improved coverage (+600K classified DOIs)
|
||||||
- General increase of the funded scientific outputs, thanks to the full text mining scanning new OpenAccess publications
|
- General increase of the funded scientific outputs, thanks to the full text mining scanning new OpenAccess publications
|
||||||
|
- Integrated contents from
|
||||||
|
- [EMBL-EBI's Protein Data Bank in Europe](/data-provision/aggregation/non-compatible-sources/ebi)
|
||||||
|
- [UniProtKB/Swiss-Prot](/data-provision/aggregation/non-compatible-sources/uniprot)
|
||||||
|
|
||||||
#### Changed
|
#### Changed
|
||||||
|
|
||||||
|
|
|
@ -1,32 +1,31 @@
|
||||||
# UniProtKB/Swiss-Prot
|
# UniProtKB/Swiss-Prot
|
||||||
|
|
||||||
this section describes the mapping implemented for [UniProtKB/Swiss-Prot](https://www.uniprot.org/).
|
This section describes the mapping implemented to integrate metadata and links from [UniProtKB/Swiss-Prot](https://www.uniprot.org/).
|
||||||
The whole dump can be downloaded by [here](https://www.uniprot.org/help/downloadss) the Reviewed (Swiss-Prot).
|
The complete data dump "Reviewed (Swiss-Prot)" can be downloaded from [here](https://www.uniprot.org/help/downloads).
|
||||||
|
|
||||||
From this Dump we extract only the protein linked to a pubmed Publication.
|
|
||||||
|
|
||||||
|
From this dataset, only the protein records linked to a PubMed publication are extracted.
|
||||||
|
|
||||||
## Entity Mapping
|
## Entity Mapping
|
||||||
|
|
||||||
The table below describes the mapping from the TEXT metadata format to the OpenAIRE Graph dump format.
|
The table below describes the mapping from the TEXT metadata format to the OpenAIRE Graph dump format.
|
||||||
You can check an example of the text metadata [here](https://rest.uniprot.org/uniprotkb/A0A0C5B5G6.txt)
|
You can check an example of the text metadata [here](https://rest.uniprot.org/uniprotkb/A0A0C5B5G6.txt)
|
||||||
|
|
||||||
| OpenAIRE Result field path | FASTA record field xpath| Notes|
|
| OpenAIRE Result field path | FASTA record field xpath | Notes |
|
||||||
|--------------------------------|----------------------|---------|
|
|------------------------------|--------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
|
||||||
| **BIOEntity Mapping** | | |
|
| **BIOEntity Mapping** | | |
|
||||||
| `id` | `LINE Starts with AC` | id in the form `uniprot_____::md5(id)`|
|
| `id` | `LINE Starts with AC` | id in the form `uniprot_____::md5(id)` |
|
||||||
| `pid` | `LINE Starts with AC` | example `AC A0A0C5B5G6;` classid=classname=`uniprot` the value is the text after `AC` |
|
| `pid`                        | `LINE Starts with AC`                                                    | example `AC A0A0C5B5G6;` classid=classname=`uniprot` the value is the text after `AC`    |
|
||||||
| `publicationdate` | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd` |
|
| `publicationdate`            | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd`                            |
|
||||||
| `maintitle` | `LINE START WITH GN`|main title |
|
| `maintitle` | `LINE START WITH GN` | main title |
|
||||||
| **Instance Mapping** | | |
|
| **Instance Mapping** | | |
|
||||||
| `instance.type` | | `Bioentity` |
|
| `instance.type` | | `Bioentity` |
|
||||||
| `type` | | `Dataset` |
|
| `type` | | `Dataset` |
|
||||||
| `instance.pid` | `LINE Starts with AC` | `classid = classname = uniprot` |
|
| `instance.pid` | `LINE Starts with AC` | `classid = classname = uniprot` |
|
||||||
| `instance.url` | `pid` | prepend to the value `https://www.uniprot.org/uniprot/`|
|
| `instance.url` | `pid` | prepend to the value `https://www.uniprot.org/uniprot/` |
|
||||||
| `instance.publicationdate` | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd` |
|
| `instance.publicationdate`   | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd`                            |
|
||||||
|
|
||||||
|
|
||||||
### Relation Mapping
|
### Relation Mapping
|
||||||
| OpenAIRE Relation Semantic and inverse | Source/Target type | Notes |
|
| OpenAIRE Relation Semantic and inverse | Source/Target type | Notes |
|
||||||
|----------------------------------------|---------------------|--------------------------------------------------------------------------|
|
|----------------------------------------|----------------------|--------------------------------------------------------------------------------------------------------------------------|
|
||||||
| `IsRelatedTo` | `LINE START WITH RX` | we create relationships between the BioEntity and the pubmed or DOI generating an unresolved target identifier |
|
| `IsRelatedTo` | `LINE START WITH RX` | the mapping creates relationships between the BioEntity and the PubMed or DOI generating an unresolved target identifier |
|
|
@ -88,8 +88,7 @@ const sidebars = {
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/pubmed' },
|
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/pubmed' },
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/datacite' },
|
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/datacite' },
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/ebi', label: 'EMBL-EBI' },
|
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/ebi', label: 'EMBL-EBI' },
|
||||||
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/uniprot', label: 'UniProtKB/Swiss-Prot' },
|
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/uniprot', label: 'UniProtKB/Swiss-Prot' }
|
||||||
|
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
|
|
|
@ -26,6 +26,9 @@ _Start Date: 2023-02-13 • Release Date: 2023-03-01 • Dump release: **n
|
||||||
|
|
||||||
- Revised SDG classification: improved coverage (+600K classified DOIs)
|
- Revised SDG classification: improved coverage (+600K classified DOIs)
|
||||||
- General increase of the funded scientific outputs, thanks to the full text mining scanning new OpenAccess publications
|
- General increase of the funded scientific outputs, thanks to the full text mining scanning new OpenAccess publications
|
||||||
|
- Integrated contents from
|
||||||
|
- [EMBL-EBI's Protein Data Bank in Europe](/data-provision/aggregation/non-compatible-sources/ebi)
|
||||||
|
- [UniProtKB/Swiss-Prot](/data-provision/aggregation/non-compatible-sources/uniprot)
|
||||||
|
|
||||||
#### Changed
|
#### Changed
|
||||||
|
|
||||||
|
|
|
@ -18,6 +18,10 @@ Such a policy defines a list of data sources that are considered authoritative f
|
||||||
| doi | [Crossref](https://www.crossref.org), [Datacite](https://datacite.org) |
|
| doi | [Crossref](https://www.crossref.org), [Datacite](https://datacite.org) |
|
||||||
| pmc, pmid | [Europe PubMed Central](https://europepmc.org/), [PubMed Central](https://www.ncbi.nlm.nih.gov/pmc) |
|
| pmc, pmid | [Europe PubMed Central](https://europepmc.org/), [PubMed Central](https://www.ncbi.nlm.nih.gov/pmc) |
|
||||||
| arXiv | [arXiv.org e-Print Archive](https://arxiv.org/) |
|
| arXiv | [arXiv.org e-Print Archive](https://arxiv.org/) |
|
||||||
|
| uniprot | [Protein Data Bank](http://www.pdb.org/) |
|
||||||
|
| ena | [Protein Data Bank](http://www.pdb.org/) |
|
||||||
|
| pdb | [Protein Data Bank](http://www.pdb.org/) |
|
||||||
|
|
||||||
|
|
||||||
There is an exception though: Handle(s) are minted by several repositories; as listing them all would not be a viable option, to avoid losing them as PIDs, Handles bypass the PID authority filtering rule.
|
There is an exception though: Handle(s) are minted by several repositories; as listing them all would not be a viable option, to avoid losing them as PIDs, Handles bypass the PID authority filtering rule.
|
||||||
In all other cases, PIDs are included in the graph as alternate identifiers.
|
In all other cases, PIDs are included in the graph as alternate identifiers.
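
The authority rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the OpenAIRE implementation: the function name `classify_pid`, the two return labels, and the idea of matching a source by its display name are all assumptions made for the example. The authority lists are copied from the table above.

```python
# Authoritative sources per PID type, copied from the table above.
AUTHORITIES = {
    "doi": {"Crossref", "Datacite"},
    "pmc": {"Europe PubMed Central", "PubMed Central"},
    "pmid": {"Europe PubMed Central", "PubMed Central"},
    "arXiv": {"arXiv.org e-Print Archive"},
    "uniprot": {"Protein Data Bank"},
    "ena": {"Protein Data Bank"},
    "pdb": {"Protein Data Bank"},
}


def classify_pid(pid_type: str, source: str) -> str:
    """Return "pid" or "alternateIdentifier" (hypothetical labels)."""
    # Handles bypass the authority filter: they are minted by many repositories.
    if pid_type == "handle":
        return "pid"
    # A PID is kept as such only when collected from an authority for its type.
    if source in AUTHORITIES.get(pid_type, set()):
        return "pid"
    # In all other cases the value is kept as an alternate identifier.
    return "alternateIdentifier"
```

For example, a DOI collected from Crossref stays a PID, while the same DOI collected from an institutional repository would be kept as an alternate identifier.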
|
||||||
|
@ -63,12 +67,15 @@ When the record is collected from a source which is not authoritative for any ty
|
||||||
Currently, the following data sources are used as "PID authorities":
|
Currently, the following data sources are used as "PID authorities":
|
||||||
|
|
||||||
| PID Type | Prefix (12 chars) | Authority |
|
| PID Type | Prefix (12 chars) | Authority |
|
||||||
|-----------|------------------------|-----------------------------------------|
|
|-----------|------------------------|-------------------------------------------|
|
||||||
| doi | `doi_________` | Crossref, Datacite, Zenodo |
|
| doi | `doi_________` | Crossref, Datacite, Zenodo |
|
||||||
| pmc | `pmc_________` | Europe PubMed Central, PubMed Central |
|
| pmc | `pmc_________` | Europe PubMed Central, PubMed Central |
|
||||||
| pmid | `pmid________` | Europe PubMed Central, PubMed Central |
|
| pmid | `pmid________` | Europe PubMed Central, PubMed Central |
|
||||||
| arXiv | `arXiv_______` | arXiv.org e-Print Archive |
|
| arXiv | `arXiv_______` | arXiv.org e-Print Archive |
|
||||||
| handle | `handle______` | any repository |
|
| handle | `handle______` | any repository |
|
||||||
|
| ena | `ena_________` | EMBL-EBI |
|
||||||
|
| pdb | `pdb_________` | EMBL-EBI |
|
||||||
|
| uniprot | `uniprot_____` | EMBL-EBI |
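
The prefixes in the table all follow the same visible pattern: the PID type padded with underscores to 12 characters. A minimal sketch of that padding, assuming a hypothetical helper named `pid_prefix` (it is not part of any OpenAIRE API):

```python
def pid_prefix(pid_type: str, width: int = 12) -> str:
    """Pad a PID type with underscores up to the fixed prefix width."""
    if len(pid_type) > width:
        # The table only lists types that fit in 12 characters.
        raise ValueError(f"PID type longer than {width} characters: {pid_type!r}")
    return pid_type.ljust(width, "_")
```

With this helper, `pid_prefix("uniprot")` yields `uniprot_____` and `pid_prefix("pmid")` yields `pmid________`, matching the rows above.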
|
||||||
|
|
||||||
OpenAIRE also performs duplicate identification (see the [dedicated section for details](/data-provision/deduplication)).
|
OpenAIRE also performs duplicate identification (see the [dedicated section for details](/data-provision/deduplication)).
|
||||||
All duplicates are **merged** together in a **representative record** which must be assigned a dedicated OpenAIRE identifier (i.e. it cannot have the identifier of one of the aggregated records).
|
All duplicates are **merged** together in a **representative record** which must be assigned a dedicated OpenAIRE identifier (i.e. it cannot have the identifier of one of the aggregated records).
|
||||||
|
|
|
@ -0,0 +1,31 @@
|
||||||
|
# UniProtKB/Swiss-Prot
|
||||||
|
|
||||||
|
This section describes the mapping implemented to integrate metadata and links from [UniProtKB/Swiss-Prot](https://www.uniprot.org/).
|
||||||
|
The complete data dump "Reviewed (Swiss-Prot)" can be downloaded from [here](https://www.uniprot.org/help/downloads).
|
||||||
|
|
||||||
|
From this dataset, only the protein records linked to a PubMed publication are extracted.
|
||||||
|
|
||||||
|
## Entity Mapping
|
||||||
|
|
||||||
|
The table below describes the mapping from the TEXT metadata format to the OpenAIRE Graph dump format.
|
||||||
|
You can check an example of the text metadata [here](https://rest.uniprot.org/uniprotkb/A0A0C5B5G6.txt)
|
||||||
|
|
||||||
|
| OpenAIRE Result field path | FASTA record field xpath | Notes |
|
||||||
|
|------------------------------|--------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
|
||||||
|
| **BIOEntity Mapping** | | |
|
||||||
|
| `id` | `LINE Starts with AC` | id in the form `uniprot_____::md5(id)` |
|
||||||
|
| `pid`                        | `LINE Starts with AC`                                                    | example `AC A0A0C5B5G6;` classid=classname=`uniprot` the value is the text after `AC`    |
|
||||||
|
| `publicationdate`            | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd`                            |
|
||||||
|
| `maintitle` | `LINE START WITH GN` | main title |
|
||||||
|
| **Instance Mapping** | | |
|
||||||
|
| `instance.type` | | `Bioentity` |
|
||||||
|
| `type` | | `Dataset` |
|
||||||
|
| `instance.pid` | `LINE Starts with AC` | `classid = classname = uniprot` |
|
||||||
|
| `instance.url` | `pid` | prepend to the value `https://www.uniprot.org/uniprot/` |
|
||||||
|
| `instance.publicationdate`   | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd`                            |
|
||||||
|
|
||||||
|
|
||||||
|
### Relation Mapping
|
||||||
|
| OpenAIRE Relation Semantic and inverse | Source/Target type | Notes |
|
||||||
|
|----------------------------------------|----------------------|--------------------------------------------------------------------------------------------------------------------------|
|
||||||
|
| `IsRelatedTo` | `LINE START WITH RX` | the mapping creates relationships between the BioEntity and the PubMed or DOI generating an unresolved target identifier |
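
The entity and relation mappings in the tables above can be sketched as a small Python parser. This is an illustration under stated assumptions, not the code used in the aggregation pipeline: the function name `map_record` and the output dictionary keys are hypothetical, the record is assumed to use a two-letter line code followed by the value starting at column 6, DT dates are assumed to look like `18-JAN-2017`, and the MD5 is assumed to be computed over the accession string.

```python
import hashlib
import re

# Month abbreviations assumed to appear in DT lines (e.g. 18-JAN-2017).
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

UNIPROT_URL = "https://www.uniprot.org/uniprot/"


def map_record(text: str) -> dict:
    """Map one UniProt text record to the fields listed in the tables above."""
    accession = date = title = None
    relations = []
    for line in text.splitlines():
        # Assumption: two-letter line code, value starting at column 6.
        code, value = line[:2], line[5:].strip()
        if code == "AC" and accession is None:
            # `AC   A0A0C5B5G6;` -> keep the first accession without the semicolon.
            accession = value.split(";")[0].strip()
        elif code == "DT" and "integrated into UniProtKB/Swiss-Prot" in value:
            day, month, year = value.split(",")[0].strip().split("-")
            date = f"{int(year):04d}-{MONTHS[month.upper()]:02d}-{int(day):02d}"
        elif code == "GN" and title is None:
            title = value  # raw GN content, used as the main title
        elif code == "RX":
            # Each PubMed or DOI reference becomes an unresolved relation target.
            for kind, target in re.findall(r"(PubMed|DOI)=([^;]+);", value):
                relations.append({"semantic": "IsRelatedTo",
                                  "targetType": kind.lower(),
                                  "target": target.strip()})
    if accession is None:
        raise ValueError("record has no AC line")
    # Assumption: the MD5 is computed over the UTF-8 accession string.
    digest = hashlib.md5(accession.encode("utf-8")).hexdigest()
    return {
        "id": f"uniprot_____::{digest}",
        "pid": accession,
        "publicationdate": date,
        "maintitle": title,
        "url": UNIPROT_URL + accession,
        "relations": relations,
    }
```

Calling `map_record` on the text of a single entry returns one dictionary; records without any RX line simply produce an empty `relations` list, which a caller could use to skip proteins not linked to a publication.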
|
|
@ -128,6 +128,11 @@
|
||||||
"type": "doc",
|
"type": "doc",
|
||||||
"id": "data-provision/aggregation/non-compatible-sources/ebi",
|
"id": "data-provision/aggregation/non-compatible-sources/ebi",
|
||||||
"label": "EMBL-EBI"
|
"label": "EMBL-EBI"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"type": "doc",
|
||||||
|
"id": "data-provision/aggregation/non-compatible-sources/uniprot",
|
||||||
|
"label": "UniProtKB/Swiss-Prot"
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|