integrating Scholexplorer Bio Entity Datasource documentation (PR#48)
parent
49fc73d09d
commit
fb5e6cd814
|
@ -60,3 +60,6 @@ When tagging a new version, the document versioning mechanism will:
|
|||
* Copy the full `docs/` folder contents into a new `versioned_docs/version-<versionName>/` folder.
|
||||
* Create a versioned sidebars file based on your current sidebar configuration, saved as `versioned_sidebars/version-<versionName>-sidebars.json`.
|
||||
* Append the new version number to `versions.json`.
|
||||
|
||||
Therefore, when previewing the compiled site locally with `npm run start`, make sure to select the `Next` version in the browser, as it is the one that shows the changes made under `/docs`.
|
||||
To change a version that was already versioned, the source files to be modified are in the `versioned_docs/version-<versionName>/` folder.
|
||||
|
|
|
@ -26,6 +26,9 @@ _Start Date: 2023-02-13 • Release Date: 2023-03-01 • Dump release: **n
|
|||
|
||||
- Revised SDG classification: improved coverage (+600K classified DOIs)
|
||||
- General increase of the funded scientific outputs, thanks to the full text mining scanning new OpenAccess publications
|
||||
- Integrated contents from
|
||||
- [EMBL-EBI's Protein Data Bank in Europe](/data-provision/aggregation/non-compatible-sources/ebi)
|
||||
- [UniProtKB/Swiss-Prot](/data-provision/aggregation/non-compatible-sources/uniprot)
|
||||
|
||||
#### Changed
|
||||
|
||||
|
|
|
@ -1,32 +1,31 @@
|
|||
# UniProtKB/Swiss-Prot
|
||||
|
||||
this section describes the mapping implemented for [UniProtKB/Swiss-Prot](https://www.uniprot.org/).
|
||||
The whole dump can be downloaded by [here](https://www.uniprot.org/help/downloadss) the Reviewed (Swiss-Prot).
|
||||
|
||||
From this Dump we extract only the protein linked to a pubmed Publication.
|
||||
This section describes the mapping implemented to integrate metadata and links from [UniProtKB/Swiss-Prot](https://www.uniprot.org/).
|
||||
The complete data dump "Reviewed (Swiss-Prot)" can be downloaded from [here](https://www.uniprot.org/help/downloads).
|
||||
|
||||
From this dataset, only the protein records linked to a PubMed publication are extracted.
|
||||
|
||||
## Entity Mapping
|
||||
|
||||
The table below describes the mapping from the TEXT metadata format to the OpenAIRE Graph dump format.
|
||||
You can check an example of the text metadata [here](https://rest.uniprot.org/uniprotkb/A0A0C5B5G6.txt)
|
||||
|
||||
| OpenAIRE Result field path | FASTA record field xpath| Notes|
|
||||
|--------------------------------|----------------------|---------|
|
||||
| **BIOEntity Mapping** | | |
|
||||
| `id` | `LINE Starts with AC` | id in the form `uniprot_____::md5(id)`|
|
||||
| `pid` | `LINE Starts with AC` | example `AC A0A0C5B5G6;` classid=classname=`uniprot` the vaue is the text after `AC` |
|
||||
| `publicationdate` | `LINE START WITH DT containg text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd` |
|
||||
| `maintitle` | `LINE START WITH GN`|main title |
|
||||
| **Instance Mapping** | | |
|
||||
| `instance.type` | | `Bioentity` |
|
||||
| `type` | | `Dataset` |
|
||||
| `instance.pid` | `LINE Starts with AC` | `classid = classname = uniprot` |
|
||||
| `instance.url` | `pid` | prepend to the value `https://www.uniprot.org/uniprot/`|
|
||||
| `instance.publicationdate` | `LINE START WITH DT containg text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be YYYY-mm-dd |
|
||||
| OpenAIRE Result field path | FASTA record field xpath | Notes |
|
||||
|------------------------------|--------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
|
||||
| **BIOEntity Mapping** | | |
|
||||
| `id` | `LINE Starts with AC` | id in the form `uniprot_____::md5(id)` |
|
||||
| `pid` | `LINE Starts with AC` | example `AC A0A0C5B5G6;` classid=classname=`uniprot` the value is the text after `AC` |
|
||||
| `publicationdate` | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd` |
|
||||
| `maintitle` | `LINE START WITH GN` | main title |
|
||||
| **Instance Mapping** | | |
|
||||
| `instance.type` | | `Bioentity` |
|
||||
| `type` | | `Dataset` |
|
||||
| `instance.pid` | `LINE Starts with AC` | `classid = classname = uniprot` |
|
||||
| `instance.url` | `pid` | prepend to the value `https://www.uniprot.org/uniprot/` |
|
||||
| `instance.publicationdate` | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd` |
|
||||
|
||||
|
||||
### Relation Mapping
|
||||
| OpenAIRE Relation Semantic and inverse | Source/Target type | Notes |
|
||||
|----------------------------------------|---------------------|--------------------------------------------------------------------------|
|
||||
| `IsRelatedTo` | `LINE START WITH RX` | we create relationships between the BioEntity and the pubmed or DOI generating an unresolved target identifier |
|
||||
| OpenAIRE Relation Semantic and inverse | Source/Target type | Notes |
|
||||
|----------------------------------------|----------------------|--------------------------------------------------------------------------------------------------------------------------|
|
||||
| `IsRelatedTo` | `LINE START WITH RX` | the mapping creates relationships between the BioEntity and the PubMed or DOI generating an unresolved target identifier |
|
|
@ -88,8 +88,7 @@ const sidebars = {
|
|||
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/pubmed' },
|
||||
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/datacite' },
|
||||
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/ebi', label: 'EMBL-EBI' },
|
||||
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/uniprot', label: 'UniProtKB/Swiss-Prot' },
|
||||
|
||||
{ type: 'doc', id: 'data-provision/aggregation/non-compatible-sources/uniprot', label: 'UniProtKB/Swiss-Prot' }
|
||||
]
|
||||
}
|
||||
]
|
||||
|
|
|
@ -26,6 +26,9 @@ _Start Date: 2023-02-13 • Release Date: 2023-03-01 • Dump release: **n
|
|||
|
||||
- Revised SDG classification: improved coverage (+600K classified DOIs)
|
||||
- General increase of the funded scientific outputs, thanks to the full text mining scanning new OpenAccess publications
|
||||
- Integrated contents from
|
||||
- [EMBL-EBI's Protein Data Bank in Europe](/data-provision/aggregation/non-compatible-sources/ebi)
|
||||
- [UniProtKB/Swiss-Prot](/data-provision/aggregation/non-compatible-sources/uniprot)
|
||||
|
||||
#### Changed
|
||||
|
||||
|
|
|
@ -18,6 +18,10 @@ Such a policy defines a list of data sources that are considered authoritative f
|
|||
| doi | [Crossref](https://www.crossref.org), [Datacite](https://datacite.org) |
|
||||
| pmc, pmid | [Europe PubMed Central](https://europepmc.org/), [PubMed Central](https://www.ncbi.nlm.nih.gov/pmc) |
|
||||
| arXiv | [arXiv.org e-Print Archive](https://arxiv.org/) |
|
||||
| uniprot | [Protein Data Bank](http://www.pdb.org/) |
|
||||
| ena | [Protein Data Bank](http://www.pdb.org/) |
|
||||
| pdb | [Protein Data Bank](http://www.pdb.org/) |
|
||||
|
||||
|
||||
There is an exception though: Handle(s) are minted by several repositories; as listing them all would not be a viable option, to avoid losing them as PIDs, Handles bypass the PID authority filtering rule.
|
||||
In all other cases, PIDs are included in the graph as alternate identifiers.
|
||||
|
@ -63,12 +67,15 @@ When the record is collected from a source which is not authoritative for any ty
|
|||
Currently, the following data sources are used as "PID authorities":
|
||||
|
||||
| PID Type | Prefix (12 chars) | Authority |
|
||||
|-----------|------------------------|-----------------------------------------|
|
||||
|-----------|------------------------|-------------------------------------------|
|
||||
| doi | `doi_________` | Crossref, Datacite, Zenodo |
|
||||
| pmc | `pmc_________` | Europe PubMed Central, PubMed Central |
|
||||
| pmid | `pmid________` | Europe PubMed Central, PubMed Central |
|
||||
| arXiv | `arXiv_______` | arXiv.org e-Print Archive |
|
||||
| handle | `handle______` | any repository |
|
||||
| ena | `ena_________` | EMBL-EBI |
|
||||
| pdb | `pdb_________` | EMBL-EBI |
|
||||
| uniprot | `uniprot_____` | EMBL-EBI |
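
The prefixes in the table above all look like the PID type right-padded with underscores to 12 characters. A minimal sketch of that rule follows; the function name `pid_prefix` and the error raised for overlong types are illustrative assumptions, not part of the OpenAIRE codebase.

```python
PREFIX_WIDTH = 12  # "Prefix (12 chars)" as stated in the table header


def pid_prefix(pid_type: str) -> str:
    """Right-pad a PID type with underscores up to PREFIX_WIDTH characters.

    Hypothetical helper: the padding rule is inferred from the table,
    and rejecting longer types is an assumption made for this sketch.
    """
    if len(pid_type) > PREFIX_WIDTH:
        raise ValueError(f"PID type longer than {PREFIX_WIDTH} chars: {pid_type!r}")
    return pid_type.ljust(PREFIX_WIDTH, "_")


if __name__ == "__main__":
    for pid_type in ("doi", "pmc", "pmid", "arXiv", "handle", "ena", "pdb", "uniprot"):
        print(f"{pid_type:8} -> {pid_prefix(pid_type)}")
```

Keeping the width in a single constant makes the rule easy to check against every row of the table.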
|
||||
|
||||
OpenAIRE also performs duplicate identification (see the [dedicated section for details](/data-provision/deduplication)).
|
||||
All duplicates are **merged** together in a **representative record** which must be assigned a dedicated OpenAIRE identifier (i.e. it cannot have the identifier of one of the aggregated records).
|
||||
|
|
|
@ -0,0 +1,31 @@
|
|||
# UniProtKB/Swiss-Prot
|
||||
|
||||
This section describes the mapping implemented to integrate metadata and links from [UniProtKB/Swiss-Prot](https://www.uniprot.org/).
|
||||
The complete data dump "Reviewed (Swiss-Prot)" can be downloaded from [here](https://www.uniprot.org/help/downloads).
|
||||
|
||||
From this dataset, only the protein records linked to a PubMed publication are extracted.
|
||||
|
||||
## Entity Mapping
|
||||
|
||||
The table below describes the mapping from the TEXT metadata format to the OpenAIRE Graph dump format.
|
||||
You can check an example of the text metadata [here](https://rest.uniprot.org/uniprotkb/A0A0C5B5G6.txt)
|
||||
|
||||
| OpenAIRE Result field path | FASTA record field xpath | Notes |
|
||||
|------------------------------|--------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
|
||||
| **BIOEntity Mapping** | | |
|
||||
| `id` | `LINE Starts with AC` | id in the form `uniprot_____::md5(id)` |
|
||||
| `pid` | `LINE Starts with AC` | example `AC A0A0C5B5G6;` classid=classname=`uniprot` the value is the text after `AC` |
|
||||
| `publicationdate` | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd` |
|
||||
| `maintitle` | `LINE START WITH GN` | main title |
|
||||
| **Instance Mapping** | | |
|
||||
| `instance.type` | | `Bioentity` |
|
||||
| `type` | | `Dataset` |
|
||||
| `instance.pid` | `LINE Starts with AC` | `classid = classname = uniprot` |
|
||||
| `instance.url` | `pid` | prepend to the value `https://www.uniprot.org/uniprot/` |
|
||||
| `instance.publicationdate` | `LINE START WITH DT containing text integrated into UniProtKB/Swiss-Prot` | clean and normalize the format of the date to be `YYYY-mm-dd` |
|
||||
|
||||
|
||||
### Relation Mapping
|
||||
| OpenAIRE Relation Semantic and inverse | Source/Target type | Notes |
|
||||
|----------------------------------------|----------------------|--------------------------------------------------------------------------------------------------------------------------|
|
||||
| `IsRelatedTo` | `LINE START WITH RX` | the mapping creates relationships between the BioEntity and the PubMed or DOI generating an unresolved target identifier |
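
The entity and relation mappings described in the tables can be sketched roughly as follows. This is not the actual OpenAIRE mapping code: the function names, the output dictionary layout, the choice to hash the accession for `md5(id)`, the use of the raw `GN` value as `maintitle`, and the shape of the unresolved relation targets are all assumptions made for illustration. The sample record is hand-written in the flat-file layout; only the accession `A0A0C5B5G6` comes from the documentation, the other values are placeholders.

```python
import hashlib

# Month abbreviations as used in DT lines (explicit map, so no locale dependency).
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Illustrative input only: placeholder date, gene name, PubMed id and DOI.
SAMPLE_RECORD = """\
AC   A0A0C5B5G6;
DT   01-JAN-2020, integrated into UniProtKB/Swiss-Prot.
DT   01-JAN-2020, sequence version 1.
GN   Name=EXAMPLE1;
RX   PubMed=12345678; DOI=10.1000/example;
"""


def normalize_date(raw):
    """Turn a 'DD-MON-YYYY' date into 'YYYY-mm-dd'."""
    day, month, year = raw.strip().split("-")
    return f"{int(year):04d}-{MONTHS[month.upper()]:02d}-{int(day):02d}"


def map_record(text):
    """Map one flat-file record to a (hypothetical) entity dict and relation list."""
    entity = {"type": "Dataset", "instance": {"type": "Bioentity"}}
    relations = []
    for line in text.splitlines():
        code, value = line[:2], line[2:].strip()
        if code == "AC" and "pid" not in entity:
            accession = value.split(";")[0].strip()
            # Assumption: md5 is computed over the accession string.
            digest = hashlib.md5(accession.encode("utf-8")).hexdigest()
            entity["id"] = "uniprot_____::" + digest
            entity["pid"] = accession
            entity["instance"]["pid"] = accession
            entity["instance"]["url"] = "https://www.uniprot.org/uniprot/" + accession
        elif code == "DT" and "integrated into UniProtKB/Swiss-Prot" in value:
            date = normalize_date(value.split(",")[0])
            entity["publicationdate"] = date
            entity["instance"]["publicationdate"] = date
        elif code == "GN" and "maintitle" not in entity:
            entity["maintitle"] = value  # assumption: raw GN value kept as title
        elif code == "RX":
            for part in value.split(";"):
                key, sep, target = part.strip().partition("=")
                if sep and key in ("PubMed", "DOI"):
                    relations.append({
                        "source": entity.get("id"),
                        "relClass": "IsRelatedTo",
                        "targetType": "pmid" if key == "PubMed" else "doi",
                        "target": target,  # unresolved target identifier
                    })
    return entity, relations


if __name__ == "__main__":
    entity, relations = map_record(SAMPLE_RECORD)
    print(entity)
    print(relations)
```

Only the first `AC` and `GN` lines are used in this sketch; how the real mapping handles records with several accessions or gene names is not stated in the documentation.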
|
|
@ -128,6 +128,11 @@
|
|||
"type": "doc",
|
||||
"id": "data-provision/aggregation/non-compatible-sources/ebi",
|
||||
"label": "EMBL-EBI"
|
||||
},
|
||||
{
|
||||
"type": "doc",
|
||||
"id": "data-provision/aggregation/non-compatible-sources/uniprot",
|
||||
"label": "UniProtKB/Swiss-Prot"
|
||||
}
|
||||
]
|
||||
}
|
||||
|
|