aggregation section #2
Reference: D-Net/openaire-graph-docs#2
This PR introduces the aggregation section. It is currently a work in progress.
Title changed from "WIP: aggregation section" to "aggregation section"
@ -0,0 +13,4 @@
* OpenAIRE IDs depend on persistent IDs when they are provided by the authority responsible to create them;
* PIDs are included in the graph according to a tight criterion: the PID Types declared in the table below are considered to be mapped as PIDs only when they are collected from the relative PID authority data source.
| *PID Type* | *Authority* |
I would remove italics from the header of this table. Note that headers are already styled in bold face.
Thanks, it is indeed useless to further boldify them. I am going to remove the extras.
@ -0,0 +31,4 @@
This "selection" can be performed when the entities in the graph sharing the same identifier are grouped together. The list of the delegated authorities currently includes
| *Datasource delegated* | *Datasource delegating* | *Pid Type* |
Here as well, I would remove italics.
Thanks, it is indeed useless to further boldify them. I am going to remove the extras.
@ -0,0 +10,4 @@
OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant to the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Research Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).
The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-dd) and, whenever applicable, to a common controlled vocabulary. The controlled vocabularies used for cleansing are accessible at http://api.openaire.eu/vocabularies. Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as field value, the value is updated with the corresponding term.
The link "http://api.openaire.eu/vocabularies" here is broken
Fixed.
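As a side note to this thread, the synonym-based cleaning described in the paragraph under review could be sketched roughly as follows. This is only an illustration: the `Term` class, the `build_synonym_index` and `clean_value` helpers, and the sample vocabulary entries are hypothetical and are not taken from the OpenAIRE code base. Case-insensitive matching and replacing the value with the term code (rather than the label) are also assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """One controlled term: a code, a label, and its synonyms."""
    code: str
    label: str
    synonyms: frozenset = field(default_factory=frozenset)


def build_synonym_index(vocabulary):
    """Map every normalised synonym to the term it belongs to."""
    index = {}
    for term in vocabulary:
        for synonym in term.synonyms:
            index[synonym.strip().lower()] = term
    return index


def clean_value(value, synonym_index):
    """Return the code of the matching term, or the value unchanged
    when no synonym matches (assumption: unmatched values are kept)."""
    term = synonym_index.get(value.strip().lower())
    return term.code if term else value


# Hypothetical vocabulary entries, for illustration only.
vocabulary = [
    Term(code="eng", label="English", synonyms=frozenset({"en", "english"})),
    Term(code="ita", label="Italian", synonyms=frozenset({"it", "italiano"})),
]
index = build_synonym_index(vocabulary)
```

With this sketch, `clean_value(" EN ", index)` is mapped to the `eng` term, while a value with no matching synonym is returned as it is.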
@ -0,0 +17,4 @@
<img loading="lazy" alt="Aggregation" src="/img/docs/aggregation.png" width="65%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>
</p>
The OpenAIRE aggregation system collects information about objects of the research life-cycle compliant to the [OpenAIRE acquisition policy](https://www.openaire.eu/content-aquisition-policy1) from [different types of data sources](https://explore.openaire.eu/search/find/dataproviders):
The link to "OpenAIRE acquisition policy" is broken.
After I added the link, I informed the person responsible for maintaining those pages on the OpenAIRE website, and the URL was subsequently changed. It is fixed now.
@ -0,0 +26,4 @@
5. Metadata of open source research software from software repositories and SoftwareHeritge
6. Metadata about other types of research products, like workflow, protocols, methods, research packages
Relationships between objects are collected from the data sources, but also automatically detected by [inference algorithms](https://www.openaire.eu/blogs/text-mining-services-in-openaire-1) and added by authenticated users, who can insert links between literature, datasets, software and projects via [the “Link” procedure available from the OpenAIRE explore portal](https://explore.openaire.eu/participate/claim).
The second link here requires authentication. Is that OK?
Well, it is explicitly mentioned that the functionality is available to authenticated users. However, I agree it is not nice to expose a link that takes users to a login form. I added a second link to the claiming guide.
@ -0,0 +53,4 @@
| `author.pid.value` | `\attributes\creators\nameIdentifiers/nameIdentifier` | the pid value |
| `maintitle` | `\attributes\titles` | Titles whose title type is null or title type is Main |
| `subtitle` | `\attributes\titles` | Titles whose title type is Subtitle since the title type vocabulary in OpenAIRE use the datacite title type vocabulary |
| **date section** | | for each date in particular for DOI starting with _10.14457_ we Apply a fix thai date convert a date to ThaiBuddhistDate and reformat to local one see ticket [#6791](https://support.openaire.eu/issues/6791) |
Why is this bold? Is it correct?
It is a way to group "sections" of the mapping related to common aspects together.
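For context on the **date section** row, the Thai date fix could look roughly like the sketch below. It assumes that every date attached to a DOI starting with `10.14457` is expressed in the Buddhist Era and arrives as `YYYY-MM-DD`; the function name and constants are hypothetical, and the actual implementation referenced in ticket #6791 may differ.

```python
from datetime import date

THAI_DOI_PREFIX = "10.14457"
BUDDHIST_ERA_OFFSET = 543  # Buddhist Era year = Gregorian year + 543


def fix_thai_date(doi, iso_date):
    """Shift a Buddhist Era date to the Gregorian calendar for the
    affected DOIs; leave every other date untouched.

    Assumption: dates under THAI_DOI_PREFIX are always in the Buddhist Era.
    """
    if not doi.startswith(THAI_DOI_PREFIX):
        return iso_date
    year, month, day = (int(part) for part in iso_date.split("-"))
    return date(year - BUDDHIST_ERA_OFFSET, month, day).isoformat()
```

A date that is already Gregorian would be shifted incorrectly by this sketch, so a real implementation would probably need an additional sanity check on the year.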
@ -0,0 +76,4 @@
| `IsHostedBy` | `\attributes\relationships\client\id` | `Result/DataSource` | we defined a curated map clientId/Datasource if we found a match we create an _hostedBy Relation_ |
### Relation Resolution
This section is empty. Remove this or add content.
Removed.
@ -0,0 +6,4 @@
The idea behind DOIBoost and its origin can be found in the paper (and related resources) at:
* La Bruzzo S., Manghi P., Mannocci A. (2019) OpenAIRE's DOIBoost - Boosting CrossRef for Research. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, doi:10.1007/978-3-030-11226-4_11 . Open Access version available at: [10.5281/zenodo.1441071](https://doi.org/10.5281/zenodo.1441071)
I would move the reference to a "References" section at the end of the page, like in the aggregation page.
Done.
@ -0,0 +29,4 @@
The construction of the DOIBoost dataset consists of the following phases:
## 1. Crossref filtering
I would remove the numbering from the titles, since the titles on other pages have no numbers.
And I am not sure if these sections need to be under the section "Inputs", so they should be moved one level down in the hierarchy of the titles.
Those subsections describe the processing steps needed to build DOIBoost; I reorganised the hierarchy.
@ -0,0 +34,4 @@
Records in Crossref are ruled out according to the following criteria
* have blank title, examples:
* `10.1093/rheumatology/41.7.837`
Do we want examples here, or is it "too much"?
I'm not sure. Many people who base their work on Crossref content usually don't mention such cases, and I think it would be good, for transparency, to mention them.
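To make the blank-title criterion discussed here concrete, a minimal sketch of the filter is given below. It assumes each Crossref record is a dictionary whose `title` field is a list of strings; the helper names and the sample records (including their contents) are made up for illustration, and only the first criterion of the list is covered.

```python
def has_blank_title(record):
    """True when the record has no title, or only empty/whitespace titles."""
    titles = record.get("title") or []
    return not any(title and title.strip() for title in titles)


def filter_crossref(records):
    """Keep only the records that carry at least one non-blank title."""
    return [record for record in records if not has_blank_title(record)]


# Illustrative records; the field values are invented for this sketch.
sample = [
    {"DOI": "10.1093/rheumatology/41.7.837", "title": [""]},
    {"DOI": "10.0000/example", "title": ["A record with a title"]},
    {"DOI": "10.0000/untitled"},
]
kept = filter_crossref(sample)
```

Here only the record with a real title survives the filter; the other two are ruled out.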
@ -0,0 +10,4 @@
Example:
```commandline
I am not sure if the full response from EMBL is required here.
@ -0,0 +404,4 @@
The table below describes the mapping from the EBI links records to the OpenAIRE Graph dump format.
| *OpenAIRE Result field path* | PubMed record field xpath | Notes |
This is empty. Remove it or add content. Also remove italics from the table header.
Fixed
@ -0,0 +8,4 @@
It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.
## Incremental harvesting
Pubmed exposes an entry point FTP with all the updates for each one. [ftp baseline update](https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/). We collect the new file and generate the new dataset by upserting the existing item.
Remove the full stop before the link?
Updated.
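The upsert step mentioned in the "Incremental harvesting" paragraph could be sketched as below. This is only an assumption about the logic: records are modelled as dictionaries keyed by a `pmid` field, the dataset is an in-memory dictionary, and the `upsert` helper and sample records are hypothetical.

```python
def upsert(dataset, update_records):
    """Return a new dataset where records from the update file replace
    existing records with the same PMID and new PMIDs are added."""
    merged = dict(dataset)  # copy, so the previous dataset is left untouched
    for record in update_records:
        merged[record["pmid"]] = record
    return merged


# Invented records, for illustration only.
current = {
    "1": {"pmid": "1", "title": "Old title"},
    "2": {"pmid": "2", "title": "Unchanged"},
}
updates = [
    {"pmid": "1", "title": "Revised title"},
    {"pmid": "3", "title": "Newly added"},
]
new_dataset = upsert(current, updates)
```

Applying the update file this way keeps untouched records, overwrites the revised one, and adds the new one.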
@ -0,0 +15,4 @@
The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.
| *OpenAIRE Result field path* | PubMed record field xpath | Notes |
Remove italics from the table header.
Removed.
@ -0,0 +18,4 @@
| *OpenAIRE Result field path* | PubMed record field xpath | Notes |
|--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Publication Mapping** | | |
| `id` | ?? | id in the form `pmid_________::md5(pmid)` |
??
Filled
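For reference, the identifier pattern shown in the `id` row could be produced as in the sketch below. The prefix is copied from the table as it appears in the diff; hashing the trimmed PMID string as UTF-8 and using the hexadecimal digest are assumptions, and the function name is hypothetical.

```python
import hashlib

# Prefix copied verbatim from the mapping table in the diff.
PMID_PREFIX = "pmid_________"


def openaire_pubmed_id(pmid):
    """Build an identifier of the form `<prefix>::md5(pmid)`.

    Assumption: the MD5 is computed on the trimmed PMID string, UTF-8 encoded.
    """
    digest = hashlib.md5(pmid.strip().encode("utf-8")).hexdigest()
    return f"{PMID_PREFIX}::{digest}"
```

Calling `openaire_pubmed_id("12345")` returns the prefix, the `::` separator, and a 32-character hexadecimal digest.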
@ -62,0 +64,4 @@
label: "Aggregation",
link: {type: 'doc', id: 'data-provision/aggregation/aggregation'},
items: [
{ type: 'doc', id: 'data-provision/aggregation/doiboost' },
Is it OK to use only "DOIBoost" here as the title of the item in the sidebar? If yes, we can add a "label" here.
Thanks for the hint. I'm not sure what would be better for the end user reading this doc. On the one hand, DOIBoost means nothing to them, so I'm tempted to leave the longer title (listing the different providers); on the other hand, aesthetically speaking, I surely prefer the short version.
It's good to know anyway that we can build a cleaner TOC.