aggregation section #2

				@ -0,0 +13,4 @@

				* OpenAIRE IDs depend on persistent IDs when they are provided by the authority responsible for creating them;

				* PIDs are included in the graph according to a tight criterion: the PID types declared in the table below are mapped as PIDs only when they are collected from the relevant PID authority data source.

				| *PID Type* | *Authority*                                                                                         |
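The tight criterion above can be sketched as a check on where a record was collected from. The authority map is hypothetical, because the real table is truncated in this hunk, and the record shape (`collectedFrom`, `pids: [{ type, value }]`) is an assumption for illustration only.

```javascript
// HYPOTHETICAL authority map; the real PID types and authorities are the
// ones listed in the table of pids-and-identifiers.md.
const PID_AUTHORITIES = new Map([
  ["doi", new Set(["Crossref", "DataCite"])],
  ["pmid", new Set(["PubMed"])],
]);

// True when a PID of `pidType`, collected from `collectedFrom`, satisfies the
// tight criterion, i.e. it was collected from the corresponding PID authority.
function isAuthoritativePid(pidType, collectedFrom) {
  const authorities = PID_AUTHORITIES.get(pidType.toLowerCase());
  return authorities !== undefined && authorities.has(collectedFrom);
}

// Splits a record's identifiers into those mapped as PIDs and the others.
// ASSUMED record shape: { collectedFrom, pids: [{ type, value }] }.
function selectPids(record) {
  const pids = [];
  const others = [];
  for (const pid of record.pids) {
    if (isAuthoritativePid(pid.type, record.collectedFrom)) pids.push(pid);
    else others.push(pid);
  }
  return { pids, others };
}
```

For instance, a DOI found in a record collected from a repository that is not a DOI authority ends up in `others` rather than being mapped as a PID.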

schatz commented

2022-11-08 12:18:48 +01:00

I would remove italics from the header of this table. Note that headers are already styled in bold face.

claudio.atzori commented

2022-11-08 16:55:29 +01:00

Thanks, it is indeed useless to further boldify them. I am going to remove the extras.

docs/data-model/pids-and-identifiers.md Outdated

				@ -0,0 +31,4 @@

				This "selection" can be performed when the entities in the graph sharing the same identifier are grouped together. The list of the delegated authorities currently includes:

				| *Datasource delegated*               | *Datasource delegating*          | *Pid Type* |
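The grouping step can be sketched as follows. The delegation entry is hypothetical (the real list is the truncated table above), and the preference rule, keeping the record collected from the delegated data source when one exists in the group, is an assumption about how the "selection" works rather than a confirmed description of it.

```javascript
// HYPOTHETICAL delegation table: delegated data source, delegating authority,
// PID type. The real entries are in pids-and-identifiers.md.
const DELEGATIONS = [
  { delegated: "Zenodo", delegating: "DataCite", pidType: "doi" },
];

// Groups records sharing the same identifier.
function groupById(records) {
  const groups = new Map();
  for (const record of records) {
    if (!groups.has(record.id)) groups.set(record.id, []);
    groups.get(record.id).push(record);
  }
  return groups;
}

// ASSUMPTION: within a group, the record collected from a delegated data
// source wins; otherwise the first record of the group is kept.
function selectRecord(group, pidType) {
  const delegated = new Set(
    DELEGATIONS.filter((d) => d.pidType === pidType).map((d) => d.delegated)
  );
  return group.find((r) => delegated.has(r.collectedFrom)) ?? group[0];
}
```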

schatz commented

2022-11-08 12:19:16 +01:00

Here as well, I would remove italics.

claudio.atzori commented

2022-11-08 16:55:35 +01:00

Thanks, it is indeed useless to further boldify them. I am going to remove the extras.

docs/data-provision/aggregation/aggregation.md Outdated

				@ -0,0 +10,4 @@

				OpenAIRE aggregates metadata records describing objects of the research life-cycle from content providers compliant with the [OpenAIRE guidelines](https://guidelines.openaire.eu/) and from entity registries (i.e. data sources offering authoritative lists of entities, like [OpenDOAR](https://v2.sherpa.ac.uk/opendoar/), [re3data](https://www.re3data.org/), [DOAJ](https://doaj.org/), and various funder databases). After collection, metadata are transformed according to the OpenAIRE internal metadata model, which is used to generate the final OpenAIRE Research Graph, accessible from the [OpenAIRE EXPLORE portal](https://explore.openaire.eu) and the [APIs](https://graph.openaire.eu/develop/).

				The transformation process includes the application of cleaning functions whose goal is to ensure that values are harmonised according to a common format (e.g. dates as YYYY-MM-DD) and, whenever applicable, to a common controlled vocabulary. The controlled vocabularies used for cleansing are accessible at http://api.openaire.eu/vocabularies. Each vocabulary features a set of controlled terms, each with one code, one label, and a set of synonyms. If a synonym is found as a field value, the value is updated with the corresponding term.
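The synonym-based cleaning described above can be sketched as a lookup from every synonym to its controlled term. The sample vocabulary is hypothetical, and matching case-insensitively after trimming is an assumption, not necessarily what the OpenAIRE cleaning functions do.

```javascript
// HYPOTHETICAL vocabulary: each term has one code, one label and a set of
// synonyms, as described above. Real vocabularies come from the OpenAIRE API.
const languageVocabulary = [
  { code: "eng", label: "English", synonyms: ["en", "english", "en-us"] },
  { code: "ita", label: "Italian", synonyms: ["it", "italian", "italiano"] },
];

// Indexes every synonym (and the term code itself) to its term.
// ASSUMPTION: matching is case-insensitive.
function buildSynonymIndex(vocabulary) {
  const index = new Map();
  for (const term of vocabulary) {
    index.set(term.code.toLowerCase(), term);
    for (const synonym of term.synonyms) {
      index.set(synonym.toLowerCase(), term);
    }
  }
  return index;
}

// Returns the controlled term ({ code, label }) when the value matches a
// synonym; otherwise returns the original value unchanged.
function cleanValue(index, value) {
  const term = index.get(value.trim().toLowerCase());
  return term === undefined ? value : { code: term.code, label: term.label };
}
```

With this sketch, `cleanValue(buildSynonymIndex(languageVocabulary), "English ")` yields `{ code: "eng", label: "English" }`, while values outside the vocabulary pass through untouched. Building the index once per vocabulary keeps each lookup constant-time.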

schatz commented

2022-11-08 12:30:19 +01:00

The link "http://api.openaire.eu/vocabularies" here is broken

claudio.atzori commented

2022-11-08 16:55:45 +01:00

Fixed.

docs/data-provision/aggregation/aggregation.md Outdated

				@ -0,0 +17,4 @@

				    <img loading="lazy" alt="Aggregation" src="/img/docs/aggregation.png" width="65%" className="img_node_modules-@docusaurus-theme-classic-lib-theme-MDXComponents-Img-styles-module"/>

				</p>

				The OpenAIRE aggregation system collects information about objects of the research life-cycle compliant with the [OpenAIRE acquisition policy](https://www.openaire.eu/content-aquisition-policy1) from [different types of data sources](https://explore.openaire.eu/search/find/dataproviders):

schatz commented

2022-11-08 12:30:57 +01:00

The link to "OpenAIRE acquisition policy" is broken.

claudio.atzori commented

2022-11-08 16:56:35 +01:00

After I added the link, I informed the person responsible for maintaining those pages on the OpenAIRE website, and the URL was then changed. It is fixed now.

docs/data-provision/aggregation/aggregation.md Outdated

				@ -0,0 +26,4 @@

				5. Metadata of open source research software from software repositories and Software Heritage

				6. Metadata about other types of research products, such as workflows, protocols, methods, and research packages

				Relationships between objects are collected from the data sources, but also automatically detected by [inference algorithms](https://www.openaire.eu/blogs/text-mining-services-in-openaire-1) and added by authenticated users, who can insert links between literature, datasets, software and projects via [the “Link” procedure available from the OpenAIRE explore portal](https://explore.openaire.eu/participate/claim).

schatz commented

2022-11-08 12:41:04 +01:00

The second link here requires authentication. Is that OK?

claudio.atzori commented

2022-11-08 17:01:39 +01:00

Well, it is explicitly mentioned that the functionality is available to authenticated users. However, I agree it is not nice to expose a link that takes users to a login form. I added a second link to the claiming guide.

docs/data-provision/aggregation/datacite.md

				@ -0,0 +53,4 @@

				| `author.pid.value`                                     | `\attributes\creators\nameIdentifiers/nameIdentifier`                                                                                           | the pid value                                                                                                                                                                                                                                        |

				| `maintitle`                                            | `\attributes\titles`                                                                                                                            | Titles whose title type is null or title type is Main                                                                                                                                                                                                |

				| `subtitle`                                             | `\attributes\titles`                                                                                                                            | Titles whose title type is Subtitle, since the title type vocabulary in OpenAIRE uses the DataCite title type vocabulary                                                                                                                            |

				| **date section**                                       |                                                                                                                                                 | For each date; in particular, for DOIs starting with _10.14457_ we apply a Thai date fix that converts the date via ThaiBuddhistDate and reformats it as a local date, see ticket [#6791](https://support.openaire.eu/issues/6791) |
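The title rows and the Thai date fix above can be sketched as two small functions. The title split follows the two table rows literally (other title types are ignored here only because they are not shown in this hunk). For the date fix, the sketch swaps Java's `ThaiBuddhistDate` for plain arithmetic: the Thai Buddhist Era year equals the Gregorian year plus 543 for modern dates. Restricting the fix to `YYYY-MM-DD` strings is an assumption.

```javascript
// Splits DataCite `attributes.titles` entries as in the mapping rows above:
// null or "Main" title type -> maintitle, "Subtitle" -> subtitle.
function splitTitles(titles) {
  const maintitle = [];
  const subtitle = [];
  for (const t of titles ?? []) {
    const type = t.titleType ?? null;
    if (type === null || type === "Main") maintitle.push(t.title);
    else if (type === "Subtitle") subtitle.push(t.title);
  }
  return { maintitle, subtitle };
}

// Buddhist Era year = Gregorian year + 543 (modern Thai solar calendar).
const THAI_BE_OFFSET = 543;

// ASSUMPTION: affected dates are YYYY-MM-DD strings with a Buddhist Era year.
function fixThaiDate(doi, date) {
  if (!doi.startsWith("10.14457/")) return date; // only the affected prefix
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(date);
  if (!m) return date; // leave other formats untouched
  const year = Number(m[1]) - THAI_BE_OFFSET;
  return `${String(year).padStart(4, "0")}-${m[2]}-${m[3]}`;
}
```

In this sketch a date such as `2560-05-01` on a _10.14457_ DOI becomes `2017-05-01`, while dates on other DOIs are returned unchanged.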

schatz commented

2022-11-08 12:56:30 +01:00

Why is this bold? Is it correct?

claudio.atzori commented

2022-11-08 17:02:10 +01:00

It is a way to group "sections" of the mapping related to common aspects together.

docs/data-provision/aggregation/datacite.md Outdated

				@ -0,0 +76,4 @@

				| `IsHostedBy`                           | `\attributes\relationships\client\id` | `Result/DataSource`  | we defined a curated map clientId/Datasource if we found a match we create an _hostedBy Relation_ |
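The `IsHostedBy` row can be sketched as a lookup in the curated map. The map entry, the placeholder data source id and the relation field names (`source`, `target`, `relClass`) are hypothetical; the client id is read from the source path given in the table.

```javascript
// HYPOTHETICAL curated map: DataCite client id -> OpenAIRE data source id.
const CLIENT_TO_DATASOURCE = new Map([
  ["cern.zenodo", "hypothetical-zenodo-datasource-id"],
]);

// Reads the client id from attributes.relationships.client.id (path taken
// from the mapping table) and creates a Result -> DataSource relation only
// when the client is in the curated map.
function hostedByRelation(resultId, record) {
  const clientId = record?.attributes?.relationships?.client?.id;
  const datasourceId =
    clientId === undefined ? undefined : CLIENT_TO_DATASOURCE.get(clientId);
  if (datasourceId === undefined) return null; // no match, no relation
  return { source: resultId, target: datasourceId, relClass: "IsHostedBy" };
}
```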

				### Relation Resolution

schatz commented

2022-11-08 12:56:54 +01:00

This section is empty. Remove this or add content.

claudio.atzori commented

2022-11-08 17:02:32 +01:00

Removed.

docs/data-provision/aggregation/doiboost.md

				@ -0,0 +6,4 @@

				The idea behind DOIBoost and its origin can be found in the paper (and related resources) at:

				* La Bruzzo S., Manghi P., Mannocci A. (2019) OpenAIRE's DOIBoost - Boosting CrossRef for Research. In: Manghi P., Candela L., Silvello G. (eds) Digital Libraries: Supporting Open Science. IRCDL 2019. Communications in Computer and Information Science, vol 988. Springer, doi:10.1007/978-3-030-11226-4_11 . Open Access version available at: [10.5281/zenodo.1441071](https://doi.org/10.5281/zenodo.1441071)

schatz commented

2022-11-08 12:42:47 +01:00

I would move the reference to a "References" section at the end of the page, like in the aggregation page.

claudio.atzori commented

2022-11-09 09:47:52 +01:00

Done.

docs/data-provision/aggregation/doiboost.md Outdated

				@ -0,0 +29,4 @@

				The construction of the DOIBoost dataset consists of the following phases:

				## 1. Crossref filtering

schatz commented

2022-11-08 12:46:27 +01:00

I would remove the numbering from the titles, as in other pages they are unnumbered.
Also, I am not sure these sections need to be under the "Inputs" section, so they should be moved one level down in the title hierarchy.

claudio.atzori commented

2022-11-08 17:03:14 +01:00

Those subsections describe the processing steps needed to build DOIBoost; I reorganised the hierarchy.

docs/data-provision/aggregation/doiboost.md

				@ -0,0 +34,4 @@

				Records in Crossref are ruled out according to the following criteria:

				* have a blank title, for example:

				    * `10.1093/rheumatology/41.7.837`
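The blank-title criterion, the only one visible in this hunk, can be sketched as a filter over Crossref works. What counts as "blank" (a missing `title` array, an empty one, or one holding only empty or whitespace strings) is an assumption.

```javascript
// ASSUMPTION: a title is blank when Crossref's `title` array is missing,
// empty, or contains only empty/whitespace strings.
function hasBlankTitle(work) {
  const titles = Array.isArray(work.title) ? work.title : [];
  return titles.every((t) => typeof t !== "string" || t.trim() === "");
}

// Rules out works with a blank title; other criteria are not modelled here.
function filterCrossref(works) {
  return works.filter((work) => !hasBlankTitle(work));
}
```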

schatz commented

2022-11-08 12:49:45 +01:00

Do we want examples here, or is it "too much"?

claudio.atzori commented

2022-11-08 17:04:55 +01:00

I'm not sure. Many people who base their work on Crossref content usually don't mention such cases, and I think it would be good, for transparency, to mention them.

docs/data-provision/aggregation/ebi.md

				@ -0,0 +10,4 @@

				Example:

				```commandline

schatz commented

2022-11-08 12:58:47 +01:00

I am not sure if the full response from EMBL is required here.

docs/data-provision/aggregation/ebi.md Outdated

				@ -0,0 +404,4 @@

				The table below describes the mapping from the EBI links records to the OpenAIRE Graph dump format.

				| *OpenAIRE Result field path*   | PubMed record field xpath      | Notes                                                                                                                                                         |

schatz commented

2022-11-08 12:57:54 +01:00

This is empty. Remove it or add content. Also remove italics from the table header.

claudio.atzori commented

2022-11-08 17:05:07 +01:00

Fixed

docs/data-provision/aggregation/pubmed.md

				@ -0,0 +8,4 @@

				It contains XML records compliant with the schema available at https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html.

				## Incremental harvesting

				PubMed exposes an FTP entry point with all the update files. [ftp baseline update](https://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/). We collect each new file and generate the new dataset by upserting the existing items.
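The upsert step can be sketched as a keyed merge: records from an update file replace existing records with the same PMID and new PMIDs are appended. Keying on a `pmid` field and applying update files one after another in the given order are assumptions of this sketch.

```javascript
// Upserts one update file into the dataset, keyed by PMID: existing records
// are replaced, new records are inserted.
function upsert(dataset, updates) {
  const byPmid = new Map(dataset.map((record) => [record.pmid, record]));
  for (const record of updates) byPmid.set(record.pmid, record);
  return [...byPmid.values()];
}

// ASSUMPTION: update files are applied sequentially, in the given order.
function applyUpdateFiles(dataset, updateFiles) {
  return updateFiles.reduce((acc, file) => upsert(acc, file), dataset);
}
```

Using a `Map` keeps each upsert linear in the size of the dataset plus the update file, and a later file always wins over an earlier one for the same PMID.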

schatz commented

2022-11-08 12:52:52 +01:00

Remove the full stop before the link?

claudio.atzori commented

2022-11-09 09:48:07 +01:00

Updated.

docs/data-provision/aggregation/pubmed.md Outdated

				@ -0,0 +15,4 @@

				The table below describes the mapping from the XML baseline records to the OpenAIRE Graph dump format.

				| *OpenAIRE Result field path*   | PubMed record field xpath      | Notes                                                                                                                                                         |

schatz commented

2022-11-08 12:53:18 +01:00

Remove italics from the table header.

claudio.atzori commented

2022-11-08 17:05:42 +01:00

Removed.

docs/data-provision/aggregation/pubmed.md Outdated

				@ -0,0 +18,4 @@

				| *OpenAIRE Result field path*   | PubMed record field xpath      | Notes                                                                                                                                                         |

				|--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|

				| **Publication Mapping**        |                                |                                                                                                                                                               |

				| `id`                           | ??                             | id in the form `pmid_________::md5(pmid)`                                                                                                                     |
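The `id` rule in the table can be sketched with Node's built-in `crypto` module. The prefix string is copied from the table; whether the PMID is normalised before hashing is not stated, so the sketch hashes it as given.

```javascript
const { createHash } = require("crypto");

// Prefix copied verbatim from the mapping table.
const PMID_PREFIX = "pmid_________";

// Builds the result id as `<prefix>::md5(pmid)`, with the digest in hex.
function pubmedResultId(pmid) {
  const digest = createHash("md5").update(String(pmid)).digest("hex");
  return `${PMID_PREFIX}::${digest}`;
}
```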

schatz commented

2022-11-08 12:52:05 +01:00

??

claudio.atzori commented

2022-11-08 17:09:42 +01:00

Filled

sidebars.js Outdated

				@ -62,0 +64,4 @@

				          label: "Aggregation",

				          link: {type: 'doc', id: 'data-provision/aggregation/aggregation'},

				          items: [

				            { type: 'doc', id: 'data-provision/aggregation/doiboost' },

schatz commented

2022-11-08 12:44:56 +01:00

Is it OK to use only "DOIBoost" as the title of this item in the sidebar? If so, we can add a "label" here.
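The suggestion amounts to adding a `label` property to the doc item in the category shown above, which changes the sidebar text while leaving the page title alone. The snippet below is a sketch of that single category, not the full sidebars.js; the surrounding sidebar structure is omitted.

```javascript
// Sketch of the "Aggregation" category from sidebars.js; only the `label`
// on the DOIBoost doc item is new.
const aggregationCategory = {
  type: "category",
  label: "Aggregation",
  link: { type: "doc", id: "data-provision/aggregation/aggregation" },
  items: [
    // `label` overrides the text shown in the sidebar for this doc.
    { type: "doc", id: "data-provision/aggregation/doiboost", label: "DOIBoost" },
  ],
};
```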

claudio.atzori commented

2022-11-08 17:18:17 +01:00

Thanks for the hint. I'm not sure what would be better for the end user reading this doc. On the one hand, DOIBoost means nothing, so I'm tempted to keep the longer title (listing the different providers); on the other hand, aesthetically speaking, I definitely prefer the short version.

It's good to know anyway that we can build a cleaner TOC.

sandro.labruzzo added 2 commits 2022-11-08 15:42:14 +01:00

268bb23545 minor fix

f05888e637 merged commit

sandro.labruzzo added 1 commit 2022-11-08 15:58:28 +01:00

b007a67a3c added EBI mapping

claudio.atzori added 2 commits 2022-11-08 17:05:54 +01:00

e9296f1a40 fixed typos and tables

12263fca62 addressing comments from the code review

claudio.atzori added 1 commit 2022-11-08 17:12:31 +01:00

9524d0d024 addressing comments from the code review

claudio.atzori added 1 commit 2022-11-08 17:16:13 +01:00

dc04f19da1 added short names for some of the aggregation sub-sections