added documentation to Pubmed Class and also added mvn site for dhp-aggregations
parent
3469cc2b1d
commit
48923e46a1
@ -0,0 +1,9 @@
## DHP-Aggregation

This module defines a set of Oozie workflows for the **collection** and **transformation** of metadata records.

Both workflows interact with the Metadata Store Manager (MdSM) to handle the logical transactions that keep
read/write operations on the data consistent, since the MdSM keeps track of the logical-physical mapping
of each MDStore.

It also defines [mappings](mappings.md) for transforming records from different data sources (see the mapping section).
@ -0,0 +1,7 @@
## DHP-Aggregation

This module defines a set of Oozie workflows for the **collection** and **transformation** of metadata records.

Both workflows interact with the Metadata Store Manager (MdSM) to handle the logical transactions that keep
read/write operations on the data consistent, since the MdSM keeps track of the logical-physical mapping
of each MDStore.
@ -0,0 +1,18 @@
DHP Aggregation
===============

DHP-Aggregation contains mappings from the original data formats into the OAF data format.
The mapped records converge in the graph in one of two ways:

- Via the Action Manager
- Directly in the MdStore on Hadoop

The implemented mappings are listed below.


Mappings
========

1. [PubMed](pubmed.md)
2. [Datacite](datacite.md)
@ -0,0 +1,62 @@
# Pubmed Mapping
This section describes the mapping implemented for [MEDLINE/PubMed](https://pubmed.ncbi.nlm.nih.gov/).

Collection
---------
The native data is collected from the [FTP baseline](https://ftp.ncbi.nlm.nih.gov/pubmed/baseline/), which contains XML files that follow
this [schema](https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html).
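
As a rough illustration of what a collected baseline file looks like to a consumer, the sketch below streams `PubmedArticle` elements out of a gzip-compressed XML payload using only Python's standard library. It is not the collector used by this module: the function name `iter_pmids` and the inline sample record are made up for the example, and it assumes that baseline files are gzip-compressed and wrap their records in `PubmedArticleSet/PubmedArticle`.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for one baseline file (hypothetical content, not real data).
SAMPLE_XML = b"""<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID Version="1">1</PMID></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID Version="1">2</PMID></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def iter_pmids(gzipped_xml: bytes):
    """Yield the PMID of every PubmedArticle without loading the whole file."""
    with gzip.open(io.BytesIO(gzipped_xml)) as stream:
        # "end" events fire once an element has been read completely.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == "PubmedArticle":
                yield elem.findtext("MedlineCitation/PMID")
                elem.clear()  # release the subtree that was just processed


# Compress the sample in memory to mimic a downloaded .xml.gz file.
pmids = list(iter_pmids(gzip.compress(SAMPLE_XML)))
```

Streaming with `iterparse` keeps memory usage bounded by the size of a single record, which matters because a real baseline file holds many thousands of articles.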


Parsing
-------
The class responsible for parsing is [PMParser](./scaladocs/#eu.dnetlib.dhp.sx.bio.pubmed.PMParser), which generates
an intermediate representation of a PubMed article, defined [here](/apidocs/eu/dnetlib/dhp/sx/bio/pubmed/package-summary.html).


Mapping
-------

The table below describes the mapping from the native XML to the OAF model.

| XPath Source | OAF Field | Notes |
| ----------- | ----------- | ----------- |
| //PMID | pid | classid = classname = pmid |
| | **Instance Mapping** | |
| //PublicationType | InstanceType | If the article contains the typology **Journal Article**, we apply this type; otherwise we look for a term that matches the vocabulary, and if none matches we discard it |
| //PMID | instance/PID | The PMID is also mapped to the pid of the instance |
| //ArticleId[./@IdType="doi"] | instance/alternateIdentifier | classid = classname = doi |
| //PMID | instance/URL | Prepend the base URL https://pubmed.ncbi.nlm.nih.gov/ to the PMID |
| //PubmedPubDate | instance/Dateofacceptance | Apply the function GraphCleaningFunctions.cleanDate before assigning it |
| FOR ALL INSTANCE | CollectedFrom | datasourceName: *Europe PubMed Central* DatasourceId: |
| | **Journal Mapping** | |
| //Journal/PubDate | Journal/Conferencedate | Map the date of the journal |
| //Journal/Title | Journal/Name | |
| //Journal/Volume | Journal/Vol | |
| //Journal/ISSN | Journal/issnPrinted | |
| //Journal/Issue | Journal/Iss | |
| | **Publication Mapping** | |
| //PubmedPubDate | Dateofacceptance | Apply the function GraphCleaningFunctions.cleanDate before assigning it |
| //Title | title | With qualifier ModelConstants.MAIN_TITLE_QUALIFIER |
| //AbstractText | Description | |
| //Language | Language | Cleaning vocabulary -> dnet:languages |
| //DescriptorName | Subject | classId, className = keyword |
| | **Author Mapping** | |
| //Author/LastName | author.Surname | |
| //Author/ForeName | author.Forename | |
| //Author/FullName | author.Fullname | Concatenation of forename + lastName, if they exist |
| FOR ALL AUTHOR | author.rank | Sequential number starting from 1 |
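
To make a few rows of the table concrete, here is a minimal Python sketch that applies them to a hand-written record. It is an illustration only, not the module's actual implementation: the function `map_article`, the layout of the output dictionary, and the sample XML are hypothetical. Only the XPaths, the `pmid`/`doi` class ids, the URL prefix, and the 1-based author rank are taken from the table.

```python
import xml.etree.ElementTree as ET

PUBMED_BASE_URL = "https://pubmed.ncbi.nlm.nih.gov/"

# Hand-written sample record (hypothetical values).
SAMPLE = """<PubmedArticle>
  <MedlineCitation>
    <PMID>12345</PMID>
    <Article>
      <AuthorList>
        <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
        <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pubmed">12345</ArticleId>
      <ArticleId IdType="doi">10.1000/example</ArticleId>
    </ArticleIdList>
  </PubmedData>
</PubmedArticle>"""


def map_article(xml_text: str) -> dict:
    """Apply the PMID, DOI, URL and author rows of the table to one record."""
    root = ET.fromstring(xml_text)
    pmid = root.findtext(".//PMID")
    doi = root.findtext(".//ArticleId[@IdType='doi']")

    # FOR ALL AUTHOR: rank is a sequential number starting from 1.
    authors = []
    for rank, author in enumerate(root.iterfind(".//Author"), start=1):
        authors.append({
            "surname": author.findtext("LastName"),
            "forename": author.findtext("ForeName"),
            "rank": rank,
        })

    return {
        "pid": {"value": pmid, "classid": "pmid", "classname": "pmid"},
        "instance": {
            "pid": pmid,
            "alternateIdentifier": {"value": doi, "classid": "doi", "classname": "doi"},
            "url": PUBMED_BASE_URL + pmid,  # base URL prepended to the PMID
        },
        "author": authors,
    }


record = map_article(SAMPLE)
```

Date cleaning, vocabulary lookups and the `CollectedFrom` datasource are left out on purpose, since they depend on components that this sketch does not model.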
Binary file not shown (size after the change: 21 KiB).
@ -0,0 +1,32 @@
<?xml version="1.0" encoding="ISO-8859-1"?>
<project xmlns="http://maven.apache.org/DECORATION/1.8.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/DECORATION/1.8.0 https://maven.apache.org/xsd/decoration-1.8.0.xsd"
         name="DHP-Aggregation">
    <skin>
        <groupId>org.apache.maven.skins</groupId>
        <artifactId>maven-fluido-skin</artifactId>
        <version>1.8</version>
    </skin>
    <poweredBy>
        <logo name="OpenAIRE Research Graph" href="https://graph.openaire.eu/"
              img="https://graph.openaire.eu/assets/common-assets/logo-large-graph.png"/>
    </poweredBy>
    <body>
        <links>
            <item name="Code" href="https://code-repo.d4science.org/" />
        </links>
        <menu name="Documentation">
            <item name="Mappings" href="mappings.html" collapse="true">
                <item name="Pubmed" href="pubmed.html"/>
                <item name="Datacite" href="datacite.html"/>
            </item>
            <item name="Release Notes" href="release-notes.html" />
            <item name="General Information" href="about.html"/>

            <item name="JavaDoc" href="apidocs/" />
            <item name="ScalaDoc" href="scaladocs/" />

        </menu>
        <menu ref="reports"/>
    </body>
</project>