1.1
1.1.1
Which research project will provide the data?
1.2
1.2.1
What are the goals of the project?
1.3
1.3.1
Who is the responsible project leader?
1.4
1.4.1
What is the name of the research institute or university?
1.5
1.5.1
Who are the contacts in this project?
1.6
1.6.1
Do related guidelines from the institute apply?
1
Administrative information for the project
2.1
2.1.1
Short description of the research data that will be generated:
Which data types shall be created and how? What is the underlying methodology or process? What metrics shall be used? Etc.
2.2
2.2.1
Please list the data formats to be used that are not included in the CLARIN-D handbook here:
Which data types and formats are supported?
2.3
2.3.1
2.4
2.4.1
Which data from other sources are to be used?
2.5
2.5.1
What license are they under?
2.6
2.6.1
Are these data available for use in scientific research?
2.7
2.7.1
May works derived from these data be created?
2.8
2.8.1
Relationship between the stock data and new data:
Please state the relationship between the stock data and the data that will be generated in the project.
2
Description of the Primary Research Data
3.1
3.1.1
What information is necessary so that your research data remain readable and interpretable in the future?
Please describe how you document the primary research data. Here, the data should be described as specifically as possible in plain text. For measured data, please give, for example, the definitions of the variables; for annotated corpora, please include a glossary of all annotations as well as the annotation handbook.
Ex: The eye tracking data were captured using the model X from company Y with the following parameter settings. Participants in the experiment were presented with the following stimuli with the following instructions...
3.2
3.2.1
Describe your data
After the plain text description, please use a metadata format which is as standard as possible to describe your data. For this purpose, please state the metadata format (the “profile”) used for each data type. In principle, the Component Metadata Infrastructure (CMDI) is used to describe the metadata. If necessary, existing CMDI profiles can be adapted according to project needs.
Ex: The text corpus to be created is described with the help of the CMDI profile 'TextCorpusProfile' (clarin.eu:cr1:p_1290431694580). Adjustments to the profile are not necessary. / However, adjustments will be necessary.
3
Documentation and Metadata
4.1
4.1.1
You must clarify whether and how the data will be anonymized and whether the data need special protection on the grounds of personality rights.
If test subjects are involved in the project, you must clarify whether and how the data will be anonymized and whether the data need special protection on the grounds of personality rights.
In general, CLARIN centers only accept anonymized data or data for which the test subjects have clearly stated their approval of publication for use in scientific research. In the latter case, the process for obtaining permission to use the data and for documenting the permission agreement must be described.
4.2
4.2.1
Description of the copyright and usage rights of the generated data:
Please use the CLARIN license category calculator to determine the license category and copy the resulting license code into the text field below.
The data were not created under the rights of a third party, and therefore the project has every right to distribute licenses. The following license category was determined with the help of the CLARIN License wizard…
4
Ethics and Privacy
5.1
5.1.1
Description of data storage, retention, and backups
How will the data be stored during the project and how will the data be secured (backed up)? Please describe how often backups will be made and to what extent you can make use of university infrastructure to make regular, automatic backups.
5.2
5.2.1
Description of the access mechanism
How do you ensure that only authorized people obtain access to the project data during the project? If you are working with personal data, please describe appropriate measures for their protection here.
5
Data Storage During the Project
6.1.1
6.1.1.1
Quality assurance is important for the long-term storage of the data.
For example, it is important to ensure the readability of OCR data, the interpretability of scans, the validity of program code, etc. The strategies used for quality assurance that the project will follow and who is responsible for quality assurance must be clearly described in the data management plan.
In the area of quality assurance, CLARIN-D can only help with the data formats used. All other aspects of quality assurance are the responsibility of the project provider.
6.1
Quality Assurance
6.2.1
6.2.1.1
The provider must clearly discuss the question of granularity in the data management plan. It must be clearly stated what the digital objects in the raw data are and what their granularity is.
For data organization, the granularity of the data to be stored is a complex problem. It is important to be able to split the data into meaningful digital objects that can later be consolidated and searched. What counts as a meaningful digital object, however, depends on the context of the data.
For example, a single poem or article can be a meaningful digital object for a literary analysis. On the other hand, many years’ worth of newspaper articles are gathered in linguistic corpora. In this example, the whole corpus is a meaningful digital unit, but not its individual components.
6.2.2
6.2.2.1
What is the number of digital objects?
6.2.3
6.2.3.1
What is their storage size?
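Ex: A rough count and total size can be obtained with a short script. The following is only a minimal sketch in Python; it assumes, purely for illustration, that every file below a project directory is one digital object, which may not match the granularity chosen for your data. The function name `summarize_objects` and the directory layout are hypothetical.

```python
from pathlib import Path


def summarize_objects(root):
    """Return (number of files, total size in bytes) below `root`.

    Assumption for illustration only: one file equals one digital object.
    """
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes
```

Calling `summarize_objects("corpus/")` on a hypothetical project directory returns the two figures asked for above; if a whole corpus is treated as a single digital object, only the total size is meaningful.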
6.2
Data Organization
6.1
6.1.1
Data Selection
Please specify which data must be retained and which must be deleted in order to satisfy contractual or legal obligations. All data essential to the reproducibility of research results should be retained. Data which could be relevant to future studies should also be retained. Please give a short description of the criteria for the data selection.
6
Data Selection
7.1
7.1.1
Availability of Your Data
How can potential users find your data? To which users do you plan to give access to your data and under what conditions? Will the data only be available through the repository, or will they also be available directly? Do your data have persistent identifiers by which they can be referenced (see Data Organization)?
7.2
7.2.1
The data provider should describe the intended target group which would reuse the project data. Normally, CLARIN centers are active in the humanities and social sciences, so possible target groups could be linguists and archaeologists.
We suggest that the data can be used by humanities scholars, especially Germanists and language teachers.
7
Data Sharing
8.1
8.1.1
When will the metadata be created?
For long-term and secure data storage in the interest of good scientific practice, the dependability of repositories is essential. The CLARIN-D centers are externally evaluated and certified to meet the criteria of the Data Seal of Approval (DSA) and of the CLARIN Centre Assessment. Their repositories are thus guaranteed to meet the needs of data storage for the research data over normal time periods.
Generally, research data should be kept for a period of at least ten years. Please fill in the following timetable as much as possible. If reusability of the data is not possible, please leave the corresponding field empty and correct the corresponding section in the document generated at the end of this process.
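Ex: The ten-year minimum can be turned into a concrete date for the timetable. The following Python sketch assumes, as an illustration only, that the retention period starts on the day the data are transferred to the repository; the function name `earliest_deletion_date` is hypothetical, and your institute's guidelines may define the start of the period differently.

```python
from datetime import date


def earliest_deletion_date(archived_on, years=10):
    """Return the first date on which the minimum retention period has elapsed.

    Assumption: the period starts on `archived_on` and lasts `years` years.
    """
    try:
        return archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year;
        # fall back to 28 February of that year.
        return archived_on.replace(year=archived_on.year + years, day=28)
```

For data transferred on 1 March 2024, the function gives 1 March 2034 as the earliest possible date for the field "The data will be provided at least until".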
8.2
8.2.1
When will the data be transferred to a repository?
8.3
8.3.1
The data will be provided starting:
8.4
The data will be made available to the scientific community for use starting:
8.5
8.5.1
The data will be made publicly available for use starting:
8.6
8.6.1
The data will be provided at least until:
8.7
8.7.1
An obligatory deletion of the data will occur by:
8
Time Frame
9.1
9.1.1
What is the cost of metadata generation?
9.2
9.2.1
Funds and time allocated by the CLARIN-D center?
9.3
9.3.1
Total budget:
9
Resource Identification