Selection, Preparation and Documentation of Research Data

Research data should always be selected, processed and documented with a view to their reuse and comprehensibility. This means that all relevant data and results should ideally be preserved and fully reproducible. In addition, the formal requirements for archiving and publishing the data must be met, as must legal regulations and other conditions, such as the guidelines and policies of funding organisations and of participating institutions, including your own.

  • Data selection for Preservation and Publication
  • Preparation of your Research Data
  • Data Documentation

Data selection for Preservation and Publication

The selection of data for preservation and publication should be driven by the needs of reuse and reproducibility and is the responsibility of the researchers concerned. They have to decide which data are relevant for understanding the research results and which data have to be preserved and made available. This decision explicitly includes the selection of the kinds of data and data types.

In summary, to prepare and make this decision, researchers should assess the data against third-party requirements, their own needs and general criteria such as verifiability, uniqueness of the data, costs, regulatory requirements and technical preservability.

Guiding Questions

  • What materials, information and data are actually necessary to map your own research process and make the research results reproducible?
  • What (additional) information could be relevant and useful when sharing the data for reuse by third parties in similar or even other research contexts?

These key questions influence the entire preparation and documentation process and should be kept in mind.

It is not only the research data itself that can be of interest. Other contextual information and materials, such as research instruments or information on procedures and methodology in the research process, including the preparation and analysis of the data, can also be important for the reuse and interpretation of the data. The selection is not limited to primary data generated and collected in the research project. Secondary data and compiled datasets that include external data sources or referenced data are covered as well, where they exist.

The quality of the data plays a key role for reuse and preservation. Does a sufficient description of the data exist by means of metadata and contextual materials? Can sufficient information be provided on the context of origin, processing and analysis of the data?

Requirements for archiving and sharing data for reuse should be considered as well, such as required long-term preservation and archiving periods, other contractually binding obligations, e.g. from official funding agreements or institutional regulations, legal restrictions due to data protection or copyright, or access conditions of the prospective archiving institution. Addressing these requirements early helps to identify possible conflicts of interest. In order to manage the requirements and potentials, it is advisable to create and maintain a data management plan right at the initiation phase of a data-related research project.

Further link:
⇒ Overview "Five steps to decide what data to keep" (Digital Curation Centre (DCC))

Preparation of your Research Data

Immediately after collection, the raw data (primary data) are usually not yet ready for analysis; they still have to be converted into a form suitable for answering your research question and for interpreting the data. This can mean digitising the data or content, but also compiling existing data (possibly including data from third parties, i.e. secondary use) or generating new data types and datasets. This formal data preparation is complemented by data cleansing later in the analysis process: unsuitable, unauthorised, incorrect or missing data, values and information may have to be replaced or removed, or standardised and corrected according to fixed rules. Some of this is also required for legal reasons (e.g. anonymisation or pseudonymisation, compliance with terms of use).
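
The following minimal Python sketch illustrates what "fixed rules" for cleansing and pseudonymisation could look like in practice. Everything in it is an assumption made for illustration: the list of missing-value codes, the field names `name` and `participant_id`, and the choice of a keyed hash (HMAC-SHA256) as the pseudonymisation technique. Whether such a procedure satisfies the legal requirements of a given project has to be checked separately.

```python
import hashlib
import hmac

# Hypothetical rule: these codes are treated as "missing" in this example dataset.
MISSING_CODES = {"", "n/a", "na", "-99", "unknown"}


def clean_value(raw):
    """Trim whitespace and map known missing-value codes to None."""
    if raw is None:
        return None
    value = str(raw).strip()
    return None if value.lower() in MISSING_CODES else value


def pseudonymise(identifier, secret_key):
    """Replace a direct identifier with a keyed hash, shortened to 16 hex characters."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def prepare_record(record, secret_key):
    """Apply the cleaning rule to every field and swap the name for a pseudonym."""
    cleaned = {key: clean_value(value) for key, value in record.items()}
    name = cleaned.pop("name", None)  # the direct identifier is removed
    cleaned["participant_id"] = pseudonymise(name, secret_key) if name else None
    return cleaned
```

Because the rules live in one place (`MISSING_CODES` and the two functions), the same procedure can be applied to every record and described in the documentation. The secret key must be stored separately from the data, since anyone holding it can recompute the pseudonyms from known names.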

Requirements for the preparation of the data

The preparation and processing of data should be carried out according to systematic research data management, i.e. in such a way that

  • the processing procedures follow determined and fixed rules,
  • the individual steps of the data processing and preparation are comprehensible and well documented,
  • reuse and interpretation by third parties is possible,
  • consistency prevails.

This includes clear designations, labels and naming in the dataset itself, checklists and overview documents serving as documentation material as well as detailed information on the procedure and methodology for the processing and the preparation of the data.
With regard to reusability, it is advisable to identify the content and information needed for archiving and publishing data and to carry out appropriate documentation and quality assurance.
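
One possible way to keep the individual processing steps comprehensible is to record each step in a machine-readable log while it is carried out. The sketch below is only an illustration under assumed names: the `ProcessingLog` class, the `drop_incomplete` step and the log fields are hypothetical and not part of any prescribed workflow.

```python
import json
from datetime import datetime, timezone


class ProcessingLog:
    """Collects one entry per processing step so the procedure can be retraced."""

    def __init__(self):
        self.steps = []

    def record(self, step, rule, rows_before, rows_after):
        self.steps.append({
            "step": step,
            "rule": rule,
            "rows_before": rows_before,
            "rows_after": rows_after,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def to_json(self):
        """Serialise the log so it can be stored alongside the dataset."""
        return json.dumps(self.steps, indent=2, ensure_ascii=False)


def drop_incomplete(rows, required, log):
    """Example step: remove rows in which any required field is missing or empty."""
    kept = [r for r in rows if all(r.get(f) not in (None, "") for f in required)]
    log.record("drop_incomplete", f"remove rows missing any of {required}",
               len(rows), len(kept))
    return kept
```

A call such as `drop_incomplete(rows, ["id", "score"], log)` returns the remaining rows and leaves an entry in `log.steps` stating which rule was applied and how many rows it affected. The output of `log.to_json()` can then be archived as part of the documentation material.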

Please visit our page File Management for more information about this topic.

Data Documentation

Documenting your own research results, data and processes adds considerable value. Transparent documentation using consistent rules and standards

  • facilitates the readability and interpretability of the research data,
  • enables the traceability and reproducibility of the research process,
  • increases the visibility and findability of the data, because the documented information and materials are catalogued and provided as metadata and documents,
  • increases the probability of use by third parties and consequently of scientific citation,
  • complies with good scientific practice.

To ensure that the data and findings obtained are reusable and comprehensible, attention should therefore also be paid to a rigorous description of the data and the study in the form of metadata and accompanying materials, in line with good scientific practice and quality assurance. This usually means more effort for the researchers, which can be taken into account when applying for funding and will certainly be approved if sufficiently substantiated.

Important components in the documentation process

The documentation of research data consists not only of correctly named and labelled files and values but also of any relevant information about these data, so-called metadata. These metadata should contain information about the study and its design, the research methods applied, the collection, preparation, processing and editing of the data, and the analysis process, so that the research processes are comprehensible and the research data interpretable.

For sufficient documentation, accompanying materials with contextual information may be necessary as well, such as method reports, coding, survey instruments, instructions, etc.
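
As a rough illustration of the components named above, a metadata record can be checked for completeness before the data are handed over. The sketch below is hypothetical: the field names are assumptions derived from the components listed in this section, not a required schema, and an archive or repository may ask for different or additional fields.

```python
# Hypothetical field names derived from the components listed above.
REQUIRED_FIELDS = (
    "study_design",
    "research_methods",
    "data_collection",
    "data_processing",
    "analysis",
)


def missing_fields(metadata):
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Invented example record with one empty and one absent field.
record = {
    "study_design": "Cross-sectional online survey",
    "research_methods": "Standardised questionnaire",
    "data_collection": "",
    "analysis": "Descriptive statistics",
}
```

For this example record, `missing_fields(record)` reports `data_collection` (empty) and `data_processing` (absent), which shows where the documentation still has gaps.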

Guiding questions in the documentation process

  • Do specifications or guidelines exist from funders, your own institution, the scientific community, archives or repositories regarding the information and details that should be documented, and if so, what form is recommended?
  • What could be of interest to third parties to interpret and use the data?
  • What information needs to be available, both for researchers involved in the project and for those who were not, to be able to reproduce, validate or reuse the data now and in the long term?
  • What information about the data should be available so that others can quickly grasp their analysis potential for their own research?

Information and details that appear to be irrelevant to answering your own research question may well be relevant for possible reuse in similar or even other research contexts.

Standardised metadata

Data archives and repositories develop quality controls and workflows for the provision and publication of data and their metadata, including the additional enrichment and formal verification of information and materials. They also structure and describe the data, and especially the metadata, using metadata standards and controlled vocabularies, thereby making a decisive contribution to the reuse and findability of research data. Standardised metadata lead to uniform documentation of data and ensure that data can be searched and found internationally via reference systems and catalogues with a consistent quality of metadata.

Metadata standards contain specified definitions and vocabularies for the description of data. Ideally, they should be widely used within a scientific community or by data archives or repositories, and they should be internationally compatible in the sense of interoperability. This ensures uniform documentation and description of the same quality across the board and, consequently, enables the (automatic) exchange of metadata between catalogue systems. This in turn increases the chances of accessing the information and promotes the overall visibility and reusability of the data.
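
To make the idea of interoperability more concrete, the sketch below maps locally named metadata fields onto elements of a widely used general-purpose standard. Dublin Core is chosen here purely as an example and is not prescribed by this page; the local field names and the `dc:` key prefix are assumptions for illustration. The appropriate standard depends on the discipline and on the archive or repository.

```python
# Hypothetical local field names mapped to Dublin Core element names.
LOCAL_TO_DC = {
    "title": "title",
    "authors": "creator",
    "keywords": "subject",
    "abstract": "description",
    "year": "date",
    "licence": "rights",
}


def to_dublin_core(local_record):
    """Split a local record into standard-mapped fields and fields without a mapping."""
    mapped, unmapped = {}, {}
    for key, value in local_record.items():
        if key in LOCAL_TO_DC:
            mapped["dc:" + LOCAL_TO_DC[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped
```

Returning the unmapped fields separately makes it visible which local information has no counterpart in the chosen standard and would otherwise be lost when metadata are exchanged between catalogue systems.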


Our team will be pleased to support you with metadata and documentation, including information about the requirements and systems at Leuphana or other data service providers. Please contact us!

Your contact for data preparation and metadata documentation: Thomas Schwager

Thomas Schwager
Universitätsallee 1, CB.132
21335 Lüneburg
Phone +49.4131.677-1175

Martin Bilz
Universitätsallee 1, CB.105
21335 Lüneburg
Phone +49.4131.677-1113