File Management during the Research Process

On this page, you will find general recommendations and assistance for the management of files and data, with a focus on data organisation and structuring, guidance on file naming, versioning and selecting suitable file formats, and information about the storage and backup of these files.

  • Data Organisation and Structuring
  • File Naming
  • Versioning
  • File Formats and Software
  • Data Storage and Backup

Data Organisation and Structuring

For optimal work with research data, materials and documents, it is recommended to organise and structure the files you generate and use, as well as their directories, systematically and according to agreed rules. Organising the data not only makes it easier to find your way around your own file system in day-to-day work, but also creates transparency, which is especially beneficial when several members of a research team access the files.

Directory structure and storage location

Directories should be clearly arranged and named so that content is easy to find and people accessing them can orient themselves. Your own institution may already have guidelines and framework conditions that need to be taken into consideration, e.g. regarding the storage of personal and ethically sensitive data.

According to best practice, data files are deposited and organised so that all participants in the research process can easily find and access them at any time and in the long term, changes and file versions can be traced, and relevant files are preserved and protected against loss.

The following aspects should be considered when choosing a storage location:

  • guaranteed backup procedures for data security reasons,
  • provision of sufficient storage capacity,
  • clear access control according to legal requirements, i.e. access for authorised persons only.

Ideally, the directories should be arranged in a simple hierarchical structure using subdirectories, with upper and lower categories. Content, form, naming and structure are, of course, determined by the project and its researchers.

Possible categories of order:

⇒ structure of the project
⇒ work packages
⇒ content of the folder (e.g. bibliography, events, publications, IT, documentation, research data)
⇒ main topics
⇒ temporal information (by date or time period)
⇒ contributors / persons

Within a folder system, usability is systematically supported by repeating and anchoring fixed elements, e.g. a folder "Communication" for internal and external agreements and e-mails, a sub-folder "Archive" for obsolete files that nevertheless need to be kept, a sub-folder "Posted" for files shared with third parties, etc.
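A recurring folder structure of this kind can be created with a short script. The following is only a minimal sketch: the layout in PROJECT_LAYOUT and the function name create_project_skeleton are hypothetical illustrations based on the examples above, not a prescribed structure.

```python
import tempfile
from pathlib import Path

# Hypothetical layout; the folder names follow the examples in the text,
# but the actual categories are for each project to decide.
PROJECT_LAYOUT = {
    "Communication": [],
    "Documentation": [],
    "Publications": ["Archive", "Posted"],
    "ResearchData": ["Archive", "Posted"],
}


def create_project_skeleton(root: Path, layout: dict[str, list[str]]) -> list[Path]:
    """Create top-level folders and their sub-folders; return all folder paths."""
    folders = []
    for folder, subfolders in layout.items():
        top = root / folder
        top.mkdir(parents=True, exist_ok=True)  # safe to re-run on an existing tree
        folders.append(top)
        for sub in subfolders:
            path = top / sub
            path.mkdir(exist_ok=True)
            folders.append(path)
    return folders


# Demonstration in a temporary directory, so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for path in create_project_skeleton(Path(tmp) / "ProjectX", PROJECT_LAYOUT):
        print(path.relative_to(tmp))
```

Keeping the layout in one dictionary means every new work package or project starts with the same fixed elements, which is the point of repeating folders such as "Archive" and "Posted".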

File Naming

When creating and editing files, clear naming and the labelling of changes are essential. Clear designations of content, date and persons make it easier to distinguish, find and share data without opening a file. They also provide a clear overview of the content and are an advantage when data and files have to be selected, sorted and submitted for archiving and publication.

Recommendations

The researchers involved should agree on clear rules for file naming and follow them consistently. At a minimum, file names should be uniform and unique.
Further guidance:

  • Because technical systems and components differ (e.g. operating systems, web servers), spaces and special characters should be avoided. File name components should be separated consistently, e.g. by hyphen or underscore.
  • The file name should be kept as short as possible but still as long as necessary. This matters for backup and security because some technical systems limit the entire file path to 260 characters.
  • The content of the file should be reflected in the file name in order to make it easier to assign the file to the appropriate directories.
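The first two recommendations can be checked automatically. The sketch below is one possible approach, not a fixed rule set: the function name check_path is hypothetical, and the set of "safe" characters (ASCII letters, digits, dot, hyphen, underscore) is an assumption that a project may want to adjust.

```python
import re
from pathlib import PurePath

MAX_PATH_LENGTH = 260  # upper limit mentioned in the text; systems differ
# Assumption: only ASCII letters, digits, dot, hyphen and underscore are "safe".
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def check_path(path: str) -> list[str]:
    """Return a list of human-readable problems found in the given path."""
    problems = []
    if len(path) > MAX_PATH_LENGTH:
        problems.append(f"path has {len(path)} characters (limit {MAX_PATH_LENGTH})")
    name = PurePath(path).name  # last component, i.e. the file name
    if not SAFE_NAME.fullmatch(name):
        problems.append(f"file name {name!r} contains spaces or special characters")
    return problems


print(check_path("data/ProjectX_Topic_20001129_edit_MM_v2.txt"))
print(check_path("data/final version (new)!.txt"))
```

An empty list means no problem was detected under these assumptions; it does not guarantee that a name is valid on every system.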

Selection of potential elements for creating a file name

  • File content(s), at best using one single term
  • Short title (project, reference, etc.)
  • Organisational unit
  • Formal information (original version, draft paper, report, etc.)
  • Date created, modified, published (recommendation: YYYYMMDD)
  • Information about the processing and status (edit, corr, publ., ann., adjusted, etc.)
  • Personal abbreviation or initials of the creating, editing or responsible person
  • Version number

Example: ProjectX_Topic_20001129_edit_MM_v2
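A name like this can be assembled programmatically so that the order of elements and the date format stay consistent. This is a minimal sketch; the function build_file_name and the chosen order of components are illustrative assumptions modelled on the example above.

```python
from datetime import date


def build_file_name(project: str, topic: str, day: date, status: str,
                    initials: str, version: int, sep: str = "_") -> str:
    """Join the name components in a fixed order, with the date as YYYYMMDD."""
    parts = [project, topic, day.strftime("%Y%m%d"), status, initials, f"v{version}"]
    return sep.join(parts)


print(build_file_name("ProjectX", "Topic", date(2000, 11, 29), "edit", "MM", 2))
# prints: ProjectX_Topic_20001129_edit_MM_v2
```

Writing the date as YYYYMMDD has the side effect that files sort chronologically when sorted by name.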

For the unambiguous file naming of research data, it is recommended to apply certain standard elements like study abbreviation or study ID, the data type (e.g. interview, video) in unambiguous abbreviated form as well as sequential numbers for the individual data types (001, 002, etc., if necessary further distinctions in the case of several files of one data type ⇒ 001a, 001b, etc.) and identifications in the case of different versions (v1, v2).

Example: StudyAcronym_Video_001b_v3
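Names built from such standard elements can also be read back by a script, e.g. to sort or validate a folder of research data. The pattern below is an assumption derived from the single example above (study ID, data type, three-digit number with optional letter, version); a real project would adapt it to its own rules.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: StudyID_DataType_NNN[letter]_vN, as in the example above.
DATA_FILE_NAME = re.compile(
    r"(?P<study>[A-Za-z0-9]+)_(?P<data_type>[A-Za-z]+)_"
    r"(?P<number>\d{3})(?P<part>[a-z]?)_v(?P<version>\d+)"
)


class DataFileName(NamedTuple):
    study: str
    data_type: str
    number: int
    part: str
    version: int


def parse_data_file_name(stem: str) -> Optional[DataFileName]:
    """Split a file name (without extension) into its parts, or return None."""
    match = DATA_FILE_NAME.fullmatch(stem)
    if match is None:
        return None
    return DataFileName(
        study=match["study"],
        data_type=match["data_type"],
        number=int(match["number"]),
        part=match["part"],
        version=int(match["version"]),
    )


print(parse_data_file_name("StudyAcronym_Video_001b_v3"))
print(parse_data_file_name("video final.mp4"))
```

A name that does not follow the convention yields None, which makes it easy to list files that still need renaming.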

Versioning

For traceability and control of your own work processes and results, it is strongly advisable to document every change that is made. This can be done by creating new file versions and adding version numbers to the file name, so that previous and current versions can be identified, but also within the file itself or via external documentation in a versioning document.

Modifications and changes should be clearly named, marked and identified with information about the editor, the date modified and the kind of modification. Usually, major changes are marked with a new version number (e.g. v1, v2, v3, etc.), while minor changes are marked after the decimal point (e.g. v1.1, v1.2, etc.). The rest of the file name remains essentially unchanged.
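The major/minor scheme can be applied mechanically to file names that end in a version suffix. The following sketch assumes a suffix of the form _v2 or _v2.1 at the end of the name (without extension); the function name bump_version is hypothetical.

```python
import re

# Assumed suffix format: "_v<major>" or "_v<major>.<minor>" at the end of the name.
VERSION_SUFFIX = re.compile(r"_v(\d+)(?:\.(\d+))?$")


def bump_version(stem: str, major: bool) -> str:
    """Return the file name stem with its trailing version number increased."""
    match = VERSION_SUFFIX.search(stem)
    if match is None:
        raise ValueError(f"no version suffix such as '_v2' in {stem!r}")
    major_no = int(match.group(1))
    minor_no = int(match.group(2) or 0)  # a missing minor part counts as 0
    if major:
        new_version = f"_v{major_no + 1}"  # major change: minor part is dropped
    else:
        new_version = f"_v{major_no}.{minor_no + 1}"
    return stem[: match.start()] + new_version


print(bump_version("ProjectX_Topic_20001129_edit_MM_v2", major=False))
print(bump_version("ProjectX_Topic_20001129_edit_MM_v2.1", major=True))
```

Only the version suffix changes; all other name components are carried over unchanged.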

Obsolete versions should be stored in a separate folder within the directory as long as they are relevant. As a matter of principle, the original file version should always be kept.
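Moving an obsolete version into such a folder can be scripted so that files are never silently overwritten. This is a minimal sketch; the folder name "Archive" and the function archive_file are assumptions for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def archive_file(path: Path, archive_name: str = "Archive") -> Path:
    """Move a file into an archive sub-folder next to it and return the new path."""
    archive_dir = path.parent / archive_name
    archive_dir.mkdir(exist_ok=True)
    target = archive_dir / path.name
    if target.exists():
        # Never overwrite an archived version; the caller has to resolve this.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    return Path(shutil.move(str(path), str(target)))


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old_version = Path(tmp) / "ProjectX_Topic_20001129_edit_MM_v1.txt"
    old_version.write_text("first draft")
    print(archive_file(old_version).relative_to(tmp))
```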

To review and trace changes, version control systems such as the open source software Subversion can be used. Git-based platforms such as GitHub or GitLab, or even basic wiki technology, may also be sufficient thanks to their integrated version management. In general, any software that records an editing history is advisable.

File Formats and Software

When creating and using files, it is recommended to pay attention to the selection of adequate file formats and to use appropriate software applications. In line with open science and the goals of long-term preservation and potential re-use of research data, non-proprietary or at least widely accepted file formats should be used. Data archives and repositories often accept only certain file formats that are suitable for long-term archiving, in order to reduce the effort of migrating data to newer file formats (e.g. due to newer software versions) and to facilitate access for re-use.

Further information on recommended file formats is provided by the PubData File Format Policy for Research Data of the MIZ.

Data Storage and Backup

The appropriate storage and backup of generated and used research data represent an important requirement for research data management. This explicitly includes the consideration of data security and data integrity aspects, i.e. the compliance with the legal requirements for the protection of personal and sensitive data as well as, generally, preventing the loss and damage of digital data and associated software due to technical or human errors.

Data protection standards and regulations

When data containing or referring to personal information are involved, it must be ensured that only authorised persons have access to the data. For this purpose, appropriate protective measures must be taken. These include:

  •  the use of secure transmission and access procedures,
  •  password protection or encryption of files or directories, including secure and controlled storage of access data and keys,
  •  secure authorisation and authentication procedures.

Data backup

In the ongoing research process, the storage and backup of research data should be carried out at regular intervals with sufficient backup copies. It is advisable to use the technical infrastructure of your own institution for regular data backups, e.g. secured and access-protected servers and network drives under central administration of the computer centre. External storage media (hard drives, USB sticks) are an additional option. Overall, the backup should be distributed, i.e. kept at different locations.

Using the technical systems of the computer centre ensures professional backup procedures as well as availability for authorised persons, including fixed access restrictions and rights management. Data security can thus be guaranteed, i.e. data loss can be avoided by implementing automated and standardised processes. Data storage on external storage media usually lacks this control of data integrity. In addition, file management and consistency in collaborative work become more difficult when the data are distributed across many independent storage media.
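Where additional copies on external media are made by hand, a simple integrity check can compensate in part for the missing central control. The sketch below copies a file to several locations and compares SHA-256 checksums; the function names and the example folder names are hypothetical, and it is not a substitute for the institutional backup service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy a file to each backup location and verify every copy by checksum."""
    expected = sha256_of(source)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, directory))  # copy2 also keeps timestamps
        if sha256_of(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies


# Demonstration with two stand-in locations inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "StudyAcronym_Video_001b_v3.txt"
    source.write_text("example content")
    copies = backup_file(source, [root / "network_drive", root / "external_disk"])
    print([copy.parent.name for copy in copies])
```

Storing the checksum alongside the copy would also allow later checks that a backup has not been damaged over time.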

In general, researchers should plan well before data collection which data have to be stored securely, where and under what conditions. They should consider the necessary technical infrastructure and other organisational precautions. It is also important to align the needs of the project and its researchers with the security and storage services of the affiliated institution or, if the requirements cannot be met, to find suitable alternatives.

Data storage and backup for external collaborative projects and cooperations:

In projects with external cooperation partners, data is usually exchanged and accessed via shared platforms and systems, e.g. cloud solutions, or the technical systems of one of the participating institutions. When data sharing is an essential part of the project, it is essential to ensure the safety and security of the data and materials provided in the cooperation network, to check the suitability of the systems, and to implement regulated mechanisms and procedures that guarantee data security and integrity and facilitate research data management.

Data Storage at Leuphana

To store your research data and documents during the ongoing project, you can use existing network drives in your organisational unit or apply for a new group drive for your project with a standard storage volume of 100 gigabytes, expandable to 400 gigabytes. Both storage services are free of charge. If you need more storage, it can be granted on request but requires a small financial contribution.

The data on the network drives are backed up regularly at clearly defined intervals using standardised automatic procedures, which minimises the risk of data loss and ensures data integrity. The IT Service will be pleased to help you set up the network drive on your computer. Please contact heiko.reincke@leuphana.de to request a group drive or a storage volume expansion. For more detailed instructions, please also visit the MIZ manuals page regarding data storage and network drives.

Additionally, the Lower Saxony service academicCloud Sync&Share makes it possible to share data and materials externally with a restricted group of users. The service can be used via client or web browser and is operated in accordance with European and German data protection regulations. A total of 50 gigabytes of storage space per user is provided. The service should primarily be understood according to its intended purpose, as a professional and secure exchange and synchronisation platform.

Please visit our consulting page and get in touch with our research data management team to get individual, case-specific information on the organisation, naming, securing and storage of your research data and files.

Thomas Schwager
Universitätsallee 1, CB.132
21335 Lüneburg
Fon +49.4131.677-1175
thomas.schwager@leuphana.de

Martin Bilz
Universitätsallee 1, CB.105
21335 Lüneburg
Fon +49.4131.677-1113
martin.bilz@leuphana.de

Heiko Reincke
Universitätsallee 1, C7.122
21335 Lüneburg
Fon +49.4131.677-1204
heiko.reincke@leuphana.de