b. Selecting file formats

Ensuring that your data is FAIR requires care in selecting file formats. It is important to consider how your data can be accessed ten years from now: will software still exist that can read the information?

In the molecular life sciences, it is common to save data in files. Structured human-readable tabular formats are preferred for files that will be analysed by other researchers. More formal, computer-readable descriptions of the data should complement the files to enable scalable, automated applications.

The UMCs recommend selecting file formats that are:

  • open (i.e., formats that can always be interpreted, which rules out '.doc', '.xls' and instrument-specific data formats);
  • well-documented (i.e., a rigorous format such as '.xml' with a schema description is preferred over formats that are open to multiple interpretations, such as '.csv' without a schema description);
  • flexible (i.e., self-describing formats which can adapt to future needs);
  • frequently used (i.e., formats for which conversion tools will be created and maintained if necessary).
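
To illustrate the difference between a '.csv' file with and without a schema description, the sketch below pairs a small table with a machine-readable description of its columns and uses that description to interpret the values. It is a minimal example in Python using only the standard library. The column names, types, unit and the layout of the schema are hypothetical and chosen for illustration; they do not follow a particular DANS or UMC standard.

```python
import csv
import io
import json

# Hypothetical schema description: the column names, types and unit are
# illustrative assumptions, not taken from any official standard.
SCHEMA = {
    "fields": [
        {"name": "sample_id", "type": "string", "description": "Sample identifier"},
        {"name": "concentration", "type": "number", "unit": "ng/uL"},
        {"name": "replicate", "type": "integer"},
    ]
}

# Map the type names used in the schema to Python conversion functions.
CASTS = {"string": str, "number": float, "integer": int}


def read_with_schema(csv_text, schema):
    """Parse CSV text and convert each column to the type the schema declares."""
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = [field["name"] for field in schema["fields"]]
    if reader.fieldnames != expected:
        raise ValueError(f"header {reader.fieldnames} does not match schema {expected}")
    rows = []
    # Data rows start on line 2 because line 1 holds the header.
    for line_number, raw in enumerate(reader, start=2):
        row = {}
        for field in schema["fields"]:
            name = field["name"]
            try:
                row[name] = CASTS[field["type"]](raw[name])
            except (TypeError, ValueError) as error:
                raise ValueError(f"line {line_number}, column {name!r}: {error}") from error
        rows.append(row)
    return rows


CSV_TEXT = "sample_id,concentration,replicate\nS001,12.5,1\nS002,8.0,2\n"

rows = read_with_schema(CSV_TEXT, SCHEMA)

# The schema itself can be stored as a JSON file next to the CSV file.
schema_json = json.dumps(SCHEMA, indent=2)
```

Without the schema, every value in the table would be read as text and a reader would have to guess that 'concentration' is a number measured in a given unit. With the schema stored alongside the file, the same interpretation can be applied automatically, and a file that deviates from the description is rejected with an error that names the offending line and column.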

Frequently Asked Questions

Data Archiving and Networked Services (DANS) has created a list of preferred file formats. DANS is confident that these formats offer the best long-term guarantees of usability, accessibility and sustainability, and therefore recommends that depositors use them wherever possible. DANS has also created a list of non-preferred formats, which are widely used alongside the preferred formats. According to DANS, these will remain moderately to reasonably usable, accessible and robust in the long term.

  • FAIRsharing.org is mapping the landscape of community-developed standards in the life sciences.
  • The RD-Alliance has a list of recommendations and outputs.
  • CORBEL is a platform for harmonised user access to biological and medical technologies, biological samples and data services required by cutting-edge biomedical research.