b. Metadata

Metadata is ‘data about data’, i.e., all information that is required to interpret, understand, and (re)use your dataset. Describing your data with good metadata makes it user-friendly.

Metadata at the level of a research project describes general information that helps to interpret the various data files. FAIR tools and online data repositories will also ask you to add metadata. It is advisable to add extra files to your file structure that contain both a human-readable and a machine-readable version of the information about your data (the metadata). This makes your data self-describing. Tools such as GitLab provide an alternative infrastructure with a similar function.

Things to document include:

  • the name of the dataset or research project that produced it;
  • names and addresses of the organisation or people who created the data;
  • identification numbers of the dataset, even if it is just an internal project reference number;
  • unambiguous descriptions of all major entities in the study, such as samples, individuals, panels, or genotypes;
  • key dates associated with the data, including project start and end date, data modification dates, release date and time period covered by the data;
  • the origin of all data (i.e., a description of data provenance and lineage; the origin of the data should be verifiable);
  • the protocols that were used, including environmental aspects and study setup (e.g., persons, standard operating procedures, conditions, instrument settings, calibration data, data filters and data subset selections). This is all essential for data reuse and data quality verification;
  • an indication of the data quality.
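
As a minimal sketch of a machine-readable metadata file, the items above could be collected in a JSON file stored next to the data. The field names, the example values and the file name `metadata.json` are illustrative assumptions, not a prescribed standard; where a community standard exists for your type of data, its field names should take precedence.

```python
import json
from pathlib import Path


def build_metadata() -> dict:
    """Return an example project-level record; every value is a placeholder."""
    return {
        "title": "Example dataset",
        "creators": [{"name": "A. Researcher", "organisation": "Example Institute"}],
        "identifier": "INTERNAL-0001",  # an internal reference number is fine
        "entities": {"sample": "one blood sample per participant"},
        "dates": {"project_start": "2024-01-01", "project_end": "2024-12-31"},
        "provenance": "Measured in-house; raw files kept unchanged in raw/",
        "protocols": ["SOP-12 sample handling (hypothetical)"],
        "quality": "Instrument calibrated before each run",
    }


def write_metadata(record: dict, folder: Path) -> Path:
    """Write the record as metadata.json inside the dataset folder."""
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "metadata.json"
    target.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return target
```

Calling `write_metadata(build_metadata(), Path("my_dataset"))` would place the file in a (hypothetical) `my_dataset` folder. JSON is used here because it can be read by both people and software; a plain-text README next to it can carry the longer, human-oriented description.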

Collecting metadata will help:

  • you, your supervisor and collaborators to understand and interpret your data;
  • others to find, use, properly cite or reproduce your data;
  • to ensure the long-lasting usability of your data.

Important considerations for metadata are:

  • collect more than you would need for your own research if it improves the value of your data for later reuse;
  • aim for interoperability, so that other people and tools can combine your metadata with metadata from other sources;
  • use standardised terminologies.

Frequently Asked Questions

There are many minimal metadata standards, for many different kinds of data. Often, such standards are made for specific kinds of experiments, by a community of experts on that type of experiment. Many minimal metadata standards can be found on FAIRsharing.org. Consider collecting more than the minimum (i.e., more than you need for your own research, including optional fields in the minimal metadata standard) if it improves the value of your data for later reuse. The MIT Libraries' guidelines on documentation and metadata include a useful list of documentation that you should include.
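
Checking a metadata record against a minimal standard can be automated. The sketch below assumes hypothetical lists of required and optional fields; a real standard, such as one listed on FAIRsharing.org, defines its own fields and often ships its own validation tools.

```python
# Hypothetical field lists; a real minimal standard defines its own.
REQUIRED_FIELDS = ("title", "creators", "identifier", "dates")
OPTIONAL_FIELDS = ("instrument", "calibration_method", "geographical_area")


def missing_fields(record: dict, fields: tuple) -> list:
    """Return the names in `fields` that are absent or empty in `record`."""
    return [name for name in fields if record.get(name) in (None, "", [], {})]


# Example record with placeholder values.
record = {"title": "Example dataset", "creators": [], "instrument": "scanner A"}
print(missing_fields(record, REQUIRED_FIELDS))  # ['creators', 'identifier', 'dates']
print(missing_fields(record, OPTIONAL_FIELDS))  # ['calibration_method', 'geographical_area']
```

Reporting the optional fields separately makes it easy to see what could still be added beyond the minimum.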

Use a standardised protocol for data collection, both for reproducibility and to allow others to reuse your data in the future. This helps to ensure that follow-up studies will have a homogeneous dataset. You will probably also need to record parameters that may seem irrelevant to your own study.

For example:

  • geographical area of data collection;
  • instruments used;
  • calibration method used;
  • demographics;
  • time between collecting samples and performing measurements.

Metadata and data should be stored close to each other to make sure that the association between the two is clear. However, this is not always sufficient. Especially when file names are used to couple data and metadata, human errors can dissociate the two. Some data formats allow the metadata to be stored in the same file as the data itself.
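
One way to make the coupling less dependent on file names, offered here as an assumption rather than an established practice at your institute, is to record a checksum of the data file inside the metadata. The association can then be verified from the file content even after a file has been renamed or moved. The function names and the `data_file` key below are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_metadata(data_path: Path, metadata: dict) -> dict:
    """Return a copy of the metadata that records the data file's name and checksum."""
    linked = dict(metadata)
    linked["data_file"] = {"name": data_path.name, "sha256": file_sha256(data_path)}
    return linked


def describes(data_path: Path, metadata: dict) -> bool:
    """Check whether the checksum stored in the metadata matches the data file."""
    stored = metadata.get("data_file", {}).get("sha256")
    return stored is not None and stored == file_sha256(data_path)
```

With this approach, `describes` returns `False` as soon as the data file is changed or swapped for another one, which also flags metadata that has not been updated after a modification.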

Use the Toolbox to find support on metadata collection at your UMC.