Capturing metadata

To make your data user-friendly, you need to collect metadata (i.e., data about your data).

Things to document include:

  • the name of the data set or research project that produced it;
  • names and addresses of the organisation or people who created the data;
  • identification numbers of the data set, even if it is just an internal project reference number;
  • key dates associated with the data, including project start and end date, data modification dates, release date, and time period covered by the data;
  • the origin of all data (i.e., a description of the data provenance), recorded in such a way that the origin can be verified;
  • the protocols that were used, including environmental aspects and study setup (e.g., persons, standard operating or other procedures, conditions, instrument settings, calibration data, data filters, and data subset selections). All of this is essential for data reuse and data quality verification.
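
As an illustration, the items listed above could be captured in a small machine-readable record that is stored with the data. The following is a minimal sketch in Python, not a recommended standard: the field names, the helper build_metadata_record and all example values are hypothetical placeholders, and a real project should follow the metadata standard used in its discipline.

```python
import json
from datetime import date

# Hypothetical field names chosen for this sketch; they mirror the list above
# and do not follow any particular metadata standard.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "identifier",
    "dates",
    "provenance",
    "protocols",
)


def build_metadata_record(**fields):
    """Return a metadata record, raising ValueError if a required field is missing or empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return dict(fields)


# All values below are invented placeholders for illustration only.
record = build_metadata_record(
    title="Example data set",
    creators=[{"name": "A. Researcher", "organisation": "Example University"}],
    identifier="PROJ-0001",  # an internal project reference number is enough
    dates={
        "project_start": date(2020, 1, 1).isoformat(),
        "project_end": date(2021, 12, 31).isoformat(),
        "last_modified": date(2022, 1, 15).isoformat(),
    },
    provenance="Measured in-house; raw instrument files kept in the project archive.",
    protocols={
        "procedure": "SOP-12 (placeholder)",
        "instrument_settings": {"sampling_rate_hz": 100},
        "data_filters": ["removed incomplete records"],
    },
)

# Serialise to JSON so the record can be saved next to the data it describes.
print(json.dumps(record, indent=2))
```

Checking for required fields at the moment the record is created is one way to notice missing documentation early, while the people who know the answers are still involved in the project.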

Collecting metadata will help:

  • you, your supervisor and collaborators to understand and interpret your data;
  • others to find, use, properly cite, or reproduce your data;
  • ensure the long-lasting usability of your data.

Important considerations for metadata are:

  • collect more than you would need for your own research if it improves the value of your data for later reuse;
  • aim for interoperability, so that your metadata can be read and combined with metadata from other sources;
  • use standardised terminologies.

Frequently Asked Questions

Metadata is data about data (i.e., all the information that is necessary to interpret, understand, and use your data set). Collecting metadata will enable you and others to fully understand your raw, intermediate, and result data. You need unambiguous descriptions of all major entities in your study (e.g., samples, individuals, panels, or genotypes). You can find more information about this topic on the website of the Australian National Data Service.

There are many minimal metadata standards for many different kinds of data. Consider collecting more than the minimum (i.e., more than you need for your own research) if it improves the value of your data for later reuse. The MIT Libraries' guidelines on documentation and metadata provide a useful list of documentation that you should include.

  • embedded documentation;
  • supporting documentation;
  • catalogue metadata.

To make sure that the association between metadata and data is always clear, the two should be stored close to each other. However, this is not always sufficient. Especially when file names are the only link between data and metadata, human errors such as renaming or moving a file can dissociate the two.
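
One way to reduce this risk is to record a checksum of the data file inside the metadata, so that the link no longer depends on the file name alone. The following is a minimal sketch in Python, assuming the metadata is kept in a JSON file next to the data; the function names write_sidecar and matches and the ".meta.json" suffix are hypothetical choices made for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(data_path, metadata):
    """Store the metadata next to the data file, together with the data checksum."""
    data_path = Path(data_path)
    record = dict(metadata, data_file=data_path.name, sha256=sha256_of(data_path))
    # Hypothetical naming convention: <data file name>.meta.json
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def matches(data_path, sidecar_path):
    """Return True if the sidecar describes exactly this file content."""
    record = json.loads(Path(sidecar_path).read_text(encoding="utf-8"))
    return record["sha256"] == sha256_of(data_path)
```

With this approach, a renamed data file can still be matched to its metadata because the checksum depends only on the content, and a data file that was changed after the metadata was written is detected because the checksums no longer agree.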