Reusing existing data

As data-intensive science grows in importance, it is increasingly likely that existing data can serve your own study. Reusing data can be more efficient: it reduces the burden on study subjects and saves resources, laboratory animals, and the effort of collecting new data.

Grant reviewers take this into account as well: your chances of getting funded are better if you can show that you have considered reusing data. So before you start collecting new data, ask yourself whether existing data can be used:

  • to completely answer your research question;
  • to complement or enrich your own data set.

You should also consider reusing metadata from other studies as a template for your own data definitions.

Frequently Asked Questions

Before reusing data, you should ask yourself the following questions:

  • Can I use this data to answer my research question?
  • Is the methodology of the original study suited to answer my research question?
  • Can I get access to this data? Are there any restrictions on how I can use it (e.g., intellectual property rights, privacy, or informed consent restrictions)?
  • Do the people who produced this data make it available according to the FAIR principles (Findable, Accessible, Interoperable, Reusable)?
  • Did they follow ethical and integrity guidelines? (When in doubt, ask your local security officer or privacy officer.)
  • Is the data set very large? If so, copying it all to your own computing resources may be inefficient.
  • Is format conversion or other data preparation required?
  • Is the data based on standard ontologies and terminologies?
  • Do I need a (manual) data harmonisation step to meet the standards used in my own study?
  • Is the data versioned? If a new version is released during my project, will I update to it, integrate it, and redo my calculations?
  • Do the data providers follow the FAIR principles?
  • What informed consent requirements are involved?
  • Are there issues with intellectual property rights?
  • Are there conflicts of interest?

Sources of reusable data include:

  • reference data;
  • data on reference cohorts;
  • very similar data collected in a different study;
  • data from an earlier, smaller but similar study;
  • healthcare systems;
  • biobanks;
  • the biomedical literature;
  • digital repositories;
  • data centres.

Text in preparation