Entering and monitoring your data

You can protect the scientific integrity of your study by continually documenting the data entry process (i.e. who enters or modifies a particular data element, at what location, and at what time; this is mandatory for clinical trials) and by implementing a method for validating your data after initial entry.

Frequently Asked Questions

A terminology is a standardised list of terms that are used in a particular domain. The terms are organised by concept. There are different types of terminologies:

  • code system, where concepts are associated with a code;
  • thesaurus, where terms are ordered systematically or alphabetically;
  • classification, where concepts are ordered hierarchically, so each concept is grouped with similar concepts in a particular class;
  • vocabulary (also called ontology), where concepts are defined with hierarchical and non-hierarchical relations;
  • nomenclature, a type of vocabulary that also defines a syntax with which you can express concepts that are not (yet) part of the nomenclature.

Which terminology or standard to use depends strongly on the situation; a classification such as ICD-10 is one common example. We recommend consulting the Nictiz website.
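To make the distinction between a flat code system and a hierarchical classification concrete, here is a minimal sketch in Python using a few ICD-10-style codes. The codes, labels, and class names are illustrative only; always work from the official release of the terminology you adopt.

```python
# Illustrative sketch: a code system maps codes to terms, while a
# classification additionally groups concepts into classes.
# The ICD-10-style codes below are examples only.

code_system = {
    "E11": "Type 2 diabetes mellitus",
    "I10": "Essential (primary) hypertension",
    "J45": "Asthma",
}

classification = {
    "Endocrine, nutritional and metabolic diseases": ["E11"],
    "Diseases of the circulatory system": ["I10"],
    "Diseases of the respiratory system": ["J45"],
}

print(code_system["E11"])  # -> Type 2 diabetes mellitus
```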

You should preferably store information on who entered or modified which data, and when, within the software that you are using. Many software packages do this automatically in a so-called audit trail.

An audit trail is mandatory for clinical trials. In other cases, you can ask your UMC's experts for a list of minimal requirements.
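If your data entry software does not provide an audit trail, the sketch below shows the kind of information such a trail minimally captures (who changed which data element, from what to what, where, and when). The function and field names are illustrative, not a standard.

```python
from datetime import datetime, timezone

# Minimal sketch of an audit-trail record: who changed which data element,
# from what value to what value, at which location, and when.
def audit_record(user, field, old_value, new_value, location):
    return {
        "user": user,
        "field": field,
        "old_value": old_value,
        "new_value": new_value,
        "location": location,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

trail = []
trail.append(audit_record("jdevries", "systolic_bp", None, 135, "ward A"))
trail.append(audit_record("jdevries", "systolic_bp", 135, 153, "ward A"))
```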

You can protect the quality of your study by implementing one of the following validation methods:

  • having a second person check the entered data;
  • producing data quality reports;
  • applying extensive internal consistency logic;
  • performing double data entry (see the sketch after this list);
  • comparing your data with the primary source for verification when you are using data from a primary source such as an electronic patient file.
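As an illustration of double data entry, the sketch below compares two independently entered versions of the same record and reports the fields on which they disagree, so that discrepancies can be resolved against the source documents. The record fields are made up for the example.

```python
# Sketch: compare two independently entered versions of the same record and
# report fields that disagree, so they can be resolved against the source.
def compare_entries(first_entry, second_entry):
    discrepancies = {}
    for field in first_entry.keys() | second_entry.keys():
        if first_entry.get(field) != second_entry.get(field):
            discrepancies[field] = (first_entry.get(field), second_entry.get(field))
    return discrepancies

entry_a = {"patient_id": "P001", "weight_kg": 72, "smoker": "no"}
entry_b = {"patient_id": "P001", "weight_kg": 27, "smoker": "no"}
print(compare_entries(entry_a, entry_b))  # -> {'weight_kg': (72, 27)}
```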

Make sure you perform data quality checks before, during, and after your data collection.

Options are:

  • at the time of data entry, as warnings or error messages;
  • as an easily accessible quality report, in a continuous cycle of data evaluation, checking, and updating (either manual or automated); see the sketch after this list.
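A data quality report can be as simple as a recurring summary of missing and implausible values per variable. The sketch below assumes records are stored as a list of dictionaries; the variable names and plausibility ranges are made-up assumptions.

```python
# Sketch: a simple recurring quality report counting missing and out-of-range
# values per variable. Ranges and variable names are illustrative assumptions.
PLAUSIBLE_RANGES = {"age": (0, 120), "systolic_bp": (50, 300)}

def quality_report(records):
    report = {}
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        values = [r.get(field) for r in records]
        missing = sum(v is None for v in values)
        out_of_range = sum(v is not None and not (low <= v <= high) for v in values)
        report[field] = {"missing": missing, "out_of_range": out_of_range}
    return report

records = [{"age": 54, "systolic_bp": 135}, {"age": None, "systolic_bp": 400}]
print(quality_report(records))
```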

Basic quality check rules are:

  • You should not allow outright impossible values to be entered. (In a statistical package you can achieve this by programming selection syntax that reports cases with unusual or impossible values or combinations of values.)
  • It is preferable to allow entry of 'unlikely' values and flag them as such, rather than disallowing such values altogether (see the sketch after this list).
  • Under no circumstances should a data quality verification process lead to record or patient selection decisions.
  • Take care when applying XML specifications that, by design, can lead to entire cases or records being refused if not properly programmed.
  • Make sure that you document quality checks in your metadata.
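The sketch below illustrates the difference, mentioned in the first two rules, between impossible values (refused at entry) and merely unlikely values (accepted but flagged). The variable and thresholds are illustrative assumptions, not recommended limits.

```python
# Sketch: entry-time validation distinguishing impossible values (rejected)
# from unlikely values (accepted but flagged). Thresholds are illustrative.
IMPOSSIBLE = {"age": (0, 140)}   # outside this range: refuse entry
UNLIKELY = {"age": (18, 100)}    # outside this range: accept, but flag

def validate_entry(field, value):
    low, high = IMPOSSIBLE[field]
    if not (low <= value <= high):
        raise ValueError(f"{field}={value} is impossible; entry refused")
    low, high = UNLIKELY[field]
    flagged = not (low <= value <= high)
    return {"field": field, "value": value, "flagged_unlikely": flagged}

print(validate_entry("age", 104))  # accepted, but flagged as unlikely
```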

If you have a large number of files or very large files, you should keep a master list with critical information. Your master list should be properly versioned, so that all changes are registered over time along with the reason for them, and so that everyone in the project agrees on what the latest information is.
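A master list does not need to be complicated; the sketch below shows one possible layout as a versioned CSV in which every change carries a version number, a date, and a reason. The column names are chosen for illustration only.

```python
import csv
from io import StringIO

# Sketch: a versioned master list kept as a simple CSV. Each change gets a new
# version, a date, and a reason, so the latest state is always unambiguous.
MASTER_LIST = """\
file_name,version,date,changed_by,reason_for_change
cohort_2023.csv,1,2023-01-10,jdevries,initial export
cohort_2023.csv,2,2023-02-02,jdevries,corrected duplicate patient records
"""

for row in csv.DictReader(StringIO(MASTER_LIST)):
    print(row["file_name"], row["version"], row["reason_for_change"])
```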

We recommend storing your raw data and all versions that result from meaningful processing steps you cannot easily repeat. At least store the raw data that you use as the basis for your publications, including descriptions of how you obtained this data and how you processed it (metadata). Ensure that it is clear which metadata describes which data.
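One way to make clear which metadata describes which data is to store a small metadata record alongside each data file that references the file by name and checksum and documents how it was obtained and processed. The sketch below is one possible layout; the field names are illustrative, not a prescribed metadata standard.

```python
import hashlib
import json

# Sketch: a metadata record that unambiguously points at the data file it
# describes (by file name and checksum) and documents how it was produced.
def describe(file_name, file_bytes, obtained_from, processing_steps):
    return {
        "data_file": file_name,
        "sha256": hashlib.sha256(file_bytes).hexdigest(),
        "obtained_from": obtained_from,
        "processing_steps": processing_steps,
    }

# Example with in-memory content; in practice, read the bytes from the file
# and store the record next to it, e.g. as cleaned_cohort.csv.metadata.json.
record = describe(
    "cleaned_cohort.csv",
    b"patient_id,age\nP001,54\n",
    "hospital EHR export, 2023-01-10",
    ["removed duplicates", "recoded missing values"],
)
print(json.dumps(record, indent=2))
```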