Planning operational workflows
As a data steward, you should always be able to describe your complete operational workflow.
Distinguish between data capture, data analysis, data archiving, and data sharing. You are responsible for answering questions about the origin of your data, the data manipulations during all stages of your study, the location where your data is archived, and with whom it is shared and under what conditions. In addition, you should describe your data access policies before you start collecting data (see chapter 'Protecting your data').
Figure: Example of an operational workflow chart, showing the functionality involved and the typical activities around clinical data, including repositories.
Your data capture system (whether manual data entry from paper, electronic forms, or sophisticated real-time connections between the primary source data and the study database) should be able to check and report the logical consistency and the clinical plausibility of data values.
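To make the two kinds of checks concrete, here is a minimal Python sketch. It assumes a hypothetical record layout (the Record fields) and hypothetical plausibility limits in PLAUSIBLE_RANGES; in a real study these come from the protocol and are usually configured in the capture system itself. Logical consistency means relations between fields that must always hold; clinical plausibility means a single value falling outside a clinically believable range.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical plausibility limits; real limits must come from the study protocol.
PLAUSIBLE_RANGES = {
    "systolic_bp_mmhg": (60, 260),
    "heart_rate_bpm": (25, 250),
    "body_temp_c": (30.0, 43.0),
}

@dataclass
class Record:
    """Hypothetical record layout, for illustration only."""
    patient_id: str
    birth_date: date
    visit_date: date
    systolic_bp_mmhg: Optional[float] = None
    diastolic_bp_mmhg: Optional[float] = None
    heart_rate_bpm: Optional[float] = None
    body_temp_c: Optional[float] = None

def check_record(rec: Record) -> list:
    """Return a list of human-readable issues; an empty list means no findings."""
    issues = []
    # Logical consistency: relations between fields that must always hold.
    if rec.visit_date < rec.birth_date:
        issues.append("visit_date precedes birth_date")
    if (rec.systolic_bp_mmhg is not None and rec.diastolic_bp_mmhg is not None
            and rec.diastolic_bp_mmhg >= rec.systolic_bp_mmhg):
        issues.append("diastolic_bp_mmhg >= systolic_bp_mmhg")
    # Clinical plausibility: individual values outside a believable range.
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = getattr(rec, field)
        if value is not None and not (low <= value <= high):
            issues.append(f"{field}={value} outside plausible range [{low}, {high}]")
    return issues
```

For example, a record with a diastolic pressure above its systolic pressure and a body temperature of 47.0 °C would yield two findings, which the capture system could report back to the person entering the data.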
If you will be handling a large data set, it is important to think ahead about:
- storage capacity;
- when the raw data will become available;
- backups to safeguard against system failure as well as human error (how long can you wait for a single file or the whole data set to be restored?);
- the location where different steps of the data processing will be carried out (does the whole data set need to be transported to another location? How long will that take?);
- access policies (e.g., whether web-based or multi-user access is required);
- protection against unauthorized access (see chapter 'Protecting your data');
- costs (for instance for storage and compute capacity).
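The questions about restore time and transport time can be answered with a rough back-of-the-envelope estimate. The Python sketch below is a simple illustration, not a benchmark: it assumes decimal terabytes and a hypothetical efficiency factor (70% of nominal bandwidth by default) to account for protocol overhead and shared links. Restoring a whole data set from backup is also a transfer, so the same estimate applies there.

```python
def transfer_hours(size_tb: float, bandwidth_gbit_s: float, efficiency: float = 0.7) -> float:
    """Estimate wall-clock hours to move size_tb (decimal) terabytes over a link
    of bandwidth_gbit_s gigabit/s, assuming only `efficiency` of the nominal
    bandwidth is achieved in practice (the 0.7 default is a hypothetical value)."""
    if size_tb <= 0 or bandwidth_gbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size and bandwidth must be positive and 0 < efficiency <= 1")
    bits = size_tb * 1e12 * 8                              # terabytes -> bits
    effective_bits_per_s = bandwidth_gbit_s * 1e9 * efficiency
    return bits / effective_bits_per_s / 3600              # seconds -> hours
```

With these assumptions, transfer_hours(10, 1) comes to roughly 32 hours for a 10 TB data set over a 1 Gbit/s link, while a 10 Gbit/s link brings it down to a few hours. Estimates of this size help decide whether to move the data, move the computation, or ship disks.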
Frequently Asked Questions
Your UMC is responsible for providing a general infrastructure that complies with current regulations and guidelines (e.g., on privacy and data integrity). Your UMC should also provide researchers with a standard workflow description.
For projects with a large data volume, you will have to plan the following:
- the arrival of the data;
- the volume of storage needed over time, especially if the storage infrastructure is shared with other people;
- sufficient network capacity, if the data must be transported from the location where it is measured to the place where the computations will be carried out;
- infrastructure and time for copying the data, if it is delivered on hard disks;
- how the work is organised when different partners in a project perform different parts of the data processing: either data is moved between compute centres, or all partners are given access to the storage facilities in one location;
- the disposition of each data file (raw, intermediate, and processed) after the project: will it be archived, made available for reuse, or deleted? This decision can be based on the volume of the files and on whether, and with how much effort, they could be reproduced;
- for all storage and transport of data there are important security and privacy aspects (for these, please refer to chapter 'Protecting your data').
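The disposition decision for raw, intermediate, and processed files can be written down as an explicit rule, which makes it easier to document and review. The Python sketch below is one possible rule under hypothetical assumptions: the DataFile fields, the thresholds large_gb and cheap_hours, and the policy of offering processed results for reuse are illustrative choices, not a prescribed procedure. Raw data and any file that cannot be regenerated are archived; large intermediate files that are cheap to regenerate are deleted.

```python
from dataclasses import dataclass
from enum import Enum

class Disposition(Enum):
    ARCHIVE = "archive"
    REUSE = "make available for reuse"
    DELETE = "delete"

@dataclass
class DataFile:
    """Hypothetical description of one data file, for illustration only."""
    name: str
    stage: str              # "raw", "intermediate" or "processed"
    size_gb: float
    reproducible: bool      # can it be regenerated from archived inputs?
    reproduce_hours: float  # estimated effort to regenerate it

def decide_disposition(f: DataFile, large_gb: float = 500.0,
                       cheap_hours: float = 24.0) -> Disposition:
    """Apply a hypothetical rule based on file volume and reproduction effort."""
    if f.stage not in {"raw", "intermediate", "processed"}:
        raise ValueError(f"unknown stage: {f.stage}")
    # Raw data and files that cannot be regenerated are archived.
    if f.stage == "raw" or not f.reproducible:
        return Disposition.ARCHIVE
    # Final results are offered for reuse (hypothetical policy).
    if f.stage == "processed":
        return Disposition.REUSE
    # Large intermediate files that are cheap to regenerate are deleted.
    if f.size_gb >= large_gb and f.reproduce_hours <= cheap_hours:
        return Disposition.DELETE
    # Small or expensive-to-regenerate intermediates are kept.
    return Disposition.ARCHIVE
```

Running such a rule over an inventory of project files gives a first draft of the disposition list, which the project team can then adjust case by case.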
More information about safe data storage is provided in chapter 'Protecting your data' and in chapter 'Preserving your data archiving'.