a. Raw data preparation

Preparing your raw data properly before analysis makes the analysis and interpretation process transparent and the results reproducible. It also makes your data, intermediate results and end results suitable for archiving and sharing.

Prepare your research data for analysis by following these steps:

  1. Create a data dictionary (i.e., metadata).
  2. Create a working copy of the dataset and securely archive the raw data.
  3. Clean the data in the working file and document all cleaning steps in a separate (syntax) file that is archived.
  4. Create an analysis file and preserve the cleaned dataset for archiving purposes.
  5. Preserve your raw and (if needed) intermediate datasets.
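The first two steps can be sketched in Python as follows. This is a minimal illustration, not a prescribed tool. The variables in `DATA_DICTIONARY`, the file names and the helper functions `write_data_dictionary` and `make_working_copy` are all hypothetical. Removing write permission from the raw file only guards against accidental edits on your own machine. Securely archiving the raw data still means storing a copy in a protected archive location.

```python
import csv
import shutil
import stat
from pathlib import Path

# Hypothetical example variables; replace them with the variables in your own dataset.
DATA_DICTIONARY = [
    {"variable": "resp_id", "type": "integer", "description": "Respondent number", "values": "1-9999"},
    {"variable": "age", "type": "integer", "description": "Age at interview", "values": "18-99; -9 = missing"},
    {"variable": "group", "type": "text", "description": "Study condition", "values": "control, treatment"},
]


def write_data_dictionary(path: Path, entries: list[dict]) -> None:
    """Step 1 (hypothetical helper): store the metadata as a CSV file next to the data."""
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["variable", "type", "description", "values"])
        writer.writeheader()
        writer.writerows(entries)


def make_working_copy(raw: Path, work_dir: Path) -> Path:
    """Step 2 (hypothetical helper): copy the raw file for editing and make the original read-only."""
    work_dir.mkdir(parents=True, exist_ok=True)
    working = work_dir / f"{raw.stem}_working{raw.suffix}"
    shutil.copy2(raw, working)  # copy first, so the working copy keeps its write permission
    # Clear all write bits on the raw file so it cannot be changed by accident.
    mode = raw.stat().st_mode
    raw.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return working
```

All further cleaning then happens in the file returned by `make_working_copy`, while the raw file stays untouched.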

Frequently Asked Questions

If your data cannot be traced back to individuals (i.e., the data are anonymised), you can use any established statistical package as the management tool for your data. However, make sure that the entire process is well documented and that every data manipulation is recorded in a library of syntax files.
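When Python is the management tool, a script can play the role of a syntax file. The sketch below assumes a CSV working file with the hypothetical columns `resp_id`, `age` and `group`. It keeps every cleaning step as a named function, applies the steps in a fixed order and logs what each one changed. The cleaning rules and function names are illustrative assumptions, not recommended rules for your data.

```python
import csv
import logging
from pathlib import Path

log = logging.getLogger("cleaning")


def recode_missing_age(rows: list[dict]) -> list[dict]:
    """Hypothetical rule: the code -9 in 'age' means 'missing'; store it as an empty value."""
    changed = 0
    for row in rows:
        if row["age"] == "-9":
            row["age"] = ""
            changed += 1
    log.info("recode_missing_age: %d value(s) set to missing", changed)
    return rows


def drop_duplicate_ids(rows: list[dict]) -> list[dict]:
    """Hypothetical rule: keep only the first record for each respondent id."""
    seen, kept = set(), []
    for row in rows:
        if row["resp_id"] not in seen:
            seen.add(row["resp_id"])
            kept.append(row)
    log.info("drop_duplicate_ids: %d duplicate row(s) removed", len(rows) - len(kept))
    return kept


# Ordered list of cleaning steps; applying them in sequence makes the process repeatable.
CLEANING_STEPS = [recode_missing_age, drop_duplicate_ids]


def clean(working: Path, cleaned: Path) -> list[dict]:
    """Read the working file, apply every cleaning step and write the cleaned file."""
    with working.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames
        rows = list(reader)
    for step in CLEANING_STEPS:
        rows = step(rows)
    with cleaned.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

To keep the log alongside the archived script, configure logging before calling `clean`, for example with `logging.basicConfig(filename="cleaning.log", level=logging.INFO)`.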

You are advised to store the raw data and every version produced by a meaningful processing step that you cannot easily repeat. At a minimum, store the raw data on which your publications are based, together with descriptions of how you obtained and processed them (i.e., the metadata). You can consider deleting intermediate files to save storage space and to reduce the risk of inadvertent privacy violations. Intermediate files can also be excluded from a backup scheme, which shortens a restore after a hardware failure. However, keeping intermediate data can be useful for tracing results back to their source. See Basic Documentation.
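One way to support trace-back is to record a checksum for each preserved file, so you can later confirm that the raw or intermediate data behind a publication have not changed. The sketch below assumes SHA-256 checksums stored in a JSON manifest. The function names and the manifest layout are hypothetical, and the manifest complements, rather than replaces, the metadata describing how the data were obtained and processed.

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(files: dict[str, Path], manifest: Path) -> dict[str, str]:
    """Record a checksum per preserved file, keyed by a label such as 'raw' or 'cleaned'.

    Hypothetical manifest layout: {label: sha256 hex digest}.
    """
    entries = {label: sha256sum(path) for label, path in files.items()}
    manifest.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries


def verify_manifest(files: dict[str, Path], manifest: Path) -> list[str]:
    """Return the labels of files whose current checksum no longer matches the manifest."""
    recorded = json.loads(manifest.read_text(encoding="utf-8"))
    return [label for label, path in files.items() if recorded.get(label) != sha256sum(path)]
```

An empty list from `verify_manifest` means every listed file is unchanged since the manifest was written.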