The Purpose of Data Cleaning and Preparation
Table 1
| Term | Meaning | Example | Governance implication |
|---|---|---|---|
| Raw data | Data exported or received from the source system without manual alteration | REDCap CSV export saved in `data_raw` | Preserve as received and protect from accidental editing |
| Cleaned data | Dataset after documented checks, corrections, and transformations | Re-exported dataset with corrected values after database queries are resolved | Must be traceable to source data and cleaning decisions |
| Analysis-ready data | Dataset structured for statistical analysis | One row per participant with derived endpoint variables | Requires statistician and protocol alignment |
| Derived variable | Variable calculated from one or more source variables | Age at enrollment, length of stay, visit window flag | Rule must be documented and reproducible |
| Query output | Listing of records requiring site or investigator review | Missing consent dates or impossible date sequences | Should feed into approved query workflow |
| Cleaning log | Record of checks, findings, decisions, and outputs | Table of rule ID, affected records, action taken, status | Supports transparency and review |
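To make the last three rows concrete, the sketch below shows one way derived variables, query output, and a cleaning log could relate to each other. It is a minimal illustration using only the Python standard library, not a prescribed workflow. The `Participant` fields, the rule IDs `R001` and `R002`, the status labels, and the sample records are all hypothetical; a real project would take its field names from the export and its rules from the protocol and data management plan.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Participant:
    # Hypothetical raw fields; real export column names will differ.
    participant_id: str
    birth_date: Optional[date]
    consent_date: Optional[date]
    enrollment_date: Optional[date]
    admission_date: Optional[date]
    discharge_date: Optional[date]


def age_at_enrollment(p: Participant) -> Optional[int]:
    """Derived variable: completed years between birth and enrollment."""
    if p.birth_date is None or p.enrollment_date is None:
        return None
    b, e = p.birth_date, p.enrollment_date
    # Subtract one year if the birthday has not yet occurred in the enrollment year.
    return e.year - b.year - ((e.month, e.day) < (b.month, b.day))


def length_of_stay(p: Participant) -> Optional[int]:
    """Derived variable: days from admission to discharge (not corrected if negative)."""
    if p.admission_date is None or p.discharge_date is None:
        return None
    return (p.discharge_date - p.admission_date).days


def run_checks(participants: list) -> list:
    """Return cleaning-log rows: rule ID, affected records, action taken, status."""
    # Illustrative rules only; each is (rule ID, description, flagging function).
    rules = [
        ("R001", "Missing consent date",
         lambda p: p.consent_date is None),
        ("R002", "Discharge before admission",
         lambda p: p.admission_date is not None
         and p.discharge_date is not None
         and p.discharge_date < p.admission_date),
    ]
    log = []
    for rule_id, description, is_flagged in rules:
        affected = [p.participant_id for p in participants if is_flagged(p)]
        log.append({
            "rule_id": rule_id,
            "description": description,
            "affected_records": affected,  # this listing is the query output
            "action_taken": "Query raised for site review" if affected else "None required",
            "status": "open" if affected else "passed",
        })
    return log


# Made-up records for illustration; never edited in place, only read.
participants = [
    Participant("P001", date(1980, 6, 15), date(2024, 1, 10), date(2024, 1, 10),
                date(2024, 1, 10), date(2024, 1, 14)),
    Participant("P002", date(1975, 3, 2), None, date(2024, 2, 1),
                date(2024, 2, 1), date(2024, 2, 3)),
    Participant("P003", date(1990, 12, 31), date(2024, 3, 5), date(2024, 3, 5),
                date(2024, 3, 9), date(2024, 3, 5)),
]

cleaning_log = run_checks(participants)
for row in cleaning_log:
    print(row["rule_id"], row["status"], row["affected_records"])
```

In this sketch the checks only flag records and never change values, which matches the table's distinction between raw data that is preserved as received and cleaned data that results from resolved queries. The impossible discharge date for the hypothetical record `P003` is reported rather than silently corrected, so the decision stays with the site or investigator.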