The Purpose of Data Cleaning and Preparation
Key Data States and Outputs
| Term | Meaning | Example | Governance implication |
|---|---|---|---|
| Raw data | Data exported or received from the source system without manual alteration | REDCap CSV export saved in the `data_raw` folder | Preserve exactly as received and protect from accidental editing |
| Cleaned data | Dataset after documented checks, corrections, and transformations | Dataset regenerated after database queries are resolved | Must be traceable to source data and cleaning decisions |
| Analysis-ready data | Dataset structured for statistical analysis | One row per participant with derived endpoint variables | Structure and derivations must be agreed with the study statistician and aligned with the protocol |
| Derived variable | Variable calculated from one or more source variables | Age at enrollment or visit window flag | Rule must be documented and reproducible |
| Query output | Listing of records requiring review | Missing consent dates or impossible date sequences | Should feed into the approved query workflow |
| Cleaning log | Record of checks, findings, decisions, and outputs | Rule ID, affected records, action taken, status | Supports transparency and review |
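The query-output, derived-variable, and cleaning-log rows in the table fit together in one short pipeline. The following is a minimal sketch in standard-library Python under stated assumptions: the inline CSV, the field names (`record_id`, `dob`, `consent_date`, `enrollment_date`), the rule IDs (`Q001`, `Q002`), and the action wording are hypothetical stand-ins for a real REDCap export and query workflow. The SHA-256 fingerprint of the raw export is one possible way, assumed here, to show that raw data were read but not altered. The script only lists problem records and never edits them, so corrections would still be made in the source system and the cleaned dataset regenerated afterwards.

```python
import csv
import hashlib
import io
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical raw export: field names and values are illustrative only.
RAW_CSV = """record_id,dob,consent_date,enrollment_date
001,1980-05-14,2023-01-10,2023-01-12
002,1975-11-02,,2023-02-01
003,1990-03-30,2023-03-15,2023-03-01
"""


@dataclass
class LogEntry:
    """One cleaning-log row: rule ID, finding, affected records, action, status."""
    rule_id: str
    description: str
    affected_records: List[str]
    action: str
    status: str = "open"


def parse_date(value: str) -> Optional[date]:
    """Parse an ISO 8601 date string; blank values become None."""
    return date.fromisoformat(value) if value else None


def age_at(dob: date, ref: date) -> int:
    """Derived-variable rule: completed years between dob and ref."""
    years = ref.year - dob.year
    if (ref.month, ref.day) < (dob.month, dob.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def run_checks(rows: List[dict]):
    """Apply two hypothetical query rules; return (query listing, log entries)."""
    missing = [r["record_id"] for r in rows if not r["consent_date"]]
    out_of_order = [
        r["record_id"]
        for r in rows
        if r["consent_date"] and r["enrollment_date"]
        and parse_date(r["enrollment_date"]) < parse_date(r["consent_date"])
    ]
    queries = [{"record_id": rid, "rule_id": "Q001", "issue": "Missing consent date"}
               for rid in missing]
    queries += [{"record_id": rid, "rule_id": "Q002", "issue": "Enrollment precedes consent"}
                for rid in out_of_order]
    log = [
        LogEntry("Q001", "Consent date missing", missing, "Query raised for site review"),
        LogEntry("Q002", "Enrollment before consent", out_of_order, "Query raised for site review"),
    ]
    return queries, log


def derive(rows: List[dict]) -> List[dict]:
    """Build one row per participant with a derived age_at_enrollment."""
    derived = []
    for r in rows:
        dob, enrolled = parse_date(r["dob"]), parse_date(r["enrollment_date"])
        age = age_at(dob, enrolled) if dob and enrolled else None
        derived.append({"record_id": r["record_id"], "age_at_enrollment": age})
    return derived


# Fingerprint the raw export so a later run can confirm it was not altered.
raw_sha256 = hashlib.sha256(RAW_CSV.encode("utf-8")).hexdigest()
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))  # read-only: rows are new objects
queries, log = run_checks(rows)
for q in queries:
    print(q)
for entry in log:
    print(entry.rule_id, entry.affected_records, entry.status)
print(derive(rows))
```

With this sample, record 002 is listed under `Q001` and record 003 under `Q002`, and each log entry stays `open` until the query is resolved. Keeping the rule logic in named functions is what makes the derived variable and the checks documented and reproducible in the sense the table describes.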