CLiREN-LMS
Data Cleaning and Preparation in R

The Purpose of Data Cleaning and Preparation

Key Data States and Outputs

30-45 minutes Applied Step 12 of 23

| Term | Meaning | Example | Governance implication |
|---|---|---|---|
| Raw data | Data exported or received from the source system without manual alteration | REDCap CSV export saved in data_raw | Preserve as received and protect from accidental editing |
| Cleaned data | Dataset after documented checks, corrections, and transformations | Dataset regenerated after database queries are resolved | Must be traceable to source data and cleaning decisions |
| Analysis-ready data | Dataset structured for statistical analysis | One row per participant with derived endpoint variables | Requires statistician and protocol alignment |
| Derived variable | Variable calculated from one or more source variables | Age at enrollment or visit window flag | Rule must be documented and reproducible |
| Query output | Listing of records requiring review | Missing consent dates or impossible date sequences | Should feed into the approved query workflow |
| Cleaning log | Record of checks, findings, decisions, and outputs | Rule ID, affected records, action taken, status | Supports transparency and review |
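
The terms in the table can be sketched in a few lines of base R. This is a minimal illustration, not a prescribed workflow: the data frame `raw`, its column names, the rule IDs `Q01` and `Q02`, and the helpers `age_in_years()` and `make_queries()` are all hypothetical and stand in for whatever the study's export and data management plan actually define. In practice the raw data would be read from the file in data_raw, not typed in.

```r
# Hypothetical stand-in for a raw export. The object is never modified below.
raw <- data.frame(
  record_id    = c("P001", "P002", "P003", "P004"),
  birth_date   = as.Date(c("1980-05-14", "1992-11-02", "1975-01-30", "2001-07-19")),
  consent_date = as.Date(c("2024-01-10", NA, "2024-02-20", "2024-03-05")),
  enroll_date  = as.Date(c("2024-01-12", "2024-02-01", "2024-02-15", "2024-03-05")),
  stringsAsFactors = FALSE
)

# Derived variable rule (assumed): age at enrollment in completed years.
age_in_years <- function(birth, ref) {
  b <- as.POSIXlt(birth)
  r <- as.POSIXlt(ref)
  # TRUE when the birthday has not yet occurred in the reference year
  before_birthday <- (r$mon < b$mon) | (r$mon == b$mon & r$mday < b$mday)
  (r$year - b$year) - as.integer(before_birthday)
}

# Work on a copy so the raw object stays as received.
prepared <- raw
prepared$age_at_enrollment <- age_in_years(raw$birth_date, raw$enroll_date)

# Helper that builds a query listing and also works when no records are flagged.
make_queries <- function(rule_id, ids, finding) {
  data.frame(
    rule_id   = rep(rule_id, length(ids)),
    record_id = ids,
    finding   = rep(finding, length(ids)),
    stringsAsFactors = FALSE
  )
}

# Query output: records requiring review. Nothing is corrected here.
missing_consent <- is.na(raw$consent_date)
bad_sequence    <- !is.na(raw$consent_date) & raw$enroll_date < raw$consent_date

query_output <- rbind(
  make_queries("Q01", raw$record_id[missing_consent], "Consent date missing"),
  make_queries("Q02", raw$record_id[bad_sequence],
               "Enrollment date precedes consent date")
)

# Cleaning log: one row per rule with the fields named in the table.
cleaning_log <- data.frame(
  rule_id          = c("Q01", "Q02"),
  affected_records = c(sum(missing_consent), sum(bad_sequence)),
  action_taken     = "Listed for query workflow",
  status           = "Open",
  stringsAsFactors = FALSE
)

print(query_output)
print(cleaning_log)
```

The sketch only flags problem records and leaves their values unchanged, so the cleaned dataset can be regenerated from the raw export once the queries are resolved at source.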