CLiREN-LMS
Data Cleaning and Preparation in R

The Purpose of Data Cleaning and Preparation

Why Cleaning and Preparation Are Different

30-45 minutes Applied Step 14 of 23

Cleaning identifies and resolves data quality problems
Cleaning focuses on missing values, invalid codes, impossible dates, duplicates, inconsistencies, and other findings that may require review or correction.
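As a rough illustration of the cleaning step, the base R sketch below flags findings for review without modifying any values. The data frame `visits`, its column names, and the assumption that valid sex codes are 1 and 2 are all hypothetical and chosen only for this example.

```r
# Hypothetical export; column names and valid codes are assumptions for illustration.
visits <- data.frame(
  participant_id = c("P001", "P002", "P002", "P003", "P004"),
  sex_code       = c(1, 2, 2, 9, NA),
  consent_date   = c("2023-01-10", "2023-02-30", "2023-02-30",
                     "2023-03-05", "2023-04-01"),
  stringsAsFactors = FALSE
)

# With an explicit format, dates that cannot exist (such as 30 February) parse to NA.
parsed_date <- as.Date(visits$consent_date, format = "%Y-%m-%d")

# One logical flag per type of finding; the data themselves are left unchanged.
findings <- data.frame(
  row             = seq_len(nrow(visits)),
  participant_id  = visits$participant_id,
  missing_sex     = is.na(visits$sex_code),
  invalid_sex     = !is.na(visits$sex_code) & !(visits$sex_code %in% c(1, 2)),
  impossible_date = !is.na(visits$consent_date) & is.na(parsed_date),
  duplicate_row   = duplicated(visits)
)

# Keep only the rows with at least one finding, for review.
flag_cols <- c("missing_sex", "invalid_sex", "impossible_date", "duplicate_row")
flagged <- findings[rowSums(findings[, flag_cols]) > 0, ]
print(flagged)
```

The output is a list of findings for review, not a corrected dataset.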
Preparation makes data usable for a purpose
Preparation may include selecting variables, standardizing names, converting dates, reshaping repeated measures, joining datasets, creating derived variables, and producing summaries.
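A few of these preparation steps can be sketched in base R as follows. The data frames `raw` and `baseline`, their columns, and the derived `bmi` variable are hypothetical examples, not part of any particular study dataset.

```r
# Hypothetical export with display-style column names (an assumption for illustration).
raw <- data.frame(
  "Participant ID" = c("P001", "P002", "P003"),
  "Visit Date"     = c("2023-01-10", "2023-02-14", "2023-03-05"),
  "Weight (kg)"    = c(70.5, 82.0, 64.2),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical second dataset to join on participant_id.
baseline <- data.frame(
  participant_id = c("P001", "P002", "P003"),
  height_m       = c(1.75, 1.80, 1.60),
  stringsAsFactors = FALSE
)

prepared <- raw

# Standardize names: lower case, underscores, no leading or trailing underscore.
clean_names <- tolower(gsub("[^A-Za-z0-9]+", "_", names(prepared)))
names(prepared) <- gsub("^_|_$", "", clean_names)

# Convert the date column from text to Date.
prepared$visit_date <- as.Date(prepared$visit_date, format = "%Y-%m-%d")

# Join the baseline data, keeping every row of the visit data.
prepared <- merge(prepared, baseline, by = "participant_id", all.x = TRUE)

# Create a derived variable.
prepared$bmi <- round(prepared$weight_kg / prepared$height_m^2, 1)

str(prepared)
```

None of these steps changes what was recorded; they only change how the data are shaped and labeled for analysis.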
Not every transformation is a correction
Recoding 1 and 0 into Yes and No is a preparation step. Changing an incorrect consent date after source review is a correction. The workflow should document which is which.
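One possible way to document which is which is a small change log kept alongside the script. In the sketch below, the data frame `df`, the `change_log` structure, and the logged correction for participant P002 are hypothetical and serve only to illustrate the distinction.

```r
# Hypothetical data; the consented column uses 1 = Yes, 0 = No (an assumption).
df <- data.frame(
  participant_id = c("P001", "P002", "P003"),
  consented      = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# Preparation: recode into a new labeled column, leaving the original codes intact.
df$consented_label <- factor(df$consented, levels = c(0, 1), labels = c("No", "Yes"))

# Record the step and label its type explicitly.
change_log <- data.frame(
  variable = "consented",
  action   = "Recoded 0/1 to No/Yes in new column consented_label",
  type     = "preparation",
  stringsAsFactors = FALSE
)

# A correction is logged differently: it records what changed in the source
# after review, not an edit made in the analysis script.
change_log <- rbind(
  change_log,
  data.frame(
    variable = "consent_date",
    action   = "Value for P002 corrected in source after review; taken from new export",
    type     = "correction",
    stringsAsFactors = FALSE
  )
)

print(change_log)
```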
The source database remains authoritative
When clinical data need correction, the correction should normally occur in the validated source database through the query and audit trail process, then be reflected in a new export.
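Once a new export arrives, one way to confirm that a source correction has come through is to compare it with the previous export. The sketch below assumes two hypothetical exports, `old_export` and `new_export`, with matching `participant_id` values; the data are invented for illustration.

```r
# Hypothetical exports taken before and after a source correction.
old_export <- data.frame(
  participant_id = c("P001", "P002"),
  consent_date   = c("2023-01-10", "2032-02-14"),
  stringsAsFactors = FALSE
)
new_export <- data.frame(
  participant_id = c("P001", "P002"),
  consent_date   = c("2023-01-10", "2023-02-14"),
  stringsAsFactors = FALSE
)

# Pair old and new values by participant.
both <- merge(old_export, new_export, by = "participant_id",
              suffixes = c("_old", "_new"))

# List the records whose value differs between the two exports.
changed <- both[both$consent_date_old != both$consent_date_new, ]
print(changed)
```

The script only reads the exports and reports differences; the correction itself was made in the source database.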