CLiREN-LMS
Data Cleaning and Preparation in R

The Purpose of Data Cleaning and Preparation

A Defensible Cleaning Workflow Should Answer

30-45 minutes | Applied | Step 17 of 23

What was the raw input?
Record the export date, source system, file name, and storage location. Never overwrite raw data: keep the original export unchanged and write cleaned versions to separate files.
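One way to capture these details is a small provenance record written alongside the raw file. The sketch below is only an illustration in base R: the helper `record_raw_input()`, the source-system label, and the export date are hypothetical, and the MD5 checksum is an addition not mentioned above (it makes later changes to the raw file detectable).

```r
# Hypothetical helper: build a one-row provenance record for a raw export.
record_raw_input <- function(path, source_system, export_date) {
  stopifnot(file.exists(path))
  data.frame(
    file_name     = basename(path),
    storage_path  = normalizePath(path),
    source_system = source_system,
    export_date   = as.Date(export_date),
    size_bytes    = file.size(path),
    md5           = unname(tools::md5sum(path)),  # assumption: checksum as a tamper check
    logged_at     = Sys.time(),
    stringsAsFactors = FALSE
  )
}

# Demo with a throwaway file so the sketch runs anywhere.
raw_path <- tempfile(fileext = ".csv")
writeLines(c("record_id,age", "R001,34", "R002,"), raw_path)

# Mark the raw file read-only to discourage accidental overwrites
# (the effect of Sys.chmod is more limited on Windows).
Sys.chmod(raw_path, mode = "0444")

provenance <- record_raw_input(
  raw_path,
  source_system = "EDC export",   # illustrative value
  export_date   = "2024-01-15"    # illustrative value
)
print(provenance)
```

Appending each record to a running log file would give a history of every raw export the workflow has used.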
What rules were applied?
Cleaning checks and derivations should be written in scripts or documented specifications rather than reconstructed from memory.
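As a minimal sketch of what "rules written in scripts" can look like, the example below stores each check as a named function so the rule set itself is the documentation. The data, column names, plausibility limits, and cutoff date are all hypothetical, not taken from any particular study.

```r
# Hypothetical example data; column names are illustrative only.
visits <- data.frame(
  record_id  = c("R001", "R002", "R003", "R004"),
  age        = c(34, NA, 212, 58),
  visit_date = as.Date(c("2024-01-10", "2024-02-03", "2024-01-22", "2031-05-01")),
  stringsAsFactors = FALSE
)
data_cutoff <- as.Date("2024-06-30")  # assumed data cutoff

# Each rule is a named function returning TRUE for rows that FAIL the check.
rules <- list(
  age_missing        = function(d) is.na(d$age),
  age_out_of_range   = function(d) !is.na(d$age) & (d$age < 0 | d$age > 120),
  visit_after_cutoff = function(d) !is.na(d$visit_date) & d$visit_date > data_cutoff
)

# Apply every rule and count the failing rows per rule.
rule_results <- vapply(rules, function(rule) sum(rule(visits)), integer(1))
print(rule_results)
```

Because the rules live in one list, adding or changing a check is a visible edit to the script rather than an undocumented manual step.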
Which records were flagged?
Query outputs should identify the records that require review and include enough information for follow-up (for example, a record ID, the variable, and the issue) without exposing unnecessary identifiers.
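A query listing along these lines might be built as follows. This is a sketch under assumptions: the data are invented, `participant_name` stands in for a direct identifier that should stay out of the listing, and the 0-120 age range is an illustrative rule.

```r
# Hypothetical data: 'participant_name' stands in for a direct identifier.
visits <- data.frame(
  record_id        = c("R001", "R002", "R003"),
  participant_name = c("A. Example", "B. Example", "C. Example"),
  site             = c("S01", "S01", "S02"),
  age              = c(34, NA, 212),
  stringsAsFactors = FALSE
)

# Flag rows where age is missing or outside an assumed plausible range.
flag_age <- is.na(visits$age) | visits$age < 0 | visits$age > 120

# Build the listing from only the columns needed for follow-up;
# the name column is deliberately left out.
query_listing <- data.frame(
  record_id = visits$record_id[flag_age],
  site      = visits$site[flag_age],
  variable  = "age",
  value     = as.character(visits$age[flag_age]),
  issue     = ifelse(is.na(visits$age[flag_age]), "missing", "outside 0-120"),
  stringsAsFactors = FALSE
)
print(query_listing)
```

The record ID and site are enough for a site to locate the source record, so the listing can be shared without names.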
Which outputs were produced?
Outputs may include quality summaries, query listings, cleaned datasets, analysis datasets, logs, and reports. Each output should have a stated purpose.
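One possible way to keep outputs tied to their purpose is a small manifest saved with them. The file names and purpose texts below are hypothetical placeholders; the guard simply stops the script if any output is listed without a purpose.

```r
# Hypothetical manifest: one row per output the workflow produces.
manifest <- data.frame(
  output  = c("quality_summary.csv", "query_listing.csv",
              "cleaned_visits.csv", "cleaning_log.txt"),
  type    = c("quality summary", "query listing", "cleaned dataset", "log"),
  purpose = c(
    "Counts of failed checks per rule, for data-management review",
    "Records needing follow-up",
    "Input for deriving the analysis dataset",
    "Record of when the workflow ran and on which raw file"
  ),
  stringsAsFactors = FALSE
)

# Guard: stop if any output lacks a stated purpose.
stopifnot(!any(is.na(manifest$purpose)), all(nzchar(trimws(manifest$purpose))))

# Save the manifest next to the outputs (a temp directory here, for the demo).
manifest_path <- file.path(tempdir(), "output_manifest.csv")
write.csv(manifest, manifest_path, row.names = FALSE)
```

An output that cannot be given a purpose in the manifest is a candidate for removal from the workflow.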