The Purpose of Data Cleaning and Preparation
A Defensible Cleaning Workflow Should Answer
What was the raw input?
Record the export date, source system, file name, and storage location. Raw data should not be overwritten.
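One way to make the raw-input record concrete is to write a small provenance file next to each export. The sketch below assumes Python and a local filesystem. The first four `RawInputRecord` fields mirror the items above. The SHA-256 fingerprint, the `.provenance.json` sidecar name, and the helper functions are hypothetical additions for illustration. Clearing the write bits is only a lightweight guard against accidental overwrites and does not replace access controls on the storage location.

```python
import hashlib
import json
import stat
from dataclasses import asdict, dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class RawInputRecord:
    # Hypothetical provenance fields; the first four mirror the checklist.
    export_date: str
    source_system: str
    file_name: str
    storage_location: str
    sha256: str  # fingerprint used to detect later changes to the raw file


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large exports need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def register_raw_input(path: Path, export_date: date, source_system: str) -> RawInputRecord:
    record = RawInputRecord(
        export_date=export_date.isoformat(),
        source_system=source_system,
        file_name=path.name,
        storage_location=str(path.parent.resolve()),
        sha256=sha256_of(path),
    )
    # Clear the write bits so the raw file is not overwritten by accident.
    mode = stat.S_IMODE(path.stat().st_mode)
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    # Hypothetical convention: provenance is saved next to the raw file.
    sidecar = path.with_name(path.name + ".provenance.json")
    sidecar.write_text(json.dumps(asdict(record), indent=2))
    return record


def verify_unchanged(path: Path, record: RawInputRecord) -> bool:
    """True if the raw file still matches the hash recorded at registration."""
    return sha256_of(path) == record.sha256
```

Re-running `verify_unchanged` before each cleaning run gives a quick check that the raw input is still the file that was originally registered.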
What rules were applied?
Cleaning checks and derivations should be written in scripts or documented specifications rather than reconstructed from memory.
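To illustrate rules kept in a script, the following sketch declares each check with an ID, the field it examines, and a description, so that every run applies the same rule in the same way. The rule IDs, thresholds, field names, and BMI derivation are hypothetical examples and are not taken from any particular specification.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict


@dataclass(frozen=True)
class Check:
    rule_id: str
    field: str
    description: str
    fails: Callable[[Record], bool]  # True when the record should be flagged


# Hypothetical rules for illustration; real thresholds belong in the study spec.
CHECKS = [
    Check("AGE01", "age", "Age missing", lambda r: r.get("age") is None),
    Check("AGE02", "age", "Age outside 18-100",
          lambda r: r.get("age") is not None and not 18 <= r["age"] <= 100),
    Check("WT01", "weight_kg", "Weight (kg) outside 30-250",
          lambda r: r.get("weight_kg") is not None and not 30 <= r["weight_kg"] <= 250),
]


def derive_bmi(r: Record) -> Optional[float]:
    """Hypothetical derivation: BMI = weight (kg) / height (m)^2, 1 decimal."""
    w, h = r.get("weight_kg"), r.get("height_cm")
    if w is None or not h:  # missing or zero height: no BMI, avoid division by zero
        return None
    return round(w / (h / 100) ** 2, 1)


def apply_rules(records):
    """Return (records with derived BMI, flags as (record_id, field, rule_id, description))."""
    flags = []
    derived = []
    for r in records:
        for c in CHECKS:
            if c.fails(r):
                flags.append((r["record_id"], c.field, c.rule_id, c.description))
        derived.append({**r, "bmi": derive_bmi(r)})
    return derived, flags
```

Because the rules live in one list, a reviewer can read exactly what was checked, and a change to a threshold shows up as a change to the script.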
Which records were flagged?
Query outputs should identify records requiring review and should include enough information for follow-up without exposing unnecessary identifiers.
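A query listing can be restricted to an explicit allowlist of columns, so direct identifiers such as names or dates of birth never reach the output. In this sketch, the column set, the `Q0001` numbering, and the `(record_id, field, rule_id, message)` flag layout are assumptions. Adjust them to whatever the review process actually needs.

```python
import csv
import io

# Hypothetical allowlist: enough to locate and resolve a query, no direct identifiers.
QUERY_COLUMNS = ["query_id", "record_id", "site", "visit", "field", "value", "rule_id", "message"]


def build_query_listing(records, flags):
    """flags: iterable of (record_id, field, rule_id, message) tuples."""
    by_id = {r["record_id"]: r for r in records}
    rows = []
    for n, (rid, field, rule_id, message) in enumerate(flags, start=1):
        rec = by_id[rid]
        # Each row is built only from allowlisted columns, so fields like
        # name or date of birth in the source record are never copied.
        rows.append({
            "query_id": f"Q{n:04d}",
            "record_id": rid,
            "site": rec.get("site"),
            "visit": rec.get("visit"),
            "field": field,
            "value": rec.get(field),
            "rule_id": rule_id,
            "message": message,
        })
    return rows


def to_csv(rows) -> str:
    buf = io.StringIO()
    # extrasaction="raise" fails loudly if a non-allowlisted column slips in.
    writer = csv.DictWriter(buf, fieldnames=QUERY_COLUMNS, extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```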
Which outputs were produced?
Outputs may include quality summaries, query listings, cleaned datasets, analysis datasets, logs, and reports. Each output should have a stated purpose.
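One way to enforce this is a simple manifest check that registers every output with its kind, purpose, and producing script. The output kinds below come from the list above. The manifest structure and the `manifest_problems` function are hypothetical.

```python
from dataclasses import dataclass

# Output kinds named in the text; the manifest layout itself is an assumption.
OUTPUT_KINDS = {"quality summary", "query listing", "cleaned dataset",
                "analysis dataset", "log", "report"}


@dataclass(frozen=True)
class Output:
    file_name: str
    kind: str
    purpose: str      # why this output exists and who uses it
    produced_by: str  # script or specification that generated it


def manifest_problems(outputs):
    """Return human-readable problems; an empty list means the manifest passes."""
    problems = []
    for o in outputs:
        if o.kind not in OUTPUT_KINDS:
            problems.append(f"{o.file_name}: unknown output kind '{o.kind}'")
        if not o.purpose.strip():
            problems.append(f"{o.file_name}: no stated purpose")
        if not o.produced_by.strip():
            problems.append(f"{o.file_name}: producing script not recorded")
    return problems
```

An output that cannot be given a purpose is often a leftover intermediate file and a candidate for removal.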