Data Cleaning and Preparation in R

Cleaning Logs and Query Outputs

30-45 minutes Applied Step 5 of 9

Part 1

A cleaning log is a structured record of data issues, checks, decisions, and status. It may be maintained in the EDC system, a tracking database, or a spreadsheet, or it may be generated from R. The exact format depends on the study, but the purpose is consistent: to show what was found, what was done, and what remains unresolved.

R can generate query outputs from rule-based checks. For example, a script can identify missing consent dates, impossible age values, and date inconsistencies, then combine them into one query listing. The `bind_rows()` function from dplyr combines query outputs that share the same column structure.

The resulting listing can be reviewed by the data manager before being entered into REDCap or sent through the approved query process. R-generated queries should always be reviewed: automated rules may flag legitimate exceptions, and some rules may need refinement.
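The rule-based checks described above can be sketched as follows. This is a minimal illustration, not the study's actual script: the `participants` data, the column names (`record_id`, `consent_date`, `visit_date`, `age`), the 0 to 120 age range, and the wording of each issue are all assumptions made for the example.

```r
library(dplyr)

# Hypothetical example data; all column names and values are assumptions
participants <- data.frame(
  record_id    = c("001", "002", "003", "004"),
  consent_date = as.Date(c("2024-01-10", NA, "2024-02-01", "2024-03-15")),
  visit_date   = as.Date(c("2024-01-12", "2024-01-20", "2024-01-25", "2024-03-20")),
  age          = c(34, 51, 130, 47),
  stringsAsFactors = FALSE
)

# Rule 1: consent date is missing
q_missing_consent <- participants %>%
  filter(is.na(consent_date)) %>%
  mutate(field = "consent_date", issue = "Consent date is missing") %>%
  select(record_id, field, issue)

# Rule 2: age outside an assumed plausible range of 0 to 120
q_age <- participants %>%
  filter(!is.na(age), age < 0 | age > 120) %>%
  mutate(field = "age", issue = paste0("Age out of range: ", age)) %>%
  select(record_id, field, issue)

# Rule 3: visit date earlier than consent date
q_dates <- participants %>%
  filter(!is.na(consent_date), !is.na(visit_date), visit_date < consent_date) %>%
  mutate(field = "visit_date", issue = "Visit date is before consent date") %>%
  select(record_id, field, issue)

# Combine the three outputs, which share the same columns, into one listing
query_listing <- bind_rows(q_missing_consent, q_age, q_dates) %>%
  arrange(record_id, field)

print(query_listing)
```

Each rule returns the same three columns, which keeps the combined listing easy to read. A record can appear more than once if it fails several rules, as record 003 does here.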

Part 2

A cleaning log may include additional fields, such as the date an issue was identified, its status, and resolution notes. R can create the initial issue listing, but status and resolution notes often require workflow integration with the EDC system or a query management tracker. If queries are managed inside REDCap, the R output should support that process rather than replace it. If queries are tracked externally, the tracker should be controlled, versioned, and backed up.

A date identified field can be added to the listing in R at the time it is generated.
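One way to add a date identified field is shown below. This is a sketch under stated assumptions: the small `query_listing` data frame stands in for the output of the rule-based checks, and the `date_identified` and `status` column names, along with the initial `"Open"` value, are illustrative choices rather than a required format.

```r
library(dplyr)

# Hypothetical query listing standing in for the output of the rule-based checks
query_listing <- data.frame(
  record_id = c("002", "003"),
  field     = c("consent_date", "age"),
  issue     = c("Consent date is missing", "Age out of range: 130"),
  stringsAsFactors = FALSE
)

# Stamp each query with the date the script was run.
# The status column is an assumed placeholder; in practice status is usually
# maintained in the EDC system or the query tracker, not in R.
cleaning_log <- query_listing %>%
  mutate(
    date_identified = Sys.Date(),
    status          = "Open"
  )

print(cleaning_log)
```

Because `Sys.Date()` records the run date, rerunning the script on a later day would restamp every row. If the original identification date must be preserved, the listing would need to be merged with the previously saved log instead of regenerated from scratch.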

Part 3

Adding the date identified field turns the listing into an operational output. The study team should define how this output is reviewed, who approves queries, how site responses are recorded, and when queries are considered closed. Cleaning logs should not become informal side records that contradict the source database or the approved query system.