Data Cleaning and Preparation in R: Summary and Assessment
Practical Exercise: Cleaning and Preparing a REDCap Export in R
### Scenario
You are supporting a multisite adult clinical cohort study. The study uses REDCap for enrollment, admission, discharge, and day 28 follow-up data. The principal investigator wants a weekly cleaning output that identifies missing consent dates, duplicate participant IDs, age values outside the adult eligibility range, enrollment dates before consent dates, discharge dates before admission dates, and day 28 outcomes that are missing after the follow-up due date. The statistician also wants a prepared participant-level dataset with derived age, length of stay, and day 28 follow-up window status.
### Exercise Tasks
1. Create or open an R project for the study.
2. Place the latest REDCap CSV export in `data_raw`.
3. Create a script named `02_clean_and_prepare_data.R`.
4. Load the required packages.
5. Import the raw export and clean column names.
6. Convert date variables to proper date objects.
7. Create derived variables for age, length of stay, and follow-up timing.
8. Generate query listings for missing consent date, duplicate IDs, date inconsistencies, out-of-range age, and overdue missing outcomes.
9. Combine the query listings into one output file.
10. Save the prepared dataset in `data_clean`.
11. Save a small processing summary that records the number of raw records, number of prepared records, and number of query flags generated.
12. Review the outputs and write brief notes on which checks require protocol interpretation.
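Tasks 5 to 7 and task 10 can be sketched as follows. This is a minimal illustration in base R, not the course's own script: the column names (`record_id`, `date_of_birth`, `day28_outcome`, and so on), the toy rows, the `clean_names_base()` helper, the fixed `as_of` date, the choice to anchor day 28 to the enrollment date, and the 3-day grace period are all assumptions that must be replaced with the study's data dictionary and protocol definitions. A package such as `janitor` could replace the name-cleaning helper.

```r
# Toy stand-in for the REDCap export. Column names and values are hypothetical.
raw_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "Record ID,Date of Birth,Consent Date,Enrollment Date,Admission Date,Discharge Date,Day28 Outcome",
  "P001,1980-05-14,2024-01-10,2024-01-10,2024-01-10,2024-01-15,Alive",
  "P002,2010-03-02,2024-01-12,2024-01-12,2024-01-12,2024-01-11,",
  "P002,1975-07-30,,2024-01-05,2024-01-05,2024-01-09,",
  "P003,1990-11-21,2024-02-01,2024-01-30,2024-02-01,2024-02-04,Alive"
), raw_csv)

# Task 5: import the raw export, treating empty cells as missing.
raw <- read.csv(raw_csv, check.names = FALSE, na.strings = c("", "NA"),
                stringsAsFactors = FALSE)

# Hypothetical helper: lower-case names and replace non-alphanumerics with "_".
clean_names_base <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)
}
names(raw) <- clean_names_base(names(raw))

# Task 6: convert date columns; work on a copy so `raw` stays untouched.
prepared <- raw
date_cols <- c("date_of_birth", "consent_date", "enrollment_date",
               "admission_date", "discharge_date")
prepared[date_cols] <- lapply(prepared[date_cols], as.Date, format = "%Y-%m-%d")

# Task 7: derived variables.
# Age is approximated from days / 365.25; use an exact method near cut-offs.
prepared$age_years <- floor(
  as.numeric(prepared$enrollment_date - prepared$date_of_birth) / 365.25
)
prepared$length_of_stay_days <- as.integer(
  prepared$discharge_date - prepared$admission_date
)

# Assumptions: day 28 counts from enrollment, with a 3-day grace period.
as_of <- as.Date("2024-03-01")   # fixed here for reproducibility
grace_days <- 3
prepared$day28_due_date <- prepared$enrollment_date + 28
prepared$day28_status <- ifelse(
  !is.na(prepared$day28_outcome), "completed",
  ifelse(as_of > prepared$day28_due_date + grace_days, "overdue", "pending")
)

# Task 10: save the prepared dataset. A temporary folder is used here;
# in the project this would be the `data_clean` folder.
prepared_path <- file.path(tempdir(), "prepared_participants.csv")
write.csv(prepared, prepared_path, row.names = FALSE)
```

Keeping `raw` unchanged and writing only to a separate output location means the original export can always be re-read to reproduce the prepared dataset.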
### Suggested Script
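A minimal sketch of the query-listing steps (tasks 8, 9, and 11) is given below in base R. It is one possible approach, not a definitive implementation. The toy `prepared` data frame stands in for the dataset produced by the earlier tasks, and the column names, the `flag_rows()` helper, the check labels, the minimum age of 18 with no upper limit, the fixed `as_of` date, and the 3-day grace period after day 28 are all assumptions to be confirmed against the protocol.

```r
# Toy prepared dataset. Column names and values are hypothetical.
prepared <- data.frame(
  record_id       = c("P001", "P002", "P002", "P003"),
  consent_date    = as.Date(c("2024-01-10", "2024-01-12", NA, "2024-02-01")),
  enrollment_date = as.Date(c("2024-01-10", "2024-01-12", "2024-01-05", "2024-01-30")),
  admission_date  = as.Date(c("2024-01-10", "2024-01-12", "2024-01-05", "2024-02-01")),
  discharge_date  = as.Date(c("2024-01-15", "2024-01-11", "2024-01-09", "2024-02-04")),
  age_years       = c(43, 13, 48, 33),
  day28_outcome   = c("Alive", NA, NA, "Alive"),
  stringsAsFactors = FALSE
)

# Protocol-dependent settings (assumed values, for illustration only).
min_age    <- 18
max_age    <- Inf
grace_days <- 3
as_of      <- as.Date("2024-03-01")

# Hypothetical helper: one listing row per record that meets a condition.
# which() drops NA comparisons, so missing dates do not raise date queries.
flag_rows <- function(data, condition, check) {
  hits <- data[which(condition), , drop = FALSE]
  data.frame(record_id = hits$record_id,
             check = rep(check, nrow(hits)),
             stringsAsFactors = FALSE)
}

dup_ids <- prepared$record_id[duplicated(prepared$record_id)]

# Tasks 8 and 9: build each listing, then stack them into one table.
queries <- do.call(rbind, list(
  flag_rows(prepared, is.na(prepared$consent_date), "missing_consent_date"),
  flag_rows(prepared, prepared$record_id %in% dup_ids, "duplicate_record_id"),
  flag_rows(prepared,
            is.na(prepared$age_years) | prepared$age_years < min_age |
              prepared$age_years > max_age,
            "age_out_of_range"),
  flag_rows(prepared, prepared$enrollment_date < prepared$consent_date,
            "enrollment_before_consent"),
  flag_rows(prepared, prepared$discharge_date < prepared$admission_date,
            "discharge_before_admission"),
  flag_rows(prepared,
            is.na(prepared$day28_outcome) &
              as_of > prepared$enrollment_date + 28 + grace_days,
            "day28_outcome_overdue")
))

# Task 11: processing summary. In the full script, n_raw_records would be
# the row count of the raw import rather than of `prepared`.
processing_summary <- data.frame(
  run_date           = as_of,
  n_raw_records      = nrow(prepared),
  n_prepared_records = nrow(prepared),
  n_query_flags      = nrow(queries)
)

# Written to a temporary folder here; the project would use its own output folder.
query_path   <- file.path(tempdir(), "query_listing.csv")
summary_path <- file.path(tempdir(), "processing_summary.csv")
write.csv(queries, query_path, row.names = FALSE)
write.csv(processing_summary, summary_path, row.names = FALSE)
```

A record can appear under several checks, so the number of query flags can exceed the number of records. Stacking every listing into one table with a `check` column lets the listing be filtered by check type or by participant during review.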
### Reflection Questions
1. Which checks in this script should be reviewed against the protocol before use?
2. Which variables are original variables, and which are derived variables?
3. How would you modify the script if the export used REDCap repeating instruments?
4. Which outputs should be reviewed before queries are sent to sites?
5. How would you document a decision to change the day 28 visit window?
6. What risks would arise if the script overwrote the raw REDCap export?