Data Cleaning and Preparation in R: Summary and Assessment
Knowledge Check
Answer these questions to check your understanding.
What is the best description of raw data in an R cleaning workflow?
- A. Data received from the source system without manual alteration.
- B. Data after all derived variables have been created.
- C. Data with all missing values removed.
- D. Data copied into a report table.
Answer: A
Why should API tokens not be written directly into shared R scripts?
- A. They are credentials that can expose project data if disclosed.
- B. R cannot read tokens from scripts.
- C. Tokens are only used for statistical analysis.
- D. REDCap does not support API tokens.
Answer: A
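One common way to keep a token out of a shared script is to read it from an environment variable at run time. The sketch below is illustrative only: the variable name `REDCAP_API_TOKEN` and the helper `get_api_token()` are hypothetical, and it assumes the variable is set outside the script (for example in a user-level `.Renviron` file that is not shared).

```r
# Read the token from an environment variable instead of hard-coding it.
# REDCAP_API_TOKEN is a hypothetical name chosen for this example.
get_api_token <- function(var = "REDCAP_API_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    # Fail early with a clear message rather than sending an empty token.
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  token
}
```

A script would then call `token <- get_api_token()` and never contain the credential itself.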
Which folder should normally contain unmodified REDCap exports?
- A. `data_raw`
- B. `outputs`
- C. `scripts`
- D. `documentation`
Answer: A
Why should missing data be classified before action is taken?
- A. Because a missing value may be not applicable, not yet due, pending, unknown, or a true omission.
- B. Because all missing values should automatically be replaced with zero.
- C. Because missing values are never relevant to clinical research.
- D. Because R cannot detect missing values.
Answer: A
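As a rough illustration of classifying missing values before acting on them, the sketch below labels each missing pregnancy test result by its likely reason. The data, column names, and rules are all hypothetical; a real study would take them from the protocol and data dictionary.

```r
# Hypothetical data: a pregnancy test applies only to female participants,
# and a result is expected only once the visit has taken place.
df <- data.frame(
  record_id  = 1:4,
  sex        = c("Female", "Male", "Female", "Female"),
  visit_done = c(TRUE, TRUE, FALSE, TRUE),
  preg_test  = c("Negative", NA, NA, NA)
)

# Classify each value instead of treating every NA the same way.
df$preg_test_status <- ifelse(
  !is.na(df$preg_test), "observed",
  ifelse(df$sex == "Male", "not applicable",
    ifelse(!df$visit_done, "not yet due", "needs review")
  )
)

df
```

Only the rows labelled "needs review" would be candidates for a data query. The other missing values are expected.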
What is a good practice when recoding a variable?
- A. Preserve the original variable and create a new recoded variable.
- B. Delete the original variable immediately.
- C. Recode without checking the data dictionary.
- D. Convert unexpected values to missing without review.
Answer: A
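A minimal sketch of this practice in base R, assuming a hypothetical `sex` variable coded 1 and 2 in the data dictionary: the recoded values go into a new column, and a cross-tabulation against the original shows any code that did not map, so it can be reviewed rather than silently lost.

```r
# Hypothetical example data: 1 = Male, 2 = Female per an assumed data
# dictionary; 9 is an unexpected code.
df <- data.frame(record_id = 1:4, sex = c(1, 2, 9, NA))

# Create a new variable and leave the original `sex` column untouched.
df$sex_label <- factor(df$sex, levels = c(1, 2), labels = c("Male", "Female"))

# Cross-tabulate original against recoded values to review anything unmapped.
table(original = df$sex, recoded = df$sex_label, useNA = "ifany")
```

Because `sex` is preserved, the unexpected code 9 remains visible and can be raised as a query.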
What does `case_when()` help with?
- A. Creating conditional recoding or derived variables.
- B. Installing R packages.
- C. Exporting API tokens.
- D. Opening RStudio projects.
Answer: A
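For illustration, a small `case_when()` sketch that derives an age group from a numeric age. It assumes the dplyr package is installed; the data and cut-offs are hypothetical. Conditions are evaluated in order, so the missing-value case is listed first.

```r
library(dplyr)

# Hypothetical data; the age cut-offs are illustrative only.
df <- data.frame(record_id = 1:4, age = c(15, 34, 70, NA))

df <- df %>%
  mutate(
    age_group = case_when(
      is.na(age) ~ NA_character_,  # keep missing ages missing
      age < 18   ~ "<18",
      age < 65   ~ "18-64",
      TRUE       ~ "65+"           # everything not matched above
    )
  )

df
```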
Why should row counts be checked before and after filtering?
- A. To detect and document records removed by the filtering step.
- B. To make the dataset larger.
- C. To replace missing values.
- D. To delete the raw export.
Answer: A
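A minimal base R sketch of this check, using hypothetical data and a hypothetical `consent` variable: record the row count before and after the filter and report the difference so it can be documented.

```r
# Hypothetical data: consent coded 1 = yes, 0 = no, NA = not recorded.
df <- data.frame(record_id = 1:5, consent = c(1, 1, 0, NA, 1))

n_before <- nrow(df)

# Keep only consented records; which() also drops rows where consent is NA.
df_consented <- df[which(df$consent == 1), ]

n_after <- nrow(df_consented)

message(
  "Rows before: ", n_before,
  "; after: ", n_after,
  "; removed: ", n_before - n_after
)
```

Here two records are removed, one with `consent == 0` and one with a missing value, and the logged counts make that visible.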
In most clinical data management workflows, where should correction of database values occur?
- A. In the source database through the approved query and audit trail process.
- B. In an unofficial spreadsheet only.
- C. In a hidden R object with no documentation.
- D. In the PDF report.
Answer: A