CLiREN-LMS
Data Cleaning and Preparation in R

Data Cleaning and Preparation in R: Summary and Assessment

Knowledge Check

30-60 minutes · Applied · Step 4 of 7
Quiz

Answer these questions to check understanding.

What is the best description of raw data in an R cleaning workflow?
  • A. Data received from the source system without manual alteration.
  • B. Data after all derived variables have been created.
  • C. Data with all missing values removed.
  • D. Data copied into a report table.

Answer: A

Why should API tokens not be written directly into shared R scripts?
  • A. They are credentials that can expose project data if disclosed.
  • B. R cannot read tokens from scripts.
  • C. Tokens are only used for statistical analysis.
  • D. REDCap does not support API tokens.

Answer: A
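A minimal sketch of keeping a token out of a shared script, assuming the token is stored in a user-level environment variable (for example a line in `~/.Renviron`, which stays on the user's machine and is not committed or shared) and read at run time with base R's `Sys.getenv()`. The variable name `REDCAP_API_TOKEN` and the helper `get_redcap_token()` are hypothetical examples, not REDCap requirements.

```r
# Hypothetical helper: read an API token from an environment variable
# instead of hard-coding it in the script.
# Set it once in ~/.Renviron (e.g. REDCAP_API_TOKEN=...) and restart R.
get_redcap_token <- function(var = "REDCAP_API_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    # Fail loudly rather than falling back to a token written in the script.
    stop("Environment variable ", var, " is not set; ",
         "add it to ~/.Renviron rather than to this script.", call. = FALSE)
  }
  token
}

# Usage: the shared script only ever contains the variable name.
# token <- get_redcap_token()
```

Because the script refers only to the variable name, it can be shared or placed under version control without disclosing the credential.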

Which folder should normally contain unmodified REDCap exports?
  • A. `data_raw`
  • B. `outputs`
  • C. `scripts`
  • D. `documentation`

Answer: A
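One way to apply the `data_raw` convention in code, sketched under assumptions: exports are read from `data_raw/` and never written back, and cleaned output goes to a separate folder. The folder name `data_clean/` and the helpers `read_raw_export()` and `write_clean()` are hypothetical; adapt them to the project's own layout.

```r
# Hypothetical helpers for a project with data_raw/ (unmodified exports)
# and data_clean/ (an assumed folder name for cleaned output).
read_raw_export <- function(file, raw_dir = "data_raw") {
  path <- file.path(raw_dir, file)
  if (!file.exists(path)) stop("Raw export not found: ", path, call. = FALSE)
  # Read every column as character so codes such as "001" stay exactly as
  # exported; type conversion happens later, in the cleaning script.
  read.csv(path, colClasses = "character", na.strings = "")
}

write_clean <- function(data, file, clean_dir = "data_clean") {
  # Guard against overwriting the raw export folder by mistake.
  if (basename(clean_dir) == "data_raw") {
    stop("Cleaned data must not be written to data_raw/.", call. = FALSE)
  }
  dir.create(clean_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(clean_dir, file)
  write.csv(data, path, row.names = FALSE)
  invisible(path)
}
```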

Why should missing data be classified before any action is taken?
  • A. Because missing values may be not applicable, not yet due, pending, unknown, or true omissions.
  • B. Because all missing values should automatically be replaced with zero.
  • C. Because missing values are never relevant to clinical research.
  • D. Because R cannot detect missing values.

Answer: A
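A sketch of classifying missingness before acting on it, assuming dplyr is installed and R 4.1 or later (for the `|>` pipe). The visit data, the variable names (`sex`, `visit_date`, `form_status`, `preg_result`), the cut-off date, and the category labels are all hypothetical; a real project would take its rules from the protocol and data dictionary. The order of the conditions matters because `case_when()` uses the first one that is true.

```r
library(dplyr)

# Hypothetical visit data: a pregnancy test result plus the context
# needed to explain why a value might be missing.
visits <- data.frame(
  record_id   = c("001", "002", "003", "004", "005"),
  sex         = c("female", "male", "female", "female", "female"),
  visit_date  = as.Date(c("2024-01-10", "2024-01-12", "2024-03-01",
                          "2024-01-15", "2024-01-20")),
  form_status = c("complete", "complete", "not started",
                  "incomplete", "complete"),
  preg_result = c("negative", NA, NA, NA, NA)
)

# Label each value with a reason instead of treating every NA the same way.
classify_missing <- function(data, cutoff) {
  data |>
    mutate(preg_missing_reason = case_when(
      !is.na(preg_result)       ~ "observed",
      sex == "male"             ~ "not applicable",
      visit_date > cutoff       ~ "not yet due",
      form_status != "complete" ~ "pending",
      TRUE                      ~ "omission: raise a query"
    ))
}

classified <- classify_missing(visits, cutoff = as.Date("2024-02-01"))
table(classified$preg_missing_reason)
```

Only the last category is a candidate for a data query; the others are expected gaps that should be reported, not "fixed".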

What is a good practice when recoding a variable?
  • A. Preserve the original variable and create a new recoded variable.
  • B. Delete the original variable immediately.
  • C. Recode without checking the data dictionary.
  • D. Convert unexpected values to missing without review.

Answer: A
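A minimal recoding sketch, assuming dplyr and R 4.1 or later. The coding 1 = Male, 2 = Female and the column names are hypothetical; in practice they come from the data dictionary. The original `sex` column is kept, the labels go into a new `sex_label` column, and codes that the dictionary does not define are listed for review instead of being silently turned into missing values.

```r
library(dplyr)

# Hypothetical export with numeric codes; 9 is not in the assumed dictionary.
raw <- data.frame(record_id = c("001", "002", "003", "004"),
                  sex       = c(1, 2, 2, 9))

# Keep the original variable and add a recoded one alongside it.
recoded <- raw |>
  mutate(sex_label = factor(sex, levels = c(1, 2),
                            labels = c("Male", "Female")))

# Codes outside the dictionary become NA in sex_label; surface them
# so they can be reviewed rather than lost without comment.
unexpected <- recoded |> filter(!is.na(sex), is.na(sex_label))
unexpected
```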

What is dplyr's `case_when()` function used for?
  • A. Creating conditional recoding or derived variables.
  • B. Installing R packages.
  • C. Exporting API tokens.
  • D. Opening RStudio projects.

Answer: A
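A sketch of a derived variable built with `case_when()`, assuming dplyr and R 4.1 or later. The measurements and column names are hypothetical; the cut-offs are the widely used adult BMI categories (below 18.5, 18.5 to below 25, 25 to below 30, 30 and above). Conditions are checked in order, so each `bmi < x` line only needs the upper bound. Missing BMI is handled first: without that line, the final `TRUE` catch-all would label a missing value "Obese".

```r
library(dplyr)

# Hypothetical measurements: weight in kg, height in cm.
body <- data.frame(record_id = c("001", "002", "003", "004", "005"),
                   weight_kg = c(50, 70, 85, 110, NA),
                   height_cm = c(170, 175, 170, 180, 165))

body <- body |>
  mutate(
    bmi = weight_kg / (height_cm / 100)^2,
    bmi_cat = case_when(
      is.na(bmi) ~ NA_character_,  # keep missing as missing
      bmi < 18.5 ~ "Underweight",
      bmi < 25   ~ "Normal",
      bmi < 30   ~ "Overweight",
      TRUE       ~ "Obese"
    )
  )

body[, c("record_id", "bmi_cat")]
```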

Why should row counts be checked before and after filtering?
  • A. To detect and document records removed by the filtering step.
  • B. To make the dataset larger.
  • C. To replace missing values.
  • D. To delete the raw export.

Answer: A
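A sketch of documenting a filtering step, assuming dplyr and R 4.1 or later. The helper `filter_logged()` and the example data are hypothetical. Note that `filter()` also drops rows where the condition evaluates to `NA` (record 004 here), which is exactly the kind of silent loss the before/after counts make visible.

```r
library(dplyr)

# Hypothetical helper: filter, then report rows before, after, and removed
# so the step can be recorded in the cleaning log.
filter_logged <- function(data, ..., step = "filter") {
  n_before <- nrow(data)
  out <- filter(data, ...)
  n_after <- nrow(out)
  message(sprintf("%s: %d rows before, %d after, %d removed",
                  step, n_before, n_after, n_before - n_after))
  out
}

enrolled <- data.frame(record_id = c("001", "002", "003", "004"),
                       consented = c("yes", "yes", "no", NA))

# Record 004 has a missing consent value and is dropped along with 003.
analysis_set <- filter_logged(enrolled, consented == "yes",
                              step = "keep consented")
```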

In most clinical data management workflows, where should correction of database values occur?
  • A. In the source database through the approved query and audit trail process.
  • B. In an unofficial spreadsheet only.
  • C. In a hidden R object with no documentation.
  • D. In the PDF report.

Answer: A
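A sketch of the R side of this process, assuming dplyr and R 4.1 or later: instead of overwriting a suspicious value, the script lists it as a query to be resolved in the source database, and the data are re-exported after correction. The lab data, column names, the 5-20 g/dL plausibility range (an illustrative assumption, not a clinical standard), and the output file name are hypothetical.

```r
library(dplyr)

# Hypothetical lab data containing an implausible value (132 g/dL).
labs <- data.frame(record_id       = c("001", "002", "003"),
                   event           = c("baseline", "baseline", "week_4"),
                   hemoglobin_g_dl = c(13.2, 132, 11.8))

# Build a query list; labs itself is left unchanged.
queries <- labs |>
  filter(hemoglobin_g_dl < 5 | hemoglobin_g_dl > 20) |>
  mutate(field = "hemoglobin_g_dl",
         query_text = "Value outside expected range; please verify against source.") |>
  select(record_id, event, field, value = hemoglobin_g_dl, query_text)

# Written to a temporary folder here; a project might use outputs/ instead.
write.csv(queries, file.path(tempdir(), "queries_for_review.csv"),
          row.names = FALSE)
```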