CLiREN-LMS
Data Analysis in R

Data Analysis in R: Summary and Assessment

Practical Exercise: Creating Descriptive Summaries for a Clinical Dataset

30-60 minutes · Applied · Step 5 of 7
Exercise

### Scenario

You are preparing a weekly descriptive report for a multisite clinical cohort study. The study team wants a concise set of tables showing enrollment by site, participant characteristics, day 28 outcome completeness, query burden, and selected baseline summaries. The principal investigator has emphasized that all tables must include clear denominators and that missing outcomes should be visible.

### Exercise Tasks

1. Open the existing R project used for the previous cleaning exercise.
2. Import the prepared participant-level dataset from `data_clean`.
3. Confirm the number of rows and unique participants.
4. Summarize enrollment by site using counts and percentages.
5. Summarize sex and treatment arm using counts and percentages.
6. Summarize age using median, interquartile range, minimum, maximum, and missing count.
7. Summarize day 28 outcome status, displaying missing outcomes explicitly.
8. Create a site-by-day-28-status cross-tabulation.
9. Create a data quality table showing missing consent dates and overdue missing day 28 outcomes by site.
10. Export all summary outputs to the `outputs` folder.
11. Write a short interpretation note identifying any outputs that require follow-up.

### Suggested Script

### Reflection Questions

1. Which tables are primarily for data management, and which are closer to analysis reporting?
2. Which summaries use all participants as the denominator?
3. Which summaries would change if only participants with due day 28 outcomes were included?
4. Which outputs would you review before sending a report to the principal investigator?
5. What additional footnotes would help a reader interpret the tables?
6. How would the script need to change if the source dataset had one row per visit rather than one row per participant?
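Tasks 2 and 3 of the exercise (import the cleaned data, then confirm row and participant counts) can be sketched in base R as follows. The file name `participants_clean.csv` and the `participant_id` column are assumptions, since the exercise names only the `data_clean` folder. To keep the sketch self-contained, it first writes a tiny stand-in file to a temporary folder; in the actual project you would read from `file.path("data_clean", ...)` relative to the project root.

```r
# Demo only: build a temporary project folder holding a tiny stand-in dataset.
# The file name and column names are hypothetical.
project_dir <- file.path(tempdir(), "clinical_project")
dir.create(file.path(project_dir, "data_clean"), recursive = TRUE, showWarnings = FALSE)
stand_in <- data.frame(
  participant_id = c("P001", "P002", "P003", "P004", "P005"),
  site = c("Site A", "Site A", "Site B", "Site C", "Site C")
)
clean_path <- file.path(project_dir, "data_clean", "participants_clean.csv")
write.csv(stand_in, clean_path, row.names = FALSE)

# Task 2: import the prepared participant-level dataset.
# In the real project: read.csv(file.path("data_clean", "participants_clean.csv"))
participants <- read.csv(clean_path, stringsAsFactors = FALSE)

# Task 3: confirm the number of rows and unique participants.
n_rows <- nrow(participants)
n_unique <- length(unique(participants$participant_id))
duplicate_ids <- participants$participant_id[duplicated(participants$participant_id)]

cat("Rows:", n_rows, "| Unique participants:", n_unique, "\n")
if (n_rows != n_unique) {
  warning("Duplicate participant IDs: ", paste(unique(duplicate_ids), collapse = ", "))
}
```

If the two counts differ, the file is not one row per participant, and every later percentage would use the wrong denominator, so it is worth resolving the duplicates before building any summary table.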
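For tasks 4 and 5, one way to keep denominators visible is a small helper that returns the count, the denominator, and the percentage side by side. `count_pct()` is a hypothetical helper written for this sketch (it is not part of base R), and the data and column names (`site`, `sex`, `arm`) are invented. Missing values are relabelled `"Missing"` so they stay in the table and count toward an all-participant denominator.

```r
# Hypothetical helper: counts and percentages for one variable.
# Denominator = all rows (participants), including rows with missing values.
count_pct <- function(data, var) {
  values <- as.character(data[[var]])
  values[is.na(values)] <- "Missing"   # keep missing values visible
  counts <- table(values)
  out <- data.frame(
    level = names(counts),
    n = as.integer(counts),
    denominator = nrow(data),
    stringsAsFactors = FALSE
  )
  out$percent <- round(100 * out$n / out$denominator, 1)
  names(out)[1] <- var
  out
}

# Invented participant-level data (one row per participant).
participants <- data.frame(
  participant_id = sprintf("P%03d", 1:8),
  site = c("Site A", "Site A", "Site A", "Site B", "Site B", "Site C", "Site C", "Site C"),
  sex  = c("Female", "Male", "Female", NA, "Male", "Female", "Male", "Female"),
  arm  = c("Control", "Intervention", "Control", "Intervention",
           "Control", "Intervention", "Control", "Intervention"),
  stringsAsFactors = FALSE
)

site_summary <- count_pct(participants, "site")   # task 4
sex_summary  <- count_pct(participants, "sex")    # task 5
arm_summary  <- count_pct(participants, "arm")    # task 5
print(sex_summary)
```

If a table should instead use only non-missing values as the denominator, that is a reporting choice worth stating in a footnote, because it changes every percentage in the table.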
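Task 6 asks for several statistics on one variable. A sketch using only base R functions follows; `summarise_numeric()` is a hypothetical helper and the ages are invented. `quantile()` uses R's default definition (`type = 7`), and quartiles from other statistical software can differ slightly, so it may help to note the method if results are compared across tools.

```r
# Hypothetical helper: missing count, median, quartiles, IQR, and range.
# Note: if every value is missing, min()/max() return Inf/-Inf with a warning.
summarise_numeric <- function(x) {
  quartiles <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(
    n_total   = length(x),
    n_missing = sum(is.na(x)),
    median    = median(x, na.rm = TRUE),
    q1        = quartiles[1],
    q3        = quartiles[2],
    iqr       = quartiles[2] - quartiles[1],
    min       = min(x, na.rm = TRUE),
    max       = max(x, na.rm = TRUE)
  )
}

# Invented ages for eight participants, two of them missing.
age <- c(34, 51, NA, 47, 62, 29, NA, 55)
age_summary <- summarise_numeric(age)
print(age_summary)
```

For these eight values the six non-missing ages give a median of 49 and type 7 quartiles of 37.25 and 54, with 2 missing; reporting the missing count next to the median makes the effective sample size clear.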
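For tasks 7 and 8, the key requirement is that missing outcomes appear as their own category rather than silently dropping out of the counts. The sketch below relabels `NA` as `"Missing"` before tabulating; `table(..., useNA = "ifany")` is an alternative that shows an `<NA>` level instead. The status categories and column names are hypothetical.

```r
# Invented data: site and day 28 outcome status (NA = outcome not recorded).
participants <- data.frame(
  site = c("Site A", "Site A", "Site A", "Site B", "Site B", "Site C", "Site C", "Site C"),
  day28_status = c("Recovered", "Recovered", NA, "Died", "Recovered",
                   NA, "Not recovered", "Recovered"),
  stringsAsFactors = FALSE
)

# Relabel missing outcomes so they appear as an explicit category.
status <- ifelse(is.na(participants$day28_status), "Missing", participants$day28_status)

# Task 7: counts and percentages, denominator = all participants.
status_counts <- table(status)
status_table <- data.frame(
  day28_status = names(status_counts),
  n = as.integer(status_counts),
  percent = round(100 * as.numeric(prop.table(status_counts)), 1)
)
print(status_table)

# Task 8: site-by-status cross-tabulation; addmargins() adds "Sum" totals.
site_by_status <- addmargins(table(Site = participants$site, Day28 = status))
print(site_by_status)
```

The row and column totals from `addmargins()` let a reader check each site's denominator directly against the enrollment table.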
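Task 9 needs a rule for when a day 28 outcome counts as overdue. The sketch below assumes, hypothetically, that an outcome is due once 28 days have passed since `enrollment_date` and is overdue if it is due but still missing on the report date; a real study would take the due rule (including any visit window) from the protocol. Column names, dates, and the report date are all invented.

```r
# Invented participant-level data; NA dates/outcomes mean "not recorded".
participants <- data.frame(
  site = c("Site A", "Site A", "Site A", "Site B", "Site B", "Site C", "Site C", "Site C"),
  consent_date = as.Date(c("2024-01-03", NA, "2024-01-10", "2024-01-05",
                           "2024-02-20", NA, "2024-01-08", "2024-01-15")),
  enrollment_date = as.Date(c("2024-01-03", "2024-01-04", "2024-01-10", "2024-01-05",
                              "2024-02-20", "2024-01-12", "2024-01-08", "2024-01-15")),
  day28_status = c("Recovered", "Recovered", NA, "Died", NA, NA, "Not recovered", "Recovered"),
  stringsAsFactors = FALSE
)
report_date <- as.Date("2024-03-01")  # hypothetical report cut-off

# Hypothetical rule: the day 28 outcome is due 28 days after enrollment.
# A missing enrollment_date would make day28_due NA, and aggregate()'s default
# na.action would then drop that row, so check enrollment dates first.
participants$n_participants  <- 1L
participants$consent_missing <- is.na(participants$consent_date)
participants$day28_due       <- participants$enrollment_date + 28 <= report_date
participants$day28_overdue   <- participants$day28_due & is.na(participants$day28_status)

# One row per site: totals of each flag (TRUE counts as 1).
quality_by_site <- aggregate(
  cbind(n_participants, consent_missing, day28_due, day28_overdue) ~ site,
  data = participants,
  FUN = sum
)
print(quality_by_site)
```

Keeping `day28_due` next to `day28_overdue` gives the overdue count its own denominator: in this invented data one Site B participant has no outcome yet but is not due, so it is not flagged.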
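Task 10 can be handled with one loop over a named list of summary tables, so that each file name comes from the list name. The naming scheme (table name plus report date) is an assumption, the two tables are small stand-ins, and the sketch writes to a temporary folder so it does not touch a real project; in the project itself the target would be the `outputs` folder.

```r
# Stand-in summary tables; in the exercise these are the tables built in tasks 4-9.
summaries <- list(
  enrollment_by_site = data.frame(site = c("Site A", "Site B"), n = c(3L, 2L),
                                  denominator = 5L),
  day28_status = data.frame(day28_status = c("Recovered", "Missing"), n = c(4L, 1L),
                            denominator = 5L)
)

# In the project this would be "outputs"; tempdir() keeps the demo self-contained.
output_dir <- file.path(tempdir(), "outputs")
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

report_date <- "2024-03-01"  # hypothetical report date used in file names
for (name in names(summaries)) {
  path <- file.path(output_dir, paste0(name, "_", report_date, ".csv"))
  write.csv(summaries[[name]], path, row.names = FALSE)
}

written <- list.files(output_dir, pattern = "\\.csv$")
print(written)
```

Including the report date in each file name keeps successive weekly runs from overwriting one another.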