Cross-Tabulations and Proportions
Cross-tabulation summarizes the relationship between two categorical variables. In clinical research data management, cross-tabulations are useful for comparing outcome status by site, visit completion by visit, adverse event severity by relatedness, query status by site, or treatment arm by sex. A cross-tabulation is not automatically a statistical test; it is first a structured descriptive table.
The simplest cross-tabulation uses `tabyl()` from the janitor package:
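As a minimal sketch, assuming a small hypothetical data frame `participants` with columns `site` and `day28_status` (the names and values are illustrative, not taken from a real study), the call might look like this:

```r
library(janitor)

# Hypothetical participant-level data; names and values are illustrative only.
participants <- data.frame(
  site = c("A", "A", "A", "B", "B", "C", "C", "C"),
  day28_status = c("Alive", "Alive", "Died", "Alive", "Unknown",
                   "Alive", "Died", "Died")
)

# Two-way tabyl: one row per site, one column per day 28 status, cells are counts.
site_by_status <- tabyl(participants, site, day28_status)
site_by_status
```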
This produces counts of day 28 status within each site. To add row percentages:
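One way to do this, sketched on a hypothetical `participants` data frame (illustrative names and values), is to pass the table to janitor's `adorn_percentages()`. The optional `adorn_pct_formatting()` and `adorn_ns()` steps format the percentages and append the underlying counts:

```r
library(janitor)

# Hypothetical participant-level data; names and values are illustrative only.
participants <- data.frame(
  site = c("A", "A", "A", "B", "B", "C", "C", "C"),
  day28_status = c("Alive", "Alive", "Died", "Alive", "Unknown",
                   "Alive", "Died", "Died")
)

# Row proportions: each site's row sums to 1.
site_row_pct <- participants |>
  tabyl(site, day28_status) |>
  adorn_percentages("row")

# Report version: formatted percentages with the counts shown alongside.
site_report <- site_row_pct |>
  adorn_pct_formatting(digits = 1) |>
  adorn_ns()

site_report
```

Keeping the counts next to the percentages makes the denominator for each site visible to the reader.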
Row percentages answer the question: within each site, what percentage of participants fall into each day 28 status category? Column percentages answer a different question: within each day 28 status category, what percentage come from each site? The correct choice depends on the reporting question.
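The difference can be sketched by changing only the denominator argument of `adorn_percentages()`. The `participants` data frame below is hypothetical and exists only to make the contrast concrete:

```r
library(janitor)

# Hypothetical participant-level data; names and values are illustrative only.
participants <- data.frame(
  site = c("A", "A", "A", "B", "B", "C", "C", "C"),
  day28_status = c("Alive", "Alive", "Died", "Alive", "Unknown",
                   "Alive", "Died", "Died")
)

counts <- tabyl(participants, site, day28_status)

# Denominator = site total: "within site A, what share is Alive?"
row_pct <- adorn_percentages(counts, "row")

# Denominator = status total: "of all Alive participants, what share is at site A?"
col_pct <- adorn_percentages(counts, "col")

row_pct
col_pct
```

In this toy data, site A accounts for two of the three participants at that site being alive (row view) and two of the four alive participants overall (column view), so the same cell yields two different percentages.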
The following example creates a cross-tabulation of treatment arm by sex:
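A minimal sketch, assuming a hypothetical `baseline` data frame with `treatment_arm` and `sex` columns (names and values are illustrative):

```r
library(janitor)

# Hypothetical baseline data; names and values are illustrative only.
baseline <- data.frame(
  treatment_arm = c("Active", "Active", "Active",
                    "Placebo", "Placebo", "Placebo"),
  sex = c("F", "M", "F", "M", "F", "M")
)

# Counts of sex within each treatment arm, with a totals row appended.
arm_by_sex <- baseline |>
  tabyl(treatment_arm, sex) |>
  adorn_totals("row")

arm_by_sex
```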
This may be useful for a baseline summary. However, the data manager should be careful not to overinterpret small differences. In randomized trials, baseline tables describe the sample; they are not usually intended to drive post-randomization decisions unless predefined procedures require it.
Cross-tabulations can also reveal data quality problems:
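For instance, a sketch that tabulates visit status by site, assuming a hypothetical `visits` data frame with `site` and `visit_status` columns (the status labels are illustrative):

```r
library(janitor)

# Hypothetical visit-level data; names, labels, and values are illustrative only.
visits <- data.frame(
  site = rep(c("A", "B", "C"), each = 5),
  visit_status = c(
    rep("Completed", 4), "Pending",   # site A
    "Completed", rep("Pending", 4),   # site B
    rep("Completed", 4), "Missed"     # site C
  )
)

# Counts of each visit status within each site.
status_by_site <- tabyl(visits, site, visit_status)
status_by_site
```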
If one site has many `Pending` visits while other sites have few, the issue may be delayed data entry, true workflow differences, or misunderstanding of visit status coding. The table points to a question; it does not answer the question by itself.
For more controlled reporting, cross-tabulations can be built using the dplyr functions `count()` and `group_by()`:
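A sketch of that approach, assuming a hypothetical `visits` data frame with `site` and `visit_status` columns. The column names `site_total` and `pct` are choices made for this example:

```r
library(dplyr)

# Hypothetical visit-level data; names, labels, and values are illustrative only.
visits <- data.frame(
  site = rep(c("A", "B", "C"), each = 5),
  visit_status = c(
    rep("Completed", 4), "Pending",   # site A
    "Completed", rep("Pending", 4),   # site B
    rep("Completed", 4), "Missed"     # site C
  )
)

status_long <- visits |>
  # One row per observed site/status combination; combinations with no
  # records are not returned by count().
  count(site, visit_status, name = "n") |>
  group_by(site) |>
  # Keep the denominator as an explicit column next to the percentage.
  mutate(
    site_total = sum(n),
    pct = 100 * n / site_total
  ) |>
  ungroup()

status_long
```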
This long-format table is often easier to export, join, plot, or use in dashboards than a wide cross-tabulation. The appropriate format depends on the next step.
The denominator should be shown or explained in any table intended for decision-making. Many misunderstandings in clinical reporting arise not from complex statistics, but from unclear denominators.