Developing a Data Collection Matrix
2.4 Developing a Data Collection Matrix
A data collection matrix is a structured table that maps variables or forms to study visits, timepoints, or events. It is one of the most useful tools for translating the protocol into practical data collection. The matrix helps the study team see what data are collected, when they are collected, and whether any required information is missing or duplicated.
In many protocols, the schedule of events describes procedures across visits. For example, screening may include consent, eligibility, demographics, medical history, and baseline laboratory tests. Enrollment may include randomization, treatment allocation, baseline vital signs, and medication dispensing. Follow-up visits may include symptoms, adherence, adverse events, laboratory results, and outcome assessment. A data collection matrix converts this schedule into a form that data managers can use to build CRFs and databases.
The matrix may be organized by forms, variables, or domains. A form-level matrix shows which CRFs are completed at which visits. A variable-level matrix is more detailed and shows individual data items. For complex studies, both versions may be useful. Early in design, a form-level matrix helps define instruments. Later, a variable-level matrix supports data dictionary development.
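The relationship between the two levels can be sketched in code. The following is a minimal illustration, assuming a variable-level matrix is stored as a Python dictionary; the form names, variable names, and visit labels are hypothetical and not taken from any particular protocol. It shows that a form-level matrix can be derived by collapsing the variable-level one.

```python
from collections import defaultdict

# Hypothetical visit order, used only to keep output in schedule order.
VISIT_ORDER = ["screening", "enrollment", "day_7", "day_28", "close_out"]

# Hypothetical variable-level matrix:
# (form, variable) -> visits at which that variable is collected.
VARIABLE_MATRIX = {
    ("demographics", "date_of_birth"): ["screening"],
    ("demographics", "sex"): ["screening"],
    ("vital_signs", "temperature_c"): ["enrollment", "day_7", "day_28"],
    ("vital_signs", "pulse_bpm"): ["enrollment", "day_7"],
}


def to_form_level(variable_matrix):
    """Collapse a variable-level matrix into a form-level matrix.

    A form is scheduled at a visit if at least one of its variables
    is collected at that visit.
    """
    form_visits = defaultdict(set)
    for (form, _variable), visits in variable_matrix.items():
        form_visits[form].update(visits)
    return {
        form: sorted(visits, key=VISIT_ORDER.index)
        for form, visits in form_visits.items()
    }


FORM_MATRIX = to_form_level(VARIABLE_MATRIX)
```

In this sketch the form-level view is always derived from the variable-level one, which is one way to keep the two versions from drifting apart as the design evolves.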
A good matrix prevents several common problems. It reduces duplicate collection by showing when the same variable is captured repeatedly without purpose. It identifies missing variables by showing objectives or outcomes that have no corresponding collection point. It supports longitudinal design by clarifying which data are collected once and which repeat over time. It also helps database builders decide whether to use repeating instruments, longitudinal events, or separate forms in REDCap.
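Two of these checks are easy to automate once the matrix exists in machine-readable form. The sketch below assumes a form-level matrix held as a dictionary and a list of items required by the study objectives; all names and visit labels are hypothetical. It flags required items with no collection point and separates items collected once from those that repeat.

```python
# Hypothetical form-level matrix: item or form -> visits at which it is collected.
MATRIX = {
    "demographics": ["screening"],
    "vital_signs": ["enrollment", "day_7", "day_28"],
    "adverse_events": ["day_7", "day_28"],
}

# Hypothetical list of items the objectives and outcomes depend on.
REQUIRED_ITEMS = ["demographics", "adverse_events", "primary_outcome"]


def find_unscheduled(required_items, matrix):
    """Return required items that have no collection point in the matrix."""
    return [item for item in required_items if not matrix.get(item)]


def split_by_repetition(matrix):
    """Separate items collected at exactly one visit from items that repeat."""
    once = sorted(item for item, visits in matrix.items() if len(visits) == 1)
    repeated = sorted(item for item, visits in matrix.items() if len(visits) > 1)
    return once, repeated


unscheduled = find_unscheduled(REQUIRED_ITEMS, MATRIX)
collected_once, collected_repeatedly = split_by_repetition(MATRIX)
```

Here `primary_outcome` would be reported as unscheduled, which is the kind of gap the matrix is meant to expose before the database is built. The once-versus-repeated split is only a starting point for the structural decision; whether a repeated item is unnecessary duplication still requires judgment.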
In multisite studies, the matrix supports standardization. All sites can see the expected forms at each visit. Training teams can use the matrix to explain workflows. Monitors can use it to assess completeness. Statisticians can use it to understand repeated measurements and timepoints. When amendments occur, the matrix can be updated alongside CRFs and data dictionaries.
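A completeness check of the kind a monitor might run can be sketched as follows. This is an illustration under stated assumptions: the matrix, the visit labels, and the list of received forms are hypothetical, and a real study would read them from its own database export rather than from hard-coded values.

```python
# Hypothetical form-level matrix: form -> visits at which it is expected.
MATRIX = {
    "vital_signs": ["enrollment", "day_7", "day_28"],
    "adverse_events": ["day_7", "day_28"],
    "treatment_adherence": ["day_7", "day_28"],
}


def expected_forms(matrix, visit):
    """List the forms the matrix expects at a given visit."""
    return sorted(form for form, visits in matrix.items() if visit in visits)


def missing_forms(matrix, visit, received_forms):
    """List expected forms that were not received for a given visit."""
    received = set(received_forms)
    return [form for form in expected_forms(matrix, visit) if form not in received]


# Example: a site has submitted only vital signs for one participant's Day 7 visit.
day_7_gaps = missing_forms(MATRIX, "day_7", ["vital_signs"])
```

Because every site is checked against the same matrix, the same function gives a consistent definition of "complete" across sites.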
**Table 2.3: Example Data Collection Matrix**
| Data item or form | Screening | Enrollment | Day 7 | Day 28 | Close-out |
|---|---|---|---|---|---|
| Informed consent status | X | | | | |
| Inclusion and exclusion criteria | X | | | | |
| Demographics | X | | | | |
| Randomization or allocation | | X | | | |
| Vital signs | X | X | X | X | |
| Malaria laboratory result | X | X | | | |
| Treatment adherence | | | X | X | |
| Adverse events | X | X | X | X | |
| Primary outcome | | | | X | |
| Study completion status | | | | | X |
The matrix should not be treated as a purely administrative tool. It is a design instrument that brings together scientific requirements, field operations, database structure, monitoring needs, and analysis planning. A well-prepared matrix makes the later stages of database design much easier.