Protocol Translation and CRF Design

Common CRF Design Errors

30-45 minutes Foundational Step 5 of 7

Part 1

Many data quality problems originate during CRF design rather than during data entry or analysis. When forms are unclear, incomplete, inconsistent, or poorly aligned with the protocol, downstream teams spend significant time resolving issues that could have been prevented. Understanding common CRF design errors helps data managers build stronger systems from the beginning.

One common error is collecting unnecessary data. Study teams may be tempted to include variables because they are interesting or because they might be useful later. However, every additional field requires training, entry, validation, cleaning, storage, and governance. In participant-facing studies, unnecessary data collection may also raise ethical concerns because researchers should collect only data that are justified by the approved study purpose.

Another common error is using free text where structured data are needed. Free text may be appropriate for narratives, but it is inefficient for variables that require counting, grouping, modeling, or reporting. For example, treatment names entered as free text may appear as "artemether-lumefantrine," "AL," "Lumefantrine combo," and "Coartem." These may refer to the same treatment but require cleaning before analysis. A coded dropdown would prevent much of this variation.
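To make the cost of free text concrete, the following minimal Python sketch shows the kind of cleaning step that free-text treatment names force on the analysis team. The code list, the synonym table, and the function name `code_treatment` are illustrative assumptions, not part of any specific study or data capture system.

```python
# Hypothetical code list for a treatment field (illustrative only).
TREATMENT_CODES = {
    1: "Artemether-lumefantrine",
}

# Hypothetical synonym table: free-text variants mapped to the code
# they most plausibly represent. Real studies would need clinical review.
SYNONYMS = {
    "artemether-lumefantrine": 1,
    "al": 1,
    "lumefantrine combo": 1,
    "coartem": 1,
}


def code_treatment(free_text):
    """Return the treatment code for a free-text entry, or None if unrecognized."""
    # Normalize case and collapse repeated or trailing whitespace.
    key = " ".join(free_text.strip().lower().split())
    return SYNONYMS.get(key)


entries = [
    "artemether-lumefantrine",
    "AL",
    "Lumefantrine combo",
    "Coartem",
    "coartem ",
    "paracetamol?",
]

coded = [code_treatment(e) for e in entries]

# Entries that could not be mapped still need a manual query.
unresolved = [e for e, c in zip(entries, coded) if c is None]

print(coded)
print(unresolved)
```

Every entry that falls outside the synonym table still requires a manual query, which is the work a coded dropdown avoids by capturing the code at the point of entry.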

Part 2

Poor field labels are another frequent source of error. Labels that lack timepoints, units, or context invite inconsistent interpretation. A field labeled "Weight" may be unclear if the study collects weight at screening, enrollment, day 7, and day 28. A better label would specify "Weight at enrollment (kg)." Similarly, laboratory fields should specify units and, where relevant, specimen type or method.

Inconsistent coding is a subtle but serious problem. If Yes/No variables are coded differently across forms, analysis scripts may misclassify responses. If missing values are represented using multiple codes without documentation, data cleaning becomes more difficult. Standard code lists should be defined early and applied consistently.

Another error is failing to design for longitudinal structure. Repeated visits, unscheduled events, adverse events, and repeated laboratory samples require careful database design. If a team creates separate variables for every possible adverse event, the form becomes unwieldy and cannot handle unexpected numbers of events. A repeating instrument is often more appropriate.
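The idea behind a repeating instrument can be sketched as a long-format table with one row per adverse event, rather than a fixed set of columns such as a first, second, and third event. The Python sketch below is a minimal illustration; the field names (`participant_id`, `ae_instance`, `ae_term`, `ae_onset_day`) and the data are hypothetical and do not come from any particular data capture system.

```python
from collections import defaultdict

# Long format: one row per adverse event, keyed by participant and
# instance number. All field names and values are hypothetical.
adverse_events = [
    {"participant_id": "P001", "ae_instance": 1, "ae_term": "Headache", "ae_onset_day": 2},
    {"participant_id": "P001", "ae_instance": 2, "ae_term": "Nausea", "ae_onset_day": 5},
    {"participant_id": "P001", "ae_instance": 3, "ae_term": "Rash", "ae_onset_day": 9},
    {"participant_id": "P002", "ae_instance": 1, "ae_term": "Vomiting", "ae_onset_day": 1},
]


def count_events(rows):
    """Count adverse events per participant, whatever the number of events."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["participant_id"]] += 1
    return dict(counts)


def next_instance(rows, participant_id):
    """Return the instance number to assign to a participant's next event."""
    existing = [r["ae_instance"] for r in rows if r["participant_id"] == participant_id]
    return max(existing, default=0) + 1


print(count_events(adverse_events))
print(next_instance(adverse_events, "P001"))
```

Because each event is its own row, a participant with ten events needs no form redesign, and counting or filtering events does not depend on how many event columns were anticipated at design time.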

Part 3

Finally, CRFs often fail because they are not tested with realistic scenarios. A form may appear complete during design but fail during actual use. User Acceptance Testing should include typical cases, edge cases, missing data scenarios, invalid values, unusual but plausible clinical events, and multisite workflows. Testing should involve the people who will use the forms, not only the central design team.

Table 2.9: Common CRF Design Errors and Prevention Strategies