5.3 Sources of Data Quality Problems
Data quality problems can arise at any point in the clinical research lifecycle. They may originate in the protocol, CRF design, database configuration, site workflow, source documentation, data entry, monitoring, data transfers, or analysis preparation. Understanding sources of quality problems helps teams prevent them rather than merely reacting after they appear.
Protocol ambiguity is a common source of downstream quality problems. If an outcome is not clearly defined, sites may collect it differently. If the visit window is unclear, late visits may be classified inconsistently. If eligibility criteria are complex and not operationalized into clear data fields, screen failures and enrollments may be difficult to verify. These problems are best prevented during protocol review and CRF design.
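One way to remove visit-window ambiguity is to state the window as numbers that the database can check. The sketch below assumes a hypothetical schedule. Each visit has a target study day counted from randomization, with randomization treated as day 0, and a symmetric window in days. The names `VISIT_SCHEDULE` and `classify_visit` are illustrative and do not come from any EDC system.

```python
from datetime import date

# Hypothetical protocol schedule: target study day (randomization = day 0)
# and the allowed deviation in days on either side of the target.
VISIT_SCHEDULE = {
    "week4": {"target_day": 28, "window": 3},
    "week12": {"target_day": 84, "window": 7},
}

def classify_visit(visit: str, randomization: date, visit_date: date) -> str:
    """Classify a scheduled visit as 'early', 'in_window', or 'late'."""
    spec = VISIT_SCHEDULE[visit]
    study_day = (visit_date - randomization).days
    offset = study_day - spec["target_day"]  # negative = before target
    if offset < -spec["window"]:
        return "early"
    if offset > spec["window"]:
        return "late"
    return "in_window"
```

Take a participant randomized on 1 January 2024. A week-4 visit on 2 February 2024 falls on study day 32, which is four days past target, so it is classified as late. Whether randomization counts as day 0 or day 1 is itself a protocol decision that should be written down. The two conventions can classify the same boundary visit differently.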
CRF and database design errors create many avoidable data problems. Vague labels, missing units, inconsistent coding, excessive free text, poor branching logic, and inadequate validation rules all increase error risk. A field labeled "date" without context may be interpreted differently by users. A laboratory field without units may combine values that cannot be compared. A free-text medication field may require extensive cleaning before analysis.
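The following is a minimal sketch of a unit-aware field definition. It assumes that the database stores each laboratory value together with the unit the site reported. `LabFieldSpec`, `HGB`, and `check_unit` are hypothetical names. The point is that the unit belongs to both the field's definition and its validation rule, rather than being left to each user's interpretation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LabFieldSpec:
    """Hypothetical CRF field definition that carries its unit explicitly."""
    name: str   # database variable name
    label: str  # label shown to site staff, with the unit spelled out
    unit: str   # the only unit this field accepts

HGB = LabFieldSpec(name="hgb", label="Hemoglobin (g/dL)", unit="g/dL")

def check_unit(spec: LabFieldSpec, value: Optional[float],
               unit: Optional[str]) -> list[str]:
    """Return the problems found in one entry; an empty list means it passed."""
    problems = []
    if value is None:
        problems.append(f"{spec.name}: value is missing")
    if not unit:
        problems.append(f"{spec.name}: unit is missing")
    elif unit != spec.unit:
        problems.append(f"{spec.name}: got unit '{unit}', expected '{spec.unit}'")
    return problems
```

`check_unit(HGB, 135, "g/L")` returns one problem. A value of 135 g/L and a value of 13.5 g/dL describe the same concentration, but they cannot be pooled without conversion. A study could instead accept several units and convert each to one canonical unit, but every conversion factor would then need to be verified and documented.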
Site workflow challenges also affect data quality. Staff may be overburdened, data entry may occur long after visits, source documents may be incomplete, or internet connectivity may be unreliable. In multisite studies, different facilities may follow slightly different clinical documentation practices. Training, completion guidelines, and monitoring reports should be designed with these realities in mind.
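Delayed entry is straightforward to measure once both the visit date and the date of entry are available, for example from audit-trail timestamps. The sketch below summarizes entry lag per site against an assumed five-day entry timeline. `MAX_LAG_DAYS` and `entry_lag_report` are hypothetical; a real threshold would come from the study's own data management plan.

```python
from datetime import date
from statistics import median

# Hypothetical entry timeline: data should be entered within 5 days of the visit.
MAX_LAG_DAYS = 5

def entry_lag_report(records, max_lag=MAX_LAG_DAYS):
    """Summarize entry lag per site.

    records: iterable of dicts with 'site', 'visit_date', and 'entry_date'
    (datetime.date values). Returns {site: {median_lag, late_entries, visits}}.
    """
    lags_by_site = {}
    for r in records:
        lag = (r["entry_date"] - r["visit_date"]).days
        lags_by_site.setdefault(r["site"], []).append(lag)
    report = {}
    for site, lags in sorted(lags_by_site.items()):
        report[site] = {
            "median_lag": median(lags),
            "late_entries": sum(1 for lag in lags if lag > max_lag),
            "visits": len(lags),
        }
    return report
```

Reporting both the median and the count of late entries helps separate a site that is consistently slow from one that has only a few delayed records. Each pattern calls for a different kind of follow-up.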
External data sources introduce additional risks. Laboratory files, imaging reports, pharmacy data, electronic health record extracts, and imported spreadsheets may have different identifiers, formats, units, date conventions, or coding structures. Data managers must reconcile external data with CRF data and ensure that transfers are complete, secure, and correctly mapped.
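Reconciliation can be sketched as three checks: lab identifiers that cannot be mapped, lab results with no matching CRF visit, and expected CRF samples with no lab result. The sketch assumes a hypothetical external file keyed by a lab-specific `lab_id` and dated in DD/MM/YYYY format. It also assumes an `id_map` from lab IDs to study subject IDs, such as one agreed in a data transfer agreement. The function `reconcile_labs` is illustrative only.

```python
from datetime import date, datetime

def reconcile_labs(lab_rows: list[dict], id_map: dict[str, str],
                   crf_visits: set[tuple[str, date]]) -> dict:
    """Compare an external lab file with CRF visits that expected a sample.

    lab_rows: rows with 'lab_id' and 'collection_date' as 'DD/MM/YYYY' strings
              (an assumed convention that must be confirmed with the sender).
    id_map: lab_id -> study subject ID (assumed, e.g. from the transfer agreement).
    crf_visits: (subject_id, visit_date) pairs where a lab sample was expected.
    """
    unmapped, no_crf_visit, matched = [], [], set()
    for row in lab_rows:
        subject = id_map.get(row["lab_id"])
        if subject is None:
            unmapped.append(row["lab_id"])  # identifier not in the mapping
            continue
        collected = datetime.strptime(row["collection_date"], "%d/%m/%Y").date()
        key = (subject, collected)
        if key in crf_visits:
            matched.add(key)
        else:
            no_crf_visit.append(key)  # lab result with no expected visit
    missing_lab = sorted(crf_visits - matched)  # expected but never received
    return {"unmapped_ids": unmapped, "no_crf_visit": no_crf_visit,
            "missing_lab": missing_lab}
```

The date format is fixed explicitly rather than guessed, because a string such as `05/03/2024` means 5 March under a day-first convention and 3 May under a month-first one.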
Human error is inevitable. Even trained users may transpose digits, select the wrong option, skip a field, enter data into the wrong record, or misunderstand a validation warning. The goal of data quality management is not to pretend errors will never occur, but to design systems that reduce errors, detect them early, correct them transparently, and learn from recurring patterns.
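Range checks help detect errors early and correct them transparently. They raise a query for the site to verify and leave the entered value untouched rather than silently changing it. In the sketch below, the limits in `RANGES` are placeholders rather than clinical recommendations; a study would take its limits from its data validation plan. The `Query` record and `range_check` function are hypothetical.

```python
from dataclasses import dataclass

# Placeholder plausibility limits (inclusive); not clinical recommendations.
RANGES = {
    "systolic_bp": (70, 250),
    "heart_rate": (30, 220),
    "weight_kg": (30, 250),
}

@dataclass
class Query:
    """A data clarification request sent to the site; the value is not changed."""
    subject_id: str
    field: str
    value: float
    message: str

def range_check(subject_id: str, record: dict) -> list[Query]:
    queries = []
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if value is None:
            continue  # missing values are handled by a separate completeness check
        if not low <= value <= high:
            queries.append(Query(subject_id, field, value,
                                 f"{field}={value} is outside {low}-{high}; "
                                 "please verify against source"))
    return queries
```

A heart rate of 72 keyed as 27 is flagged. A weight of 68 kg keyed as 86 kg, however, stays within the range and passes. For this reason, range checks are paired with source verification rather than relied on alone.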
**Table 5.2: Sources of Data Quality Problems and Prevention Strategies**
| Source | Example problem | Prevention strategy |
|---|---|---|
| Protocol ambiguity | Outcome definition unclear | Resolve during protocol review and document definitions |
| CRF design | Missing units for laboratory values | Include units in labels and validation rules |
| Database setup | Branching logic hides required fields incorrectly | Test scenarios before production |
| Site workflow | Delayed entry after visits | Define entry timelines and monitor entry lag |
| Source documents | Clinical notes incomplete | Train staff and clarify source documentation expectations |
| External data | Lab file uses different IDs | Data transfer agreement and reconciliation |
| Human error | Digits transposed during entry | Range checks and source data verification |
| Access control | Shared accounts obscure audit trail | Individual accounts and user training |
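The "test scenarios before production" strategy for branching logic can be made concrete. Each scenario is written as a set of input answers plus the fields that should be visible, and every scenario is then run against the rule. EDC systems express branching logic in their own configuration syntax. The Python function `visible_fields` below is a hypothetical stand-in for one such rule, in which adverse-event follow-up fields are shown only when the screening question is answered "yes".

```python
# Hypothetical branching rule: AE description and onset date are shown
# (and therefore required) only when ae_any is answered "yes".
def visible_fields(record: dict) -> list[str]:
    fields = ["ae_any"]
    if record.get("ae_any") == "yes":
        fields += ["ae_description", "ae_onset_date"]
    return fields

# Each scenario pairs the answers entered with the fields that must be visible.
SCENARIOS = [
    ({"ae_any": "yes"}, {"ae_any", "ae_description", "ae_onset_date"}),
    ({"ae_any": "no"}, {"ae_any"}),
    ({}, {"ae_any"}),  # unanswered: follow-up fields stay hidden
]

def run_branching_tests(rule, scenarios):
    """Return (answers, expected, actual) for every scenario that fails."""
    failures = []
    for answers, expected in scenarios:
        actual = set(rule(answers))
        if actual != expected:
            failures.append((answers, expected, actual))
    return failures
```

Suppose `visible_fields` is replaced with a faulty rule that never shows the follow-up fields. The "yes" scenario then fails, which exposes exactly the defect described in the table: a required field hidden incorrectly.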