Foundations of Clinical Research Data Management: Summary and Assessment
Case Study: Data Quality Risks in a Multisite Pediatric Fever Study
A research network is conducting a multisite observational study of pediatric fever management at four Kenyan hospitals. The study aims to describe treatment patterns and determine whether children receive follow-up within seven days of discharge. Each site enrolls participants, records baseline clinical findings, collects laboratory test results, documents treatment, and follows participants by phone after discharge. The study uses REDCap, but two sites have intermittent internet connectivity, so staff there sometimes complete paper forms first and enter the data later.
After three months, the central data manager reviews the database and identifies several concerns. Some participants have follow-up dates that occur before discharge dates. One site has many missing treatment variables. Another site records temperature in Fahrenheit even though the database expects Celsius. Several users appear to be sharing one login account. A large number of phone numbers are visible to users at all sites, even though each site should only access its own participants. The statistician also reports that treatment names are difficult to summarize because some are entered as free text with inconsistent spelling.
These issues illustrate why clinical data management must be planned and monitored continuously. The incorrect follow-up dates suggest a need for validation rules and date logic checks. Missing treatment variables may reflect poor completion guidance, workflow problems, or insufficient training. The temperature unit problem indicates that field labels, units, and validation ranges were not clear enough or were not reinforced during training. Shared login accounts undermine audit trails because the study cannot reliably determine who entered or changed data. Cross-site visibility of phone numbers violates the principle of least privilege and may breach confidentiality expectations. Free-text treatment names create analysis difficulties that could have been reduced through coded response options.
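The date logic and temperature range checks described above can be sketched outside the database as well. The following is a minimal illustration in Python, not a REDCap feature and not the study's actual method. The field names (`discharge_date`, `followup_date`, `temperature_c`), the plausibility limits, and the two sample records are all assumptions invented for the example.

```python
from datetime import date

# Assumed plausibility limits for body temperature in Celsius (illustrative only).
CELSIUS_MIN, CELSIUS_MAX = 30.0, 45.0
# The same limits expressed in Fahrenheit: 30 C = 86 F, 45 C = 113 F.
FAHRENHEIT_MIN, FAHRENHEIT_MAX = 86.0, 113.0


def check_record(record):
    """Return a list of data quality flags for one record (hypothetical field names)."""
    flags = []

    discharge = record.get("discharge_date")
    follow_up = record.get("followup_date")
    # Date logic: a follow-up contact cannot precede discharge.
    if discharge is not None and follow_up is not None and follow_up < discharge:
        flags.append("follow-up date before discharge date")

    temp = record.get("temperature_c")
    if temp is None:
        flags.append("temperature missing")
    elif not CELSIUS_MIN <= temp <= CELSIUS_MAX:
        # A value that is implausible in Celsius but plausible in Fahrenheit
        # hints at a unit error rather than a typing error.
        if FAHRENHEIT_MIN <= temp <= FAHRENHEIT_MAX:
            flags.append("temperature out of range, possibly Fahrenheit")
        else:
            flags.append("temperature out of range")
    return flags


# Invented example records, not study data.
records = [
    {"record_id": "A-001", "discharge_date": date(2024, 3, 10),
     "followup_date": date(2024, 3, 15), "temperature_c": 38.6},
    {"record_id": "B-014", "discharge_date": date(2024, 3, 12),
     "followup_date": date(2024, 3, 9), "temperature_c": 101.3},
]

for rec in records:
    print(rec["record_id"], check_record(rec))
```

Flagged records would still need to be queried and confirmed against source documents. A value that looks like Fahrenheit should not be converted automatically without that confirmation.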
The data manager should respond by combining immediate corrective action with system improvement. Existing discrepancies should be queried and resolved against source documents. REDCap validation rules should be reviewed, including date logic and temperature ranges. Field labels should specify units clearly. Treatment fields should be revised to use standardized coded options where possible, with an "other, specify" option only when necessary. User accounts should be individualized, shared accounts disabled, and Data Access Groups configured so sites see only their own records. Site staff should receive refresher training, and a weekly data quality report should monitor missing values, date inconsistencies, query aging, and site-specific completion rates.
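Two of the steps above, coding free-text treatment names and reporting site-specific missingness, can be illustrated with a short script. This is a sketch under stated assumptions, written in Python; the same logic could be written in R. The spelling-to-code mapping, the `site` and `treatment` field names, and the sample records are hypothetical and would need to be replaced by the study's own code list and export format.

```python
from collections import defaultdict

# Hypothetical mapping from observed free-text spellings to coded options.
TREATMENT_CODES = {
    "paracetamol": "paracetamol",
    "paracetemol": "paracetamol",
    "pcm": "paracetamol",
    "ibuprofen": "ibuprofen",
    "ibuprofin": "ibuprofen",
}


def code_treatment(free_text):
    """Map a free-text entry to a coded option; None means the value is missing."""
    if free_text is None or not free_text.strip():
        return None
    key = free_text.strip().lower()
    # Unrecognized entries fall into "other" so they can be reviewed manually.
    return TREATMENT_CODES.get(key, "other")


def site_summary(records):
    """Count records and missing treatment values for each site."""
    counts = defaultdict(lambda: {"n": 0, "missing_treatment": 0})
    for rec in records:
        site = counts[rec["site"]]
        site["n"] += 1
        if code_treatment(rec.get("treatment")) is None:
            site["missing_treatment"] += 1
    return {
        name: {**c, "pct_missing": round(100 * c["missing_treatment"] / c["n"], 1)}
        for name, c in counts.items()
    }


# Invented example records, not study data.
sample_records = [
    {"site": "Site A", "treatment": "Paracetamol"},
    {"site": "Site A", "treatment": " paracetemol "},
    {"site": "Site B", "treatment": ""},
    {"site": "Site B", "treatment": None},
    {"site": "Site B", "treatment": "Ibuprofin"},
]

for site, stats in site_summary(sample_records).items():
    print(site, stats)
```

Run on a weekly export, a summary like this would show which sites need follow-up. Entries that land in "other" indicate spellings or treatments that the coded list does not yet cover.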
This case also demonstrates that data quality is not only a technical problem. It involves human workflows, training, governance, confidentiality, and communication. A well-designed database helps, but it must be supported by competent users, clear procedures, active monitoring, and ethical oversight.
### Case Study Questions
1. Which data quality dimensions are affected in this case?
2. Which issues create confidentiality or governance risks?
3. What REDCap features could reduce these problems?
4. What training topics should be prioritized during a refresher session?
5. How could R be used later in the study to monitor recurring data quality issues?