CLiREN-LMS
Foundations of Clinical Research Data Management

1.7 Data Flow in Clinical Research

30-45 minutes Foundational Step 3 of 6
Reading 1

Data flow describes the movement of information from its origin to its final use. A common flow begins with source documents, continues through CRFs or eCRFs, enters the electronic database, undergoes validation and cleaning, becomes an analysis dataset, appears in reports or dashboards, and is finally archived or shared under governance controls. Understanding data flow is essential because each transition introduces potential risks.

Source documents are the original records where study information is first captured. They may include clinic notes, laboratory reports, pharmacy logs, consent forms, imaging reports, participant diaries, hospital registers, electronic health records, or direct measurement forms. In clinical research, source documentation matters because the database should be verifiable. When monitors or auditors ask whether a database value is correct, the study team must be able to trace that value back to an appropriate source.

CRFs and eCRFs structure the information needed by the protocol. They should not simply copy every piece of clinical information available. Instead, they should capture the variables required for study objectives, safety monitoring, compliance, and analysis. The database then stores these variables in a structured format. Electronic systems improve this flow by applying validation checks, controlling access, and recording audit trails.

Once data are in the database, cleaning and transformation prepare them for analysis. For example, raw visit dates may be used to derive visit windows, age may be calculated from date of birth and enrollment date, and categorical variables may be recoded for reporting. These transformations should be documented and reproducible. In this course, R will be used to demonstrate how cleaning scripts can preserve a clear record of what was done to the data.

Data flow also includes movement out of the database. Exports may go to statisticians, dashboards, safety teams, sponsors, ethics committees, or repositories. Each export must respect confidentiality, access permissions, consent restrictions, and version control. A dataset sent to an analyst should have a clear export date, variable dictionary, cleaning status, and description of any exclusions or derived variables.

[Figure 1.3: Suggested flow diagram showing source documents, CRFs, REDCap database, cleaning scripts, analysis dataset, reports, archive, and controlled sharing]
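
To make the cleaning and export steps concrete, the short script below derives age at enrollment, a visit-window flag, and a recoded label from raw records, then writes a small export description listing each derived variable. It is a minimal sketch, not the course's own material: the course demonstrates cleaning scripts in R, and Python is used here only for illustration. The field names, the 28-day visit target with a 7-day tolerance, the sex coding, the export date, and the cleaning status are all hypothetical values chosen for the example.

```python
from datetime import date

# Hypothetical raw records, shaped like rows exported from the study database.
raw_records = [
    {"participant_id": "P001", "date_of_birth": date(1980, 6, 15),
     "enrollment_date": date(2024, 3, 1), "visit_date": date(2024, 3, 29), "sex": 1},
    {"participant_id": "P002", "date_of_birth": date(1995, 12, 2),
     "enrollment_date": date(2024, 3, 4), "visit_date": date(2024, 4, 20), "sex": 2},
]

# Assumed protocol rule: the follow-up visit is due 28 days after enrollment,
# with a tolerance of 7 days either side.
TARGET_DAY = 28
WINDOW_DAYS = 7

# Assumed coding of the raw categorical variable.
SEX_LABELS = {1: "Male", 2: "Female"}


def age_in_years(born, on):
    """Return completed years between date of birth and a reference date."""
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)


def derive(record):
    """Build one analysis row; date of birth is deliberately not carried over."""
    study_day = (record["visit_date"] - record["enrollment_date"]).days
    return {
        "participant_id": record["participant_id"],
        "age_at_enrollment": age_in_years(record["date_of_birth"],
                                          record["enrollment_date"]),
        "study_day": study_day,
        "in_visit_window": abs(study_day - TARGET_DAY) <= WINDOW_DAYS,
        "sex_label": SEX_LABELS.get(record["sex"], "Unknown"),
    }


analysis_dataset = [derive(r) for r in raw_records]

# A small description to accompany the export, so the recipient can see
# when it was produced and how each derived variable was computed.
export_manifest = {
    "export_date": date(2024, 5, 1).isoformat(),  # hypothetical export date
    "cleaning_status": "draft",                   # hypothetical status label
    "exclusions": [],
    "derived_variables": {
        "age_at_enrollment": "completed years from date_of_birth to enrollment_date",
        "study_day": "visit_date minus enrollment_date, in days",
        "in_visit_window": f"study_day within {WINDOW_DAYS} days of day {TARGET_DAY}",
        "sex_label": "sex recoded using SEX_LABELS",
    },
}

for row in analysis_dataset:
    print(row)
print(export_manifest)
```

Because every derivation lives in a named function and every rule is a named constant, rerunning the script on the same raw records reproduces the same analysis dataset, and the manifest doubles as the description of derived variables that should travel with the export.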