Why Documentation Matters
11.1 Why Documentation Matters
A dataset without documentation is fragile. Even if the data are accurate, future users may not understand what variables mean, how values were coded, what population is represented, which records were excluded, which dates were derived, or what missing codes mean. Clinical research data are often used long after collection ends. They may support manuscripts, audits, secondary analyses, data sharing, regulatory review, or future pooled analyses. Documentation makes these uses possible.
Metadata are data about data. In a clinical dataset, metadata may describe variable names, labels, definitions, units, allowed values, missing value codes, derivation rules, collection instruments, study visits, data sources, and access conditions. Metadata also describe the dataset as a whole: title, creators, version, date, license, governance restrictions, and contact information. The FAIR guiding principles state that data should be findable, accessible, interoperable, and reusable [@wilkinson2016fair].
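As a minimal sketch of variable-level metadata, the Python example below stores a label, allowed values, and missing value codes for one variable and uses them to decode raw values. The variable name `smoke_status`, its codes, and the `VariableMetadata` class are hypothetical and chosen only for illustration. They do not come from any particular study or standard.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VariableMetadata:
    """Hypothetical container for the metadata of a single variable."""
    name: str
    label: str
    unit: Optional[str] = None
    allowed_values: Dict[int, str] = field(default_factory=dict)
    missing_codes: Dict[int, str] = field(default_factory=dict)

    def describe(self, value: int) -> str:
        """Translate a raw coded value into its documented meaning."""
        if value in self.missing_codes:
            return f"missing ({self.missing_codes[value]})"
        if value in self.allowed_values:
            return self.allowed_values[value]
        return "undocumented value"


# Illustrative codes only; a real study defines its own in the data dictionary.
smoking = VariableMetadata(
    name="smoke_status",
    label="Smoking status at baseline",
    allowed_values={0: "Never", 1: "Former", 2: "Current"},
    missing_codes={-99: "Not asked", -88: "Declined to answer"},
)

# Decode a few raw values, including one that the metadata does not cover.
decoded = [smoking.describe(v) for v in [0, 2, -88, 7]]
print(decoded)
```

Without the metadata, a value such as `-88` could be mistaken for a real measurement. A value such as `7` would also pass unnoticed instead of being flagged as undocumented.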
Documentation is not something to create only at the end of a study. It begins with protocol translation and CRF design. The data dictionary, CRF completion guidelines, validation rules, and data management plan are all documentation artifacts. At the end of the study, these materials should be reconciled and updated so that the final dataset can be understood.
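One part of the end-of-study reconciliation described above can be sketched as a comparison between the variables listed in the data dictionary and the columns actually present in the final dataset. The function name `reconcile` and all variable names below are hypothetical. A real check would read both lists from the study's own dictionary and export files.

```python
from typing import Dict, Iterable, List


def reconcile(dictionary_vars: Iterable[str],
              dataset_columns: Iterable[str]) -> Dict[str, List[str]]:
    """Compare documented variables with the columns found in the dataset."""
    documented = set(dictionary_vars)
    present = set(dataset_columns)
    return {
        # Columns in the dataset that the dictionary does not explain.
        "undocumented": sorted(present - documented),
        # Dictionary entries with no matching column in the dataset.
        "not_in_dataset": sorted(documented - present),
    }


# Hypothetical example: a derived variable was added during analysis,
# and a collected variable was dropped from the final export.
report = reconcile(
    dictionary_vars=["subject_id", "visit_date", "sbp", "smoke_status"],
    dataset_columns=["subject_id", "visit_date", "sbp", "age_derived"],
)
print(report)
```

Each entry in either list needs a decision before archiving: either update the documentation or explain the discrepancy.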
| Documentation artifact | Purpose |
|---|---|
| Protocol | Defines study objectives, population, procedures, and outcomes |
| CRF | Defines data collection fields and structure |
| Data dictionary | Defines variables, labels, field types, and choices |
| Codebook | Explains dataset variables, coding, missingness, and derivations |
| Data management plan | Describes data handling, quality, security, and archiving |
| Cleaning log | Records data checks, queries, and decisions |
| Analysis dataset specification | Defines analysis-ready variables and derivations |
| Repository metadata | Supports discovery, citation, and reuse |
**Figure 11.1 Placeholder: Documentation lifecycle.**
This figure should show protocol, CRF, data dictionary, database build, data cleaning, analysis dataset, codebook, archive, and controlled sharing.