CLiREN-LMS
Data Documentation and Metadata

Why Documentation Matters

30-45 minutes Applied Step 5 of 7
Part 1
A dataset without documentation is fragile. Even if the data are accurate, future users may not understand what variables mean, how values were coded, what population is represented, which records were excluded, which dates were derived, or what missing codes mean.

Clinical research data are often used long after collection ends. They may support manuscripts, audits, secondary analyses, data sharing, regulatory review, or future pooled analyses. Documentation makes these uses possible.

Metadata are data about data. In a clinical dataset, variable-level metadata may describe variable names, labels, definitions, units, allowed values, missing value codes, derivation rules, collection instruments, study visits, data sources, and access conditions. Metadata also describe the dataset as a whole: title, creators, version, date, license, governance restrictions, and contact information. The FAIR guiding principles emphasize that data should be findable, accessible, interoperable, and reusable [@wilkinson2016fair].

Documentation is not something to create only at the end of a study. It begins with protocol translation and CRF design. The data dictionary, CRF completion guidelines, validation rules, and data management plan are all documentation artifacts. At the end of the study, these materials should be reconciled against the final dataset and updated so that the dataset can be understood without help from the original study team.
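The distinction between variable-level and dataset-level metadata can be made concrete with a small sketch. The Python below is one possible way to represent a data dictionary in code, not a standard format. Every name in it (VariableMeta, DatasetMeta, undocumented_columns, describe_value) and every example variable, code, and value is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VariableMeta:
    """Variable-level metadata: one entry in a data dictionary."""
    name: str
    label: str
    units: str = ""
    allowed_values: dict = field(default_factory=dict)  # code -> meaning
    missing_codes: dict = field(default_factory=dict)   # code -> reason
    derivation: str = ""                                # rule, if derived


@dataclass
class DatasetMeta:
    """Dataset-level metadata plus the variable dictionary."""
    title: str
    version: str
    license: str
    contact: str
    variables: list = field(default_factory=list)


def undocumented_columns(dataset_meta, columns):
    """Return column names that have no entry in the data dictionary."""
    documented = {v.name for v in dataset_meta.variables}
    return [c for c in columns if c not in documented]


def describe_value(var, value):
    """Translate a stored value into its documented meaning."""
    if value in var.missing_codes:
        return f"missing ({var.missing_codes[value]})"
    if var.allowed_values:
        return var.allowed_values.get(value, "undocumented code")
    return f"{value} {var.units}".strip()


# Hypothetical entries; all names, codes, and values are illustrative only.
sex = VariableMeta("sex", "Sex at enrollment",
                   allowed_values={1: "female", 2: "male"},
                   missing_codes={-9: "not recorded"})
sbp = VariableMeta("sbp", "Systolic blood pressure", units="mmHg",
                   missing_codes={-9: "not recorded"})
meta = DatasetMeta("Example cohort", "1.0", "restricted", "data manager",
                   variables=[sex, sbp])

print(undocumented_columns(meta, ["sex", "sbp", "visit_dt"]))
print(describe_value(sex, 2))
print(describe_value(sex, -9))
print(describe_value(sbp, 120))
```

In this sketch, visit_dt is reported as undocumented because it has no dictionary entry, and the stored value -9 is reported as a missing code instead of being read as a measurement. A check of this kind is most useful when it is run during the end-of-study reconciliation of the data dictionary against the final dataset.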
Part 2
Figure 11.1 Placeholder: Documentation lifecycle. This figure should show protocol, CRF, data dictionary, database build, data cleaning, analysis dataset, codebook, archive, and controlled sharing.