CLiREN-LMS
Protocol Translation and CRF Design

2.8 Data Dictionaries and Metadata

30-45 minutes Foundational Step 3 of 7
Reading 1

A data dictionary is a structured document that describes the variables in a dataset or database. It is one of the most important outputs of the CRF design process. While the CRF shows what the user sees, the data dictionary explains how the data are defined, coded, validated, and documented. It supports database development, statistical analysis, quality control, sharing, and archival.

A basic data dictionary includes variable name, variable label, form or instrument, field type, response options, validation rules, units, branching logic, required status, and notes. More detailed dictionaries may include source document, collection timepoint, derivation rules, permissible ranges, missing value rules, data type, coding standard, and references to protocol sections. In REDCap, the data dictionary can be downloaded, edited as a CSV file, and uploaded to create or revise instruments.

Metadata are data about data. They provide context that allows data to be interpreted correctly. For example, a dataset may contain a variable called `hb`, but without metadata it may not be clear whether this means hemoglobin, what unit is used, which device or laboratory method produced the value, when it was measured, what range is expected, or how missing values were handled. Metadata transform values into interpretable research evidence.

Good metadata are essential for FAIR data practice. Data cannot be findable, interoperable, or reusable if future users cannot understand what the variables mean. They are also essential for reproducibility. If an analysis is repeated two years later, the analyst should be able to identify which dataset version was used, what variables meant, how categories were coded, and how derived variables were produced.

In clinical research, data dictionaries should be treated as controlled documents. They should be reviewed by data managers, investigators, statisticians, and where appropriate, monitors or sponsors. Changes should be documented.
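One way to make dictionary changes visible is to compare two versions and record every difference. The sketch below is a minimal illustration, not a REDCap feature: the `VariableDefinition` class, the `diff_dictionaries` function, and the `hb` units and ranges are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


@dataclass(frozen=True)
class VariableDefinition:
    """One data dictionary entry (a small, illustrative subset of attributes)."""
    name: str
    label: str
    units: str = ""
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def diff_dictionaries(
    old: Dict[str, VariableDefinition],
    new: Dict[str, VariableDefinition],
) -> List[str]:
    """Describe every added, removed, or modified variable between two versions."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes.append(f"{name}: removed")
        elif name not in old:
            changes.append(f"{name}: added")
        else:
            before, after = asdict(old[name]), asdict(new[name])
            for attr in before:
                if before[attr] != after[attr]:
                    changes.append(
                        f"{name}.{attr}: {before[attr]!r} -> {after[attr]!r}"
                    )
    return changes


# Hypothetical versions: the unit of `hb` changes and a new variable appears.
v1 = {"hb": VariableDefinition("hb", "Hemoglobin", "g/dL", 3.0, 25.0)}
v2 = {
    "hb": VariableDefinition("hb", "Hemoglobin", "g/L", 30.0, 250.0),
    "weight_kg": VariableDefinition(
        "weight_kg", "Weight at enrollment in kilograms", "kg", 1, 200
    ),
}

change_log = diff_dictionaries(v1, v2)
for entry in change_log:
    print(entry)
```

A log like this could be stored alongside the dictionary version and the date of the change, so that an analyst can tell which definition applied to which records.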
If a variable definition changes during a study, the change may affect interpretation of data collected before and after the change. Without documentation, such changes can be invisible but damaging.

**Table 2.7: Core Elements of a Clinical Research Data Dictionary**

| Data dictionary element | Description | Example |
|---|---|---|
| Variable name | Machine-readable field name | `weight_kg` |
| Variable label | Human-readable description | Weight at enrollment in kilograms |
| Form or instrument | CRF where the field appears | Enrollment form |
| Field type | Type of data entry field | Numeric text box |
| Units | Measurement unit | Kilograms |
| Validation rule | Permitted format or range | 1-200 |
| Choices or coding | Permitted categorical values | 1 = Male, 2 = Female |
| Branching logic | Conditions for display | Show pregnancy fields only if applicable |
| Required status | Whether completion is mandatory | Required at enrollment |
| Source document | Where value is verified | Clinic measurement sheet |
| Notes | Additional guidance | Use calibrated study scale |
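To illustrate how these elements can drive quality control, the sketch below encodes two entries based on the table's examples and checks records against them. It is a simplified illustration under stated assumptions: the dictionary layout, the key names (`min`, `max`, `choices`, `required`), the variable name `sex`, and the `validate_record` function are all hypothetical and do not follow REDCap's own column headers.

```python
def validate_record(record, dictionary):
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []
    for name, rules in dictionary.items():
        value = record.get(name)

        # Required status: a missing value is a problem only for mandatory fields.
        if value is None or value == "":
            if rules.get("required"):
                problems.append(f"{name}: required value is missing")
            continue

        # Choices or coding: the value must be one of the permitted codes.
        if "choices" in rules and value not in rules["choices"]:
            problems.append(f"{name}: {value!r} is not a permitted code")

        # Validation rule: numeric values must fall inside the permitted range.
        if "min" in rules or "max" in rules:
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"{name}: {value!r} is not numeric")
                continue
            if "min" in rules and number < rules["min"]:
                problems.append(f"{name}: {number} is below the minimum {rules['min']}")
            if "max" in rules and number > rules["max"]:
                problems.append(f"{name}: {number} is above the maximum {rules['max']}")
    return problems


# Entries based on the examples in Table 2.7; "sex" is an assumed variable name.
DATA_DICTIONARY = {
    "weight_kg": {
        "label": "Weight at enrollment in kilograms",
        "units": "kg",
        "min": 1,
        "max": 200,
        "required": True,
    },
    "sex": {
        "label": "Sex",
        "choices": {1: "Male", 2: "Female"},
        "required": True,
    },
}

valid = validate_record({"weight_kg": 72.5, "sex": 2}, DATA_DICTIONARY)
invalid = validate_record({"weight_kg": 720, "sex": 3}, DATA_DICTIONARY)
incomplete = validate_record({"sex": 1}, DATA_DICTIONARY)

print(valid)
print(invalid)
print(incomplete)
```

Keeping the rules in the dictionary rather than in the checking code means a reviewed change to the dictionary is the only place a range or code list needs to be updated.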