CLiREN-LMS
Introduction to R for Clinical Data Management

R Projects and Folder Structure

30-45 minutes Applied Step 5 of 9

Part 1
An R project is a self-contained workspace for one study or exercise. In RStudio, a project is usually represented by a file ending in `.Rproj`. Opening the project file sets the working context for the study or course exercise. This matters because many errors in R occur when the user is working in the wrong folder. If R cannot find a file, the problem is often not that the file is missing, but that R is looking in a different location.

In clinical data management, project organization is not cosmetic; it supports traceability. Raw data should be preserved in a known location. Scripts should be stored separately from data. Outputs should be saved in a way that makes clear which script produced them and when. Documentation should be easy to find.

A folder structure does not need to be complex, but it should make sense and be used consistently. A simple project structure for a clinical data management exercise may consist of five folders: `data_raw`, `data_clean`, `scripts`, `outputs`, and `documentation`.
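As an illustration, the folders named in this lesson could be created from R itself rather than by hand. The sketch below is one possible way to do it using base R only. The project name `cdm_exercise` is hypothetical, and a temporary directory stands in for a real study location so the code can be run safely.

```r
# Hypothetical project root. A temporary folder is used so this sketch
# can be run without touching a real study directory.
project_root <- file.path(tempdir(), "cdm_exercise")

# Folder names used in this lesson.
folders <- c("data_raw", "data_clean", "scripts", "outputs", "documentation")

# Create each folder. recursive = TRUE also creates the project root,
# and showWarnings = FALSE keeps reruns quiet if a folder already exists.
for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List the top-level folders that now exist in the project.
created <- sort(list.dirs(project_root, full.names = FALSE, recursive = FALSE))
print(created)
```

In a real project, `project_root` would be the folder that holds the `.Rproj` file, and the folders would normally be created once at project setup.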
Part 2
The `data_raw` folder should contain exports exactly as received from the source system. These files should not be edited manually. If a REDCap CSV export is opened in Excel and saved again, the file may be changed inadvertently: date formats may be altered, leading zeros may be lost, long identifiers may be converted to scientific notation, and text encoding may be modified. The safer approach is to keep raw exports unchanged and let R read them directly.

The `data_clean` folder can contain cleaned or derived datasets produced by scripts. These outputs should be treated as products of the workflow, not as replacements for the raw data. If a cleaning decision changes, the script can be updated and the cleaned dataset regenerated from the raw data. This approach supports transparency because the raw data remain available.

The `scripts` folder contains R scripts. In a simple course exercise, three scripts may be enough. In a complex study, there may be many scripts, and the team may need conventions for naming and review.

The `outputs` folder contains query listings, reports, tables, and figures. The `documentation` folder contains supporting materials such as the protocol, data dictionary, CRF completion guidelines, data management plan, and validation rules.
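The separation between raw and cleaned data can be sketched in a few lines of base R. The example below is a minimal illustration, not a prescribed workflow. The file names, column names, and values are all made up for the demonstration, and a temporary directory stands in for the project folder. It reads every column as text so that identifiers keep their leading zeros, derives a cleaned dataset in memory, and writes it to `data_clean` without modifying the raw file.

```r
# Throwaway project folders so the sketch is self-contained.
project_root <- file.path(tempdir(), "cdm_raw_demo")
dir.create(file.path(project_root, "data_raw"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_root, "data_clean"),
           recursive = TRUE, showWarnings = FALSE)

# Hypothetical raw export with made-up values (not real study data).
raw_path <- file.path(project_root, "data_raw", "redcap_export.csv")
raw_lines <- c("record_id,site_code,visit_date",
               "0001,007,2024-01-15",
               "0002,012,2024-02-03")
writeLines(raw_lines, raw_path)

# By default, read.csv() guesses column types and turns "0001" into 1.
default_read <- read.csv(raw_path)

# Reading every column as character keeps identifiers exactly as exported.
raw <- read.csv(raw_path, colClasses = "character")

# Derive the cleaned dataset in memory; the raw file is never written to.
clean <- raw
clean$visit_date <- as.Date(clean$visit_date, format = "%Y-%m-%d")

# Save the derived dataset as a product of the script.
clean_path <- file.path(project_root, "data_clean", "visits_clean.csv")
write.csv(clean, clean_path, row.names = FALSE)

print(default_read$record_id)  # identifiers converted to integers
print(raw$record_id)           # identifiers preserved as text
```

Because the cleaned file is produced entirely by the script, it can be deleted and regenerated whenever a cleaning rule changes.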
Part 3
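The following table presents one practical way to organize project files:

| Folder | Contents |
| --- | --- |
| `data_raw` | Exports as received from the source system; never edited manually |
| `data_clean` | Cleaned or derived datasets produced by scripts |
| `scripts` | R scripts |
| `outputs` | Query listings, reports, tables, and figures |
| `documentation` | Protocol, data dictionary, CRF completion guidelines, data management plan, and validation rules |

The working directory is the folder where R looks for files by default. In an RStudio project, the working directory is usually the project root. A good project structure reduces the need for absolute paths such as `C:/Users/.../Desktop/...`, which tie a script to one computer and make it difficult to share and reuse. Relative paths, such as `data_raw/redcap_export.csv`, are more portable because they are interpreted relative to the project folder.

A script that reads `data_raw/redcap_export.csv` tells R to look inside the `data_raw` folder within the current project. If another team member opens the same R project on a different computer, the script can still work, provided the folder structure is the same. This is one reason R projects are valuable in team-based clinical research.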
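Reading a file through a relative path might look like the sketch below. It is a self-contained illustration under stated assumptions: the export contents are made up, a temporary directory stands in for the project, and `setwd()` is called only to imitate what opening the `.Rproj` file does. In a real project, the script would contain just the `file.path()` and `read.csv()` lines.

```r
# Throwaway project with one hypothetical raw export (made-up values).
project_root <- file.path(tempdir(), "cdm_path_demo")
dir.create(file.path(project_root, "data_raw"),
           recursive = TRUE, showWarnings = FALSE)
writeLines(c("record_id,age", "0001,34", "0002,51"),
           file.path(project_root, "data_raw", "redcap_export.csv"))

# Imitate opening the project: the working directory becomes the project root.
# setwd() is used here only for the demonstration.
old_wd <- setwd(project_root)

# Relative path: resolved against the working directory, so it contains
# no user name, drive letter, or other machine-specific detail.
rel_path <- file.path("data_raw", "redcap_export.csv")
export <- read.csv(rel_path, colClasses = "character")

# Show the full location R actually used on this computer.
print(normalizePath(rel_path))
print(nrow(export))

# Restore the previous working directory.
setwd(old_wd)
```

The same `rel_path` works unchanged on any computer where the project folder has the same internal structure, which is the portability benefit described above.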