CLiREN-LMS
Introduction to R for Clinical Data Management


6.7 Installing and Loading Packages


Base R includes many useful functions, but most applied workflows rely on packages. A package is a collection of functions, data, and documentation. Installing a package makes it available on the computer. Loading a package makes it available in the current R session. This distinction is important: a package may be installed but not loaded. If a script uses functions from a package, the script should load that package explicitly.

Packages are installed with `install.packages()`. The following command installs several packages commonly used in introductory clinical data management exercises:

```r
install.packages(c("tidyverse", "readxl", "janitor"))
```

The `tidyverse` package installs a collection of packages for data import, transformation, visualization, and programming. The `readxl` package imports Excel files. The `janitor` package provides convenient tools for cleaning column names and tabulating categorical variables. In a course environment, these packages should ideally be installed before the practical session to avoid delays caused by slow internet connections or institutional firewall restrictions.

Once packages are installed, they can be loaded:

```r
library(tidyverse)
library(readxl)
library(janitor)
```

The `library()` function loads the package into the current R session. A script should include the necessary `library()` calls near the top, so that a reviewer or colleague can see which packages are required.

If a function comes from a package that has not been loaded, R may return an error such as:

```text
Error in read_excel("data_raw/lab_results.xlsx") :
  could not find function "read_excel"
```

This error indicates that R does not currently know where to find `read_excel()`. Loading the `readxl` package usually resolves it:

```r
library(readxl)
```

Some organizations need stronger control over package versions. This is because a script that works with one package version may behave differently with another version.
For formal or regulated workflows, teams may use tools such as package version management, validated computing environments, code review, and documented testing. Introductory learners do not need to master these topics immediately, but they should understand the principle: R workflows used for important clinical research outputs should be controlled enough to be reliable.
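
As a first step toward that kind of control, a script can record which R version and package versions it ran with. The following is a minimal sketch using base R only. The helper name `record_versions()` is hypothetical and introduced for illustration; it is not part of any package.

```r
# Hypothetical helper: return the installed version of each package,
# or NA if the package is not installed on this computer.
record_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Packages used in this chapter
versions <- record_versions(c("tidyverse", "readxl", "janitor"))

# Print the R version and the package versions, for example into a log file
print(R.version.string)
print(versions)
```

Saving this output alongside the results gives a reviewer a simple record of the software used. Dedicated tools such as the `renv` package go further by recording versions in a lockfile, which is beyond the scope of this sketch.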

| Package | Main use in this chapter | Example functions |
| --- | --- | --- |
| `tidyverse` | Data import, manipulation, and summarization | `read_csv()`, `filter()`, `summarise()` |
| `readxl` | Import Excel workbooks | `read_excel()` |
| `janitor` | Clean names and tabulate data | `clean_names()`, `tabyl()` |
| `tibble` | Create modern data frames | `tibble()` |
| `dplyr` | Filter, select, mutate, group, and summarize data | `filter()`, `mutate()`, `count()` |
| `readr` | Read CSV and other rectangular text files | `read_csv()` |

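
For the packages listed above, the difference between installed and loaded can be checked directly. The sketch below uses base R only. The variable names `required` and `status` are illustrative. It checks whether each package exists in the library on this computer and whether it is currently attached to the session.

```r
# Packages from the table above
required <- c("tidyverse", "readxl", "janitor", "tibble", "dplyr", "readr")

status <- data.frame(
  package = required,
  # system.file() returns "" when a package is not installed
  installed = vapply(required,
                     function(p) nzchar(system.file(package = p)),
                     logical(1)),
  # search() lists attached packages as "package:<name>"
  attached = paste0("package:", required) %in% search(),
  row.names = NULL
)

print(status)

# Report anything that still needs install.packages()
not_installed <- status$package[!status$installed]
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
}
```

A package that shows `installed = TRUE` and `attached = FALSE` is on the computer but has not been loaded with `library()` in this session. Its functions can still be called with the `package::function()` form, for example `readxl::read_excel()`.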
Package functions should be used with understanding. Convenience does not remove responsibility. If a script uses `clean_names()` to standardize variable names, the data manager should know what the function does. It typically converts names to a consistent lower-case style, replaces spaces with underscores, and removes problematic characters. This may be helpful, but it also means that column names in R may differ from the original REDCap export. The script should make this transformation clear.
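
One way to make the renaming explicit is to keep a small lookup table of original and cleaned names. The sketch below assumes `janitor` is installed. The data frame and its column labels are invented for illustration and do not come from a real REDCap project.

```r
library(janitor)

# Hypothetical export with human-readable column labels (illustrative data only)
raw <- data.frame(
  "Subject ID" = c("001", "002"),
  "Visit Date" = c("2024-01-15", "2024-01-22"),
  "Lab Result" = c(5.4, 6.1),
  check.names = FALSE  # keep the original labels, including spaces
)

cleaned <- clean_names(raw)

# Record how each original name maps to its cleaned name
name_map <- data.frame(original = names(raw), cleaned = names(cleaned))
print(name_map)
```

Here the three labels become `subject_id`, `visit_date`, and `lab_result`. Saving `name_map` with the analysis outputs lets a reviewer trace each R variable back to the column in the original export.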