This is a working version of the syllabus for the course Statistical Software Development in R. The course is intended as a short, intensive offering for the broader UCSF student community interested in statistical programming, software development, or learning coding fundamentals in the era of AI. Its goal is to help students turn their ideas into executable functions and into well-structured, robust packages that can enhance their scientific research and support peer-reviewed publications.
Instructors: Wanjun Gu and Dr. Sergio Baranzini
Duration: 3 weeks | 3 sessions per week | ~1 hour each
Format: Lecture + Hands-on Lab + Project Development
This course provides an accelerated, hands-on introduction to R-based software engineering, focusing on developing high-quality, reusable, and CRAN-compliant R packages. Students will learn R syntax, functional and object-oriented programming, software documentation, debugging, and collaboration workflows. By the end of the course, each student will design and publish their own R package on GitHub, taking it from concept to CRAN readiness (with an optional CRAN submission), including documentation, vignettes, and a demo presentation.
By the end of the course, students will be able to:
- Write efficient, idiomatic R code following best practices.
- Build modular, well-documented R packages using devtools, usethis, and pkgdown.
- Apply functional and object-oriented programming (S4 & R6) principles.
- Develop and document reproducible workflows with R Markdown and Roxygen2.
- Incorporate visualization and data objects into packages.
- Debug, test, and prepare packages for CRAN submission.
- Collaborate via GitHub and conduct peer reviews.
Goal: Master R fundamentals and software design principles.
Session 1 – R Basics & Syntax
- R language fundamentals, data types, and control structures.
- Data import/export and I/O operations.
- Writing reusable scripts.
- Good coding style (tidyverse style guide, naming conventions).
Session 2 – Popular Libraries & Best Practices
- Overview of key libraries: tidyverse, data.table, ggplot2, purrr, stringr.
- Structuring projects and working directories.
- Code efficiency & profiling basics.
- Hands-on: transforming and visualizing a dataset.
Session 3 – Functional & Object-Oriented Programming in R
- Functional programming concepts: closures, higher-order functions, map/reduce.
- Object systems: S3, S4, and R6 classes.
- Writing a simple R6 class with methods and fields.
- Homework: Design a small R class to encapsulate a dataset and its summary methods.
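The functional programming ideas listed for Session 3 can be sketched in a few lines of base R. This is a minimal illustration, not course material: the names `make_scaler`, `scale_by_100`, and `apply_twice` are hypothetical, and base R's `vapply()` and `Reduce()` stand in for the `purrr::map()` and `purrr::reduce()` functions students are likely to use in class.

```r
# Closure: make_scaler() returns a function that remembers `center` and `spread`.
make_scaler <- function(center, spread) {
  function(x) (x - center) / spread
}

scale_by_100 <- make_scaler(center = 0, spread = 100)
scaled <- scale_by_100(250)

# Higher-order function: apply_twice() takes another function as an argument.
apply_twice <- function(f, x) f(f(x))
incremented <- apply_twice(function(x) x + 1, 1)

# Map step: square each element; reduce step: fold the results with `+`.
squares <- vapply(1:4, function(i) i^2, numeric(1))
total <- Reduce(`+`, squares)
```

Here `scaled` is 2.5, `incremented` is 3, and `total` is 30. The closure keeps its own copy of `center` and `spread`, which is the same mechanism R6 methods and function factories rely on.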
Goal: Learn how to build a full R package structure and documentation.
Session 4 – R Package Structure & Tools
- Creating packages with devtools and usethis.
- Understanding package structure (DESCRIPTION, NAMESPACE, man/, R/, data/).
- Using pkgdown to create documentation websites.
- Version control with Git & GitHub.
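As a rough sketch of how the Session 4 tools fit together, the console steps below scaffold a package, put it under version control, and check it. The package path `~/mypkg` and the file name `my_function` are placeholders, and the exact sequence is an assumption about a typical workflow rather than the course's prescribed one. The calls are meant to be run interactively, one at a time, not sourced as a script.

```r
# One-time setup (uncomment if the packages are not installed):
# install.packages(c("devtools", "usethis", "pkgdown"))

# Create the package skeleton: DESCRIPTION, NAMESPACE, and R/.
# In RStudio this opens the new project in a fresh session.
usethis::create_package("~/mypkg")    # hypothetical path

# The remaining steps are run from inside the new package project.
usethis::use_git()                    # initialise a Git repository
usethis::use_r("my_function")         # creates R/my_function.R (placeholder name)
usethis::use_testthat()               # sets up tests/testthat/
usethis::use_test("my_function")      # creates a matching test file
usethis::use_mit_license()            # pick whichever license fits the project

devtools::load_all()                  # load the package code for interactive use
devtools::document()                  # regenerate NAMESPACE and man/ from roxygen2 comments
devtools::check()                     # run R CMD check

usethis::use_github()                 # push to GitHub (needs a configured access token)
usethis::use_pkgdown()                # add pkgdown configuration
pkgdown::build_site()                 # build the documentation website locally
```

Running `devtools::check()` early and often keeps CRAN-compliance problems small, instead of leaving them all for the final week.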
Session 5 – Documentation & Vignettes
- Writing documentation with Roxygen2.
- Linking documentation between functions and classes.
- Writing vignettes with R Markdown.
- Building the pkgdown site.
- Hands-on: Document one function and one class; create a sample vignette.
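For the hands-on step, a documented function might look like the sketch below. The function `standardize` is a hypothetical example, and the `[base::scale()]` link syntax assumes Markdown support is enabled for roxygen2 in DESCRIPTION (`Roxygen: list(markdown = TRUE)`), which packages created with usethis have by default.

```r
#' Standardize a numeric vector
#'
#' Centers `x` on its mean and divides by its standard deviation.
#'
#' @param x A numeric vector.
#' @param na.rm Logical. Should missing values be dropped when computing
#'   the mean and standard deviation?
#' @return A numeric vector of the same length as `x`.
#' @seealso [base::scale()] for a matrix-oriented equivalent.
#' @examples
#' standardize(c(1, 2, 3))
#' @export
standardize <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = na.rm)) / stats::sd(x, na.rm = na.rm)
}

z <- standardize(c(1, 2, 3))
```

After `devtools::document()`, the comment block becomes `man/standardize.Rd`, and the `@export` tag adds the function to NAMESPACE. The `@seealso` link is one way to cross-reference related functions and classes.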
Session 6 – Visualization & Data Objects in Packages
- Packaging datasets (.rda) and sample data.
- Writing visualization functions with ggplot2.
- Best practices for reproducibility and clarity.
- Mini-lab: add a data object and a visualization function to your package.
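One possible shape for the mini-lab is sketched below. The data frame `trial_scores` and the function `plot_scores` are invented for illustration. Inside a real package, the data would typically be built in `data-raw/` and saved to `data/trial_scores.rda` with `usethis::use_data(trial_scores)`, and ggplot2 functions would be called with the `ggplot2::` prefix rather than via `library()`.

```r
library(ggplot2)

# Hypothetical example data standing in for a packaged .rda object.
trial_scores <- data.frame(
  arm   = rep(c("control", "treated"), each = 5),
  score = c(4.1, 3.8, 4.5, 4.0, 4.2, 5.0, 5.4, 4.9, 5.2, 5.6)
)

# Visualization function: column names are passed as strings and looked up
# through the `.data` pronoun, so the function works on any data frame.
plot_scores <- function(data, group = "arm", value = "score") {
  stopifnot(is.data.frame(data), all(c(group, value) %in% names(data)))
  ggplot(data, aes(x = .data[[group]], y = .data[[value]])) +
    geom_boxplot() +
    labs(x = group, y = value, title = "Score by group") +
    theme_minimal()
}

p <- plot_scores(trial_scores)
```

Returning the ggplot object instead of printing it lets users add their own layers or themes, which tends to make packaged plotting functions easier to reuse.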
Goal: Develop, debug, and publish a CRAN-ready package.
Session 7 – LLM Coding Assistance (Vibe Coding with ChatGPT & Claude)
- Integrating AI assistants into R coding.
- Prompting for code generation and refactoring.
- Limitations and responsible use.
- Hands-on: use ChatGPT/Claude to generate and refine one function from your project.
Sessions 8 & 9 – Final Presentations & Peer Review
- Each student presents their package: purpose, structure, and demo.
- Classmates install and test peers’ packages from GitHub or CRAN.
- Group feedback and discussion on debugging and package design.
- Closing remarks: publishing to CRAN and future directions (e.g., Bioconductor submissions).
Create a fully functional R package that addresses a novel analytical, computational, or visualization problem.
- Problem Definition & Novelty Check
  - Identify a problem suitable for an R toolkit or function suite.
  - Conduct a literature and CRAN review to ensure novelty.
- Design & Workflow Planning
  - Draw a workflow diagram and describe the logic of your package.
  - Outline inputs, outputs, and inter-function connections.
- Implementation
  - Write scripts for:
    - Functions (core functionalities).
    - Classes (S4 or R6) with methods.
    - Data Objects (.rda) to support functions.
    - Visualization (ggplot2 or base R).
- Documentation & Website
  - Document all functions and classes with Roxygen2.
  - Write a vignette in R Markdown demonstrating use cases.
  - Build a pkgdown documentation site.
- Testing & Debugging
  - Write unit tests for key functions using testthat.
  - Run R CMD check to ensure CRAN compliance.
- Submission & Presentation
  - Host your project on GitHub with a README and installation instructions.
  - Optionally submit to CRAN if ready.
  - Present in Sessions 8 and 9 with slides and a live demo.
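For the Testing & Debugging step, a unit test with testthat might look like the following sketch. The function `rescale01` is a hypothetical stand-in for one of your package's key functions. It is defined inline here only so the example is self-contained.

```r
library(testthat)

# Hypothetical package function under test: rescales values onto [0, 1].
rescale01 <- function(x) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("rescale01() maps values onto [0, 1]", {
  expect_equal(rescale01(c(2, 4, 6)), c(0, 0.5, 1))
  expect_true(all(rescale01(c(-3, 0, 9)) >= 0))
  expect_true(all(rescale01(c(-3, 0, 9)) <= 1))
})

test_that("rescale01() rejects non-numeric input", {
  expect_error(rescale01("a"))
})
```

In a package, the function would live in `R/` and the two `test_that()` blocks in a file such as `tests/testthat/test-rescale01.R`. `devtools::test()` runs the tests, and `devtools::check()` runs them again as part of R CMD check.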
- R (≥ 4.3) and RStudio
- Packages: devtools, usethis, pkgdown, roxygen2, testthat, tidyverse, ggplot2, data.table, purrr, etc.
- Git & GitHub account
- ChatGPT or Claude access (for AI-assisted development). A paid (Pro) plan may be needed for higher usage limits.
| Component | Weight | Description |
|---|---|---|
| Class Participation | 20% | Attending class and engaging in discussions |
| Mid-project Check-in | 20% | Progress presentation of project design and structure |
| Final Project Package | 40% | Fully functional R package with documentation |
| Final Presentation & Peer Review | 20% | Demo and evaluation session |
- Hadley Wickham & Jenny Bryan. R Packages (2nd Edition)
- Hadley Wickham. Advanced R (2nd Edition)
- CRAN Policy Manual (https://cran.r-project.org/web/packages/policies.html)
- R Style Guide (https://style.tidyverse.org)
- R Documentation System (Roxygen2 and pkgdown guides)