BaranziniLab/ssdr

Statistical Software Development in R

This is a working version of the syllabus for the course Statistical Software Development in R. The course is intended as a short, intensive offering for the broader UCSF student community interested in statistical programming, software development, or learning coding fundamentals in the era of AI. Its goal is to help students translate their ideas into executable functions and into well-structured, robust packages that can enhance their scientific research and support peer-reviewed publications.

Instructors: Wanjun Gu and Dr. Sergio Baranzini

Duration: 3 weeks | 3 sessions per week | ~1 hour each

Format: Lecture + Hands-on Lab + Project Development

Course Description

This course provides an accelerated, hands-on introduction to R-based software engineering, focusing on developing high-quality, reusable, and CRAN-compliant R packages. Students will learn R syntax, functional and object-oriented programming, software documentation, debugging, and collaboration workflows. By the end of the course, each student will design and publish their own R package on GitHub, taking it from concept to a CRAN-ready state (with CRAN submission optional), including documentation, vignettes, and a demo presentation.

Learning Objectives

By the end of the course, students will be able to:

  1. Write efficient, idiomatic R code following best practices.
  2. Build modular, well-documented R packages using devtools, usethis, and pkgdown.
  3. Apply functional and object-oriented programming (S4 & R6) principles.
  4. Develop and document reproducible workflows with R Markdown and Roxygen2.
  5. Incorporate visualization and data objects into packages.
  6. Debug, test, and prepare packages for CRAN submission.
  7. Collaborate via GitHub and conduct peer reviews.

Weekly Schedule

Week 1 – Foundations of R Programming & Software Design

Goal: Master R fundamentals and software design principles.

Session 1 – R Basics & Syntax

  • R language fundamentals, data types, and control structures.
  • Data import/export and I/O operations.
  • Writing reusable scripts.
  • Good coding style (tidyverse style guide, naming conventions).
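
As a small illustration of the topics in this session, the sketch below touches atomic types, a control structure, and a CSV round trip in base R. The variable names, values, and the temporary file are made up for illustration and are not course materials.

```r
# Atomic vectors: each holds values of a single type.
ages    <- c(34L, 51L, 27L)        # integer
weights <- c(70.5, 82.1, 64.3)     # double
ids     <- c("p01", "p02", "p03")  # character

# A data frame combines equal-length columns of different types.
patients <- data.frame(id = ids, age = ages, weight = weights)

# Control structures: loop over rows and branch on a condition.
# seq_len() is safe even when the data frame has zero rows.
group <- character(nrow(patients))
for (i in seq_len(nrow(patients))) {
  if (patients$age[i] >= 50) {
    group[i] <- "older"
  } else {
    group[i] <- "younger"
  }
}
patients$group <- group

# I/O: write to a temporary CSV file and read it back.
path <- tempfile(fileext = ".csv")
write.csv(patients, path, row.names = FALSE)
patients_back <- read.csv(path)

str(patients_back)
```

The explicit loop is shown only to demonstrate `for` and `if`; the same column could be built in one step with `ifelse(patients$age >= 50, "older", "younger")`.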

Session 2 – Popular Libraries & Best Practices

  • Overview of key libraries: the tidyverse (including ggplot2, purrr, and stringr) and data.table.
  • Structuring projects and working directories.
  • Code efficiency & profiling basics.
  • Hands-on: transforming and visualizing a dataset.
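
To make the "code efficiency" item concrete, here is a minimal sketch that times an explicit loop against a vectorized expression using base R's `system.time()`. The task and the function names `sq_dev_loop` and `sq_dev_vec` are hypothetical, and the timings will differ from machine to machine.

```r
# Hypothetical task: squared deviation from the mean for one million values.
set.seed(1)
x <- rnorm(1e6)

# Loop version: preallocate the result, then fill it element by element.
sq_dev_loop <- function(x) {
  m <- mean(x)
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- (x[i] - m)^2
  }
  out
}

# Vectorized version: one arithmetic expression over the whole vector.
sq_dev_vec <- function(x) {
  (x - mean(x))^2
}

# system.time() reports elapsed seconds; results vary by machine.
time_loop <- system.time(res_loop <- sq_dev_loop(x))
time_vec  <- system.time(res_vec  <- sq_dev_vec(x))

print(time_loop["elapsed"])
print(time_vec["elapsed"])

# Both versions should agree up to floating-point tolerance.
all.equal(res_loop, res_vec)
```

For finer-grained profiling than a single wall-clock number, base R also provides `Rprof()`, which samples the call stack while code runs.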

Session 3 – Functional & Object-Oriented Programming in R

  • Functional programming concepts: closures, higher-order functions, map/reduce.
  • Object systems: S3, S4, and R6 classes.
  • Writing a simple R6 class with methods and fields.
  • Homework: Design a small R class to encapsulate a dataset and its summary methods.
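
The concepts in this session can be sketched in base R as follows. This is one possible illustration rather than a model answer to the homework: the names `make_scaler`, `DatasetSummary`, and `columnMeans` are hypothetical, and S4 is used here only because it ships with R (an R6 version would need the R6 package).

```r
library(methods)  # S4 support; attached by default in most R sessions

# Closure: make_scaler() returns a function that remembers `center` and `scale`.
make_scaler <- function(center, scale) {
  function(x) (x - center) / scale
}
standardize <- make_scaler(center = 10, scale = 2)
standardize(c(8, 10, 14))

# Higher-order functions: Map() applies a function to each element,
# Reduce() folds the resulting list into a single value.
squares <- Map(function(v) v^2, 1:4)
total   <- Reduce(`+`, squares)
total

# S4 class (hypothetical name) that wraps a data frame.
setClass("DatasetSummary", slots = c(data = "data.frame"))

# A generic plus a method that summarizes only the numeric columns.
setGeneric("columnMeans", function(object) standardGeneric("columnMeans"))
setMethod("columnMeans", "DatasetSummary", function(object) {
  numeric_cols <- vapply(object@data, is.numeric, logical(1))
  colMeans(object@data[, numeric_cols, drop = FALSE])
})

ds <- new(
  "DatasetSummary",
  data = data.frame(a = c(1, 2, 3), b = c(4, 6, 8), label = c("x", "y", "z"))
)
columnMeans(ds)
```

Keeping the data in a slot and the summaries in methods means new summaries can be added later without changing how objects are constructed.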

Week 2 – Building R Packages & Documentation

Goal: Learn how to build a full R package structure and documentation.

Session 4 – R Package Structure & Tools

  • Creating packages with devtools and usethis.
  • Understanding package structure (DESCRIPTION, NAMESPACE, man/, R/, data/).
  • Using pkgdown to create documentation websites.
  • Version control with Git & GitHub.
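
To show what the package structure listed above looks like on disk, the sketch below assembles a throwaway skeleton by hand in a temporary directory using only base R. The package name `demopkg` and its contents are made up. In practice `usethis::create_package()` creates these files, and `devtools::document()` regenerates `NAMESPACE` and `man/` from roxygen2 comments, so writing them manually is for illustration only.

```r
# Build a throwaway package skeleton in a temporary directory.
pkg <- file.path(tempdir(), "demopkg")  # "demopkg" is a made-up name
dir.create(file.path(pkg, "R"), recursive = TRUE)
dir.create(file.path(pkg, "man"))   # help pages (.Rd files)
dir.create(file.path(pkg, "data"))  # packaged datasets (.rda files)

# DESCRIPTION: package metadata in Debian control file (DCF) format.
desc <- data.frame(
  Package = "demopkg",
  Title = "Demonstration Package Skeleton",
  Version = "0.0.0.9000",
  Description = "Illustrates the files that make up a minimal R package.",
  License = "GPL-3",
  stringsAsFactors = FALSE
)
write.dcf(desc, file.path(pkg, "DESCRIPTION"))

# NAMESPACE: declares which functions the package exports.
writeLines("export(hello)", file.path(pkg, "NAMESPACE"))

# R/: function source files.
writeLines(
  'hello <- function() "Hello from demopkg"',
  file.path(pkg, "R", "hello.R")
)

# Inspect the result.
list.files(pkg, recursive = TRUE, include.dirs = TRUE)
read.dcf(file.path(pkg, "DESCRIPTION"))[1, "Package"]
```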

Session 5 – Documentation & Vignettes

  • Writing documentation with Roxygen2.
  • Linking documentation between functions and classes.
  • Writing vignettes with R Markdown.
  • Building the pkgdown site.
  • Hands-on: Document one function and one class; create a sample vignette.
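
For the hands-on exercise, a documented function might look like the sketch below. The function `zscore` is a hypothetical example, and the `[base::scale()]` link syntax assumes roxygen2's Markdown support is enabled in the package's DESCRIPTION. The `#'` lines are ordinary R comments until `devtools::document()` turns them into a help page.

```r
#' Standardize a numeric vector
#'
#' Centers a numeric vector on its mean and scales it by its standard
#' deviation.
#'
#' @param x A numeric vector.
#' @param na.rm Logical; should missing values be ignored when computing
#'   the mean and standard deviation? Defaults to `FALSE`.
#'
#' @return A numeric vector the same length as `x`.
#'
#' @examples
#' zscore(c(1, 2, 3))
#'
#' @seealso [base::scale()] for the matrix equivalent.
#' @export
zscore <- function(x, na.rm = FALSE) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = na.rm)) / stats::sd(x, na.rm = na.rm)
}

zscore(c(1, 2, 3))
```

The `@seealso` tag is one way to link related help pages, and `@export` tells roxygen2 to add the function to `NAMESPACE`.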

Session 6 – Visualization & Data Objects in Packages

  • Packaging datasets (.rda) and sample data.
  • Writing visualization functions with ggplot2.
  • Best practices for reproducibility and clarity.
  • Mini-lab: add a data object and a visualization function to your package.
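
As a rough sketch of the data-object half of the mini-lab, the code below saves a small data frame to an `.rda` file and loads it back using base R. The dataset `trial_data` and its values are invented for illustration, and a temporary directory stands in for the package's `data/` folder. Inside a real package, `usethis::use_data()` performs the save step and puts the file in the right place.

```r
# A small example dataset (values are made up for illustration).
trial_data <- data.frame(
  arm = c("control", "control", "treated", "treated"),
  response = c(1.2, 0.9, 2.4, 2.1)
)

# In a package this file would live at data/trial_data.rda;
# a temporary directory stands in for data/ here.
rda_path <- file.path(tempdir(), "trial_data.rda")
save(trial_data, file = rda_path)

# load() restores objects under their original names and returns those names.
# Loading into a fresh environment avoids overwriting anything.
env <- new.env()
loaded_names <- load(rda_path, envir = env)

loaded_names
identical(env$trial_data, trial_data)
```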

Week 3 – Project Development & Final Presentations

Goal: Develop, debug, and publish a CRAN-ready package.

Session 7 – LLM Coding Assistance (Vibe Coding with ChatGPT & Claude)

  • Integrating AI assistants into R coding.
  • Prompting for code generation and refactoring.
  • Limitations and responsible use.
  • Hands-on: use ChatGPT/Claude to generate and refine one function from your project.

Sessions 8 & 9 – Final Presentations & Peer Review

  • Each student presents their package: purpose, structure, and demo.
  • Classmates install and test peers’ packages from GitHub or CRAN.
  • Group feedback and discussion on debugging and package design.
  • Closing remarks: publishing to CRAN and future directions (e.g., Bioconductor submissions).

Final Project: Develop and Publish an R Package

Objective

Create a fully functional R package that addresses a novel analytical, computational, or visualization problem.

Project Phases

  1. Problem Definition & Novelty Check

    • Identify a problem suitable for an R toolkit or function suite.
    • Conduct a literature and CRAN review to ensure novelty.
  2. Design & Workflow Planning

    • Draw a workflow diagram and describe the logic of your package.
    • Outline inputs, outputs, and inter-function connections.
  3. Implementation

    • Write scripts for:

      • Functions (core functionalities).
      • Classes (S4 or R6) with methods.
      • Data Objects (.rda) to support functions.
      • Visualization (ggplot2 or base R).
  4. Documentation & Website

    • Document all functions and classes with Roxygen2.
    • Write a vignette in R Markdown demonstrating use cases.
    • Build a pkgdown documentation site.
  5. Testing & Debugging

    • Write unit tests for key functions using testthat.
    • Run R CMD check to ensure CRAN compliance.
  6. Submission & Presentation

    • Host your project on GitHub with README and installation instructions.
    • Optionally submit to CRAN if ready.
    • Present in Sessions 8 & 9 with slides and a live demo.
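
For the Testing & Debugging phase, a unit test with testthat might look like the following minimal sketch. The function `count_missing` is a hypothetical stand-in for one of your own functions. In a package the tests would sit in a file under `tests/testthat/` and be run with `devtools::test()`; the `requireNamespace()` guard is there only so this standalone snippet also runs where testthat is not installed.

```r
# Function under test (hypothetical example).
count_missing <- function(x) {
  sum(is.na(x))
}

# Guard used only because this is a standalone script, not a test file.
if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  test_that("count_missing() counts NA values", {
    expect_equal(count_missing(c(1, NA, 3, NA)), 2L)
    expect_equal(count_missing(character(0)), 0L)
  })

  test_that("count_missing() returns 0 when nothing is missing", {
    expect_equal(count_missing(1:5), 0L)
  })
}
```

Each `test_that()` block names one behavior and groups the expectations that check it, which makes a failing test easier to trace back to the function it covers.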

Required Software & Tools

  • R (≥ 4.3) and RStudio
  • Packages: devtools, usethis, pkgdown, roxygen2, testthat, data.table, and the tidyverse (which includes ggplot2 and purrr), among others.
  • Git & GitHub account
  • ChatGPT or Claude access (for AI-assisted development). A Pro subscription may be needed for higher usage limits.

Assessment

| Component | Weight | Description |
| --- | --- | --- |
| Class Participation | 20% | Attending class and engaging in discussions |
| Mid-project Check-in | 20% | Progress presentation of project design and structure |
| Final Project Package | 40% | Fully functional R package with documentation |
| Final Presentation & Peer Review | 20% | Demo and evaluation session |

Suggested Readings & References
