# Repo for training LLM tutor
This repository collects structured course materials from Data 88E (Economic Models), prepared in multiple formats for use in AI teaching experiments. Notably, all of this material is in the public domain.
The goal is two-fold:
- For teachers — to explore how course content can be packaged for Custom LLMs, Retrieval-Augmented Generation (RAG), or LoRA fine-tuning.
- For students — to learn hands-on how to build and test small language models using publicly accessible academic materials.
## Formats

The same content is offered in three parallel formats:
- **Individual Markdown Files within Folders**
  - `F24Lec_MD/` — lecture notebooks (`lec01`, `lec02`, …).
  - `F24Textbook_MD/` — textbook chapters (`00-Intro`, `01-Demand`, …).
  - `F24LS_md/` — slide decks in Markdown.
  - Each folder contains a `summary.yaml` describing its contents.
  - Lecture notebooks and textbook chapters are in nested folders by week for easier navigation.
- **Zipped Folders**
  - `.zip` archives of the above folders, for easier upload to platforms that allow compressed input.
- **Concatenated Mega Files** (e.g., for Gemini's 10-file limit)
  - One Markdown file per content type, merging all chapters/lectures/slides into a single file (a build sketch follows this list).
  - Useful for services (Custom GPTs, Claude, Gemini) with strict file-count limits.
  - Each mega file begins with a summary for navigation.
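To make the mega-file idea concrete, here is a minimal sketch of how such a file could be built (or rebuilt) from one of the folders above; the output filename and the comment-style source headers are assumptions, not the repo's actual build script.

```python
from pathlib import Path

def build_mega_file(source_dir: str, output_path: str) -> None:
    """Concatenate every Markdown file under source_dir into one file,
    prefixing each with its source path so the merged file stays navigable."""
    with open(output_path, "w", encoding="utf-8") as out:
        for md_file in sorted(Path(source_dir).rglob("*.md")):
            out.write(f"\n\n<!-- source: {md_file} -->\n\n")
            out.write(md_file.read_text(encoding="utf-8"))

# Hypothetical usage against the lecture folder:
build_mega_file("F24Lec_MD", "F24Lec_mega.md")
```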
## YAML Summaries

- `course_summary.yaml` — overview of weeks, topics, and goals.
- `LecNB_summary.yaml` — lecture notebook index with short blurbs.
- `week_to_readings.yaml` — links schedule weeks to readings (added because the textbook wasn't lining up with the weeks properly).
- `summary.yaml` — inside each folder, lightweight metadata describing scope.
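These files are plain YAML, so they can be read with PyYAML before being wired into a pipeline. A minimal loading sketch, assuming the course-level files sit at the repo root (their exact schemas may differ, so inspect them first):

```python
import yaml  # pip install pyyaml

# Load the course overview and the week-to-readings mapping.
with open("course_summary.yaml", encoding="utf-8") as f:
    course = yaml.safe_load(f)
with open("week_to_readings.yaml", encoding="utf-8") as f:
    week_to_readings = yaml.safe_load(f)

# Inspect the structure before relying on any particular field names.
print(type(course), type(week_to_readings))
```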
## How to Use

- Upload mega files or zips to custom LLM platforms.
- Use YAML summaries to improve chunking and retrieval quality (see the sketch after this list).
- Try different strategies (many small files vs. one large file) and compare performance.
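One way to act on the second point, sketched under assumptions: split each Markdown file on `## ` headings and attach the folder's `summary.yaml` metadata to every chunk, so a retriever can later filter by scope. This is an illustration, not the repo's official pipeline.

```python
from pathlib import Path
import yaml

def chunk_folder(folder: str) -> list[dict]:
    """Split each Markdown file on '## ' headings and tag every chunk
    with the folder-level summary.yaml metadata."""
    folder_path = Path(folder)
    meta_file = folder_path / "summary.yaml"
    metadata = yaml.safe_load(meta_file.read_text(encoding="utf-8")) if meta_file.exists() else {}
    chunks = []
    for md_file in sorted(folder_path.rglob("*.md")):
        for i, section in enumerate(md_file.read_text(encoding="utf-8").split("\n## ")):
            chunks.append({
                "text": section,
                "source": str(md_file),
                "section_index": i,
                "folder_meta": metadata,  # scope info from summary.yaml
            })
    return chunks
```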
## Learning AI Workflows

This repo doubles as a sandbox for learning AI workflows:
- **Custom LLMs:** Experiment with uploading the mega files into tools like Custom GPTs, Claude Projects, or Gemini Apps. Compare how they answer student questions.
- **RAG (Retrieval-Augmented Generation):** Store the individual Markdown files in a vector database (e.g., Chroma, FAISS, Weaviate). Build a simple retrieval pipeline to answer questions from the textbook; a minimal sketch appears after this list.
- **LoRA Fine-Tuning:** Pair textbook files (formal explanations) with lecture files (worked examples). Run a lightweight fine-tuning job on an open model (e.g., LLaMA-2, Mistral, Phi-3); a configuration sketch appears after this list.
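For the RAG route, here is a minimal end-to-end sketch using Chroma, one of the stores named above. It leans on Chroma's default embedding function; the folder path, collection name, and sample question are assumptions, and in practice you would chunk files rather than index them whole.

```python
from pathlib import Path
import chromadb

client = chromadb.Client()
collection = client.create_collection("data88e")

# Index each lecture file as a single document (chunk further in practice).
for md_file in sorted(Path("F24Lec_MD").rglob("*.md")):
    collection.add(
        documents=[md_file.read_text(encoding="utf-8")],
        ids=[str(md_file)],
    )

# Retrieve the most relevant material for a student question.
results = collection.query(
    query_texts=["What shifts the demand curve?"],
    n_results=3,
)
for doc_id in results["ids"][0]:
    print("retrieved:", doc_id)
```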
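For the LoRA route, a configuration sketch using Hugging Face's `transformers` and `peft` libraries. The model id, rank, and target modules are illustrative choices rather than a tested recipe, and building the paired textbook/lecture training examples is left out.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"  # any open model from the list above
tokenizer = AutoTokenizer.from_pretrained(base)  # for preparing textbook/lecture pairs
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters on the attention projections keep the job lightweight.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights will train
```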
## Why This Setup

- **Transparency:** These are publicly accessible, openly licensed academic materials.
- **Pedagogy:** Students see how raw educational content becomes training data.
- **Experimentation:** Both teachers and students can test tradeoffs (structured YAML vs. raw Markdown, folders vs. mega files, RAG vs. fine-tuning).
## Next Steps

- Try building your own course-aware study assistant.
- Compare retrieval quality with different chunking strategies.
- Share your results with the teaching community.
This repo is both a curricular resource and a playground for building and studying how language models learn.