
A hands-on mini project exploring Probability Mass Functions (PMF) and Cumulative Distribution Functions (CDF) using a synthetic Lagos commute dataset, inspired by Think Stats.


OfficiallyXenos/understanding-pmf-cdf


Understanding PMF and CDF

A hands-on mini project exploring Probability Mass Functions (PMF) and Cumulative Distribution Functions (CDF) using a synthetic Lagos commute dataset, inspired by Think Stats by Allen B. Downey.


🧠 Project Overview

This project serves as a practical exercise to reinforce the concepts of PMF and CDF, two foundational ideas in probability and data analysis.
Using a synthetic dataset that simulates the daily commuting behavior of Lagos residents, I explore how probability distributions can describe real-world phenomena such as transport choice and travel time.


📁 Dataset Description

File: lagos_commute.csv
Rows: 1000 commuters

Column          Description
age             Age of the commuter (18–65 years)
transport_mode  Type of transport used (bus, car, okada, BRT, ferry)
distance_km     One-way commute distance in kilometers
commute_time    Commute time in minutes

The dataset was synthetically generated using probability distributions to mimic realistic commuting patterns in Lagos.
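The README does not say which distributions produced lagos_commute.csv. As a minimal sketch, a table with the same four columns could be generated as shown below. Everything in it is a hypothetical assumption: a lognormal distance, a gamma-distributed congestion delay, and made-up mode shares and per-mode speeds. The names `make_commute_data`, `MODES`, and `MODE_WEIGHTS` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical generator: the README does not name the distributions used,
# so every distribution and parameter below is an illustrative assumption.
MODES = ["bus", "car", "okada", "BRT", "ferry"]
MODE_WEIGHTS = [0.40, 0.25, 0.20, 0.10, 0.05]  # assumed mode shares (sum to 1)
SPEED_KM_PER_MIN = {"bus": 0.25, "car": 0.35, "okada": 0.45, "BRT": 0.40, "ferry": 0.30}  # assumed


def make_commute_data(n: int = 1000, seed: int = 42) -> pd.DataFrame:
    """Build a synthetic commute table with the same columns as lagos_commute.csv."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 66, size=n)  # integer ages 18-65 inclusive (upper bound is exclusive)
    mode = rng.choice(MODES, size=n, p=MODE_WEIGHTS)
    # Right-skewed distances: most trips are short, a few are long (median about 10 km).
    distance_km = np.round(rng.lognormal(mean=2.3, sigma=0.5, size=n), 1)
    # Travel time = distance / assumed mode speed + a random congestion delay in minutes.
    speed = pd.Series(mode).map(SPEED_KM_PER_MIN).to_numpy()
    delay = rng.gamma(shape=2.0, scale=10.0, size=n)
    commute_time = np.round(distance_km / speed + delay).astype(int)
    return pd.DataFrame({
        "age": age,
        "transport_mode": mode,
        "distance_km": distance_km,
        "commute_time": commute_time,
    })
```

Fixing `seed` keeps the stand-in table reproducible, so PMFs and CDFs computed from it will not change between notebook runs.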


📘 Project Structure

understanding-pmf-cdf/
│
├── data/
│   └── lagos_commute.csv
│
├── notebooks/
│   └── PMF_&_CDF_Project.ipynb     # Section 1 completed, Section 2 in progress
│
├── README.md
│
└── requirements.txt                # (optional) numpy, pandas, matplotlib, empiricaldist

🧩 Sections Overview

Section 1 – Core PMF & CDF Exercises

Focused on building a solid understanding of empirical distributions:

  1. PMF of transport modes
  2. PMF of commute times (rounded bins)
  3. Comparative PMFs of bus vs car
  4. CDF of commute times
  5. Tail probabilities and percentiles
  6. Mode and spread of commute times
  7. CDF comparisons between okada and BRT
  8. Conditional PMF for long-distance commuters
  9. Complementary probabilities using CDF logic
  10. Conceptual analysis of policy impact on distributions
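Most of the exercises above come down to a few pandas operations. The sketch below shows one way to express them without a PMF library. The helper names (`make_pmf`, `make_cdf`, `bin_times`, `percentile`) and the eight-row sample are hypothetical, and the notebook itself may use empiricaldist instead. Flooring times into 30-minute bins is an assumed reading of "rounded bins".

```python
import numpy as np
import pandas as pd


def make_pmf(values: pd.Series) -> pd.Series:
    """Empirical PMF: share of observations taking each distinct value."""
    return values.value_counts(normalize=True).sort_index()


def make_cdf(values: pd.Series) -> pd.Series:
    """Empirical CDF: running sum of the PMF, so cdf[x] = P(X <= x)."""
    return make_pmf(values).cumsum()


def bin_times(values: pd.Series, width: int = 10) -> pd.Series:
    """Floor each time to the start of its bin, e.g. 47 -> 40 for width=10."""
    return (values // width) * width


def percentile(cdf: pd.Series, p: float):
    """Smallest value x with CDF(x) >= p, for 0 < p <= 1."""
    i = np.searchsorted(cdf.to_numpy(), p, side="left")
    return cdf.index[min(i, len(cdf) - 1)]  # clamp guards against round-off near 1


# Tiny made-up sample, NOT drawn from lagos_commute.csv.
times = pd.Series([30, 45, 45, 60, 60, 60, 90, 120])
modes = pd.Series(["bus", "car", "bus", "okada", "bus", "BRT", "car", "bus"])
distance = pd.Series([5, 12, 8, 25, 30, 22, 18, 40])

pmf_modes = make_pmf(modes)                    # 1. PMF of transport modes
pmf_binned = make_pmf(bin_times(times, 30))    # 2. PMF of binned commute times
pmf_bus = make_pmf(times[modes == "bus"])      # 3. bus vs car PMFs
pmf_car = make_pmf(times[modes == "car"])
cdf_times = make_cdf(times)                    # 4. CDF of commute times
p_over_60 = 1 - cdf_times.loc[60]              # 5 and 9. tail via complement: P(X > 60)
median = percentile(cdf_times, 0.5)            # 5. 50th percentile
mode_time = make_pmf(times).idxmax()           # 6. most likely commute time
iqr = percentile(cdf_times, 0.75) - percentile(cdf_times, 0.25)  # 6. spread
pmf_long = make_pmf(modes[distance > 20])      # 8. conditional PMF, distance > 20 km

print(p_over_60, median, mode_time, iqr)       # → 0.25 60 60 15
```

Subsetting the rows before calling `make_pmf`, as in `pmf_bus` or `pmf_long`, is what makes a PMF conditional: it renormalizes over the filtered rows only.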

Section 2 – Applied PMF & CDF

An applied analysis of how PMFs and CDFs can tell a story about Lagos commuting behavior, combining data visualization, insights, and real-world interpretation.

This section demonstrates:

  • CDF-based comparative analysis across transport modes
  • Interpretation of percentile thresholds and policy implications
  • Visual storytelling with overlapping probability plots
  • Clear probability-based conclusions suitable for decision-making
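A comparison across transport modes mostly needs each mode's CDF evaluated at the same thresholds. The sketch below shows one possible approach. The function names `share_within` and `cdf_table`, the 45-minute threshold, and the sample rows are hypothetical. The per-mode percentiles use pandas' default linear interpolation, which can differ slightly from reading the inverse of a step CDF.

```python
import pandas as pd


def share_within(df: pd.DataFrame, threshold: float) -> pd.Series:
    """Per-mode empirical CDF at one threshold: P(commute_time <= threshold | mode)."""
    return (df["commute_time"] <= threshold).groupby(df["transport_mode"]).mean()


def cdf_table(df: pd.DataFrame, grid) -> pd.DataFrame:
    """One column per mode: that mode's empirical CDF evaluated on a shared time grid."""
    return pd.DataFrame(
        {mode: [(g["commute_time"] <= t).mean() for t in grid]
         for mode, g in df.groupby("transport_mode")},
        index=list(grid),
    )


# Made-up example rows, not taken from lagos_commute.csv.
commutes = pd.DataFrame({
    "transport_mode": ["okada", "okada", "okada", "BRT", "BRT", "BRT", "BRT"],
    "commute_time": [20, 35, 70, 40, 45, 50, 95],
})

within_45 = share_within(commutes, 45)  # okada: 2 of 3 trips, BRT: 2 of 4 trips
p90 = commutes.groupby("transport_mode")["commute_time"].quantile(0.9)  # linear interpolation
curves = cdf_table(commutes, range(0, 121, 15))
```

Because `curves` shares one index across modes, calling `curves.plot(drawstyle="steps-post")` with matplotlib installed should overlay the per-mode CDFs as step curves. That gives the kind of overlapping plot described above.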

🧮 Learning Objectives

  • Understand how PMF quantifies the probability of discrete outcomes
  • Understand how CDF captures cumulative probability and percentiles
  • Practice probability interpretation using empirical distributions
  • Develop data storytelling skills through visual probability analysis
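For reference, here are the standard empirical definitions behind the first two objectives, written for a sample of n commute times. The percentile line uses one common convention (the smallest value whose CDF reaches p), and other conventions interpolate between values.

```latex
\begin{align*}
\mathrm{PMF}(x)      &= P(X = x) = \frac{\#\{\, i : x_i = x \,\}}{n} \\
\mathrm{CDF}(x)      &= P(X \le x) = \sum_{x' \le x} \mathrm{PMF}(x') \\
\mathrm{CDF}^{-1}(p) &= \min\{\, x : \mathrm{CDF}(x) \ge p \,\} \quad \text{(the $100p$-th percentile)} \\
P(X > x)             &= 1 - \mathrm{CDF}(x)
\end{align*}
```

Take the made-up sample (30, 45, 45, 60, 60, 60, 90, 120) with n = 8. Then CDF(60) = (1 + 2 + 3)/8 = 0.75, so P(X > 60) = 1 - 0.75 = 0.25, and the median is 60.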

📊 Tools & Libraries

  • pandas – data wrangling
  • numpy – numerical computation
  • matplotlib – data visualization
  • empiricaldist – PMF and CDF construction (Allen B. Downey's library, used with Think Stats)

🚀 How to Run

  1. Clone the repository

    git clone https://github.com/<your-username>/understanding-pmf-cdf.git
    cd understanding-pmf-cdf
  2. Install dependencies

    pip install -r requirements.txt
  3. Open the notebook

    jupyter notebook "notebooks/PMF_&_CDF_Project.ipynb"

✍️ Author

Akintomiwa
📍 Ibadan, Nigeria


🧾 License

This project is released under the MIT License.
You are free to fork, modify, and learn from it.


🌟 Acknowledgment

Inspired by Allen B. Downey’s Think Stats, an excellent open-access book on statistical thinking for programmers.
