A hands-on mini project exploring Probability Mass Functions (PMF) and Cumulative Distribution Functions (CDF) using a synthetic Lagos commute dataset, inspired by Think Stats by Allen B. Downey.
This project serves as a practical exercise to reinforce the concepts of PMF and CDF, two foundational ideas in probability and data analysis.
Using a synthetic dataset that simulates the daily commuting behavior of Lagos residents, I explore how probability distributions can describe real-world phenomena such as transport choice and travel time.
File: lagos_commute.csv
Rows: 1000 commuters
| Column | Description |
|---|---|
| age | Age of the commuter (18–65 years) |
| transport_mode | Type of transport used (bus, car, okada, BRT, ferry) |
| distance_km | One-way commute distance in kilometers |
| commute_time | Commute time in minutes |
The dataset was synthetically generated using probability distributions to mimic realistic commuting patterns in Lagos.
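
The generation script is not described here. As a rough sketch of how a dataset with these four columns could be simulated, the example below uses only Python's standard library. The mode weights, per-mode speeds, and distribution parameters are illustrative assumptions, not the values used to build lagos_commute.csv.

```python
import random

MODES = ["bus", "car", "okada", "BRT", "ferry"]
# Hypothetical mode shares and average speeds (km/h); assumptions for illustration only.
MODE_WEIGHTS = [0.40, 0.20, 0.20, 0.15, 0.05]
MODE_SPEED_KMH = {"bus": 18, "car": 22, "okada": 28, "BRT": 25, "ferry": 20}


def generate_commuters(n=1000, seed=42):
    """Return a list of dicts with the columns listed in the table above."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    rows = []
    for _ in range(n):
        mode = rng.choices(MODES, weights=MODE_WEIGHTS, k=1)[0]
        # Right-skewed distances: most commutes are short, a few are long (capped at 60 km).
        distance_km = round(min(rng.lognormvariate(2.3, 0.5), 60.0), 1)
        # Time = distance / speed, scaled by a random traffic factor, with a 5-minute floor.
        base_minutes = distance_km / MODE_SPEED_KMH[mode] * 60
        commute_time = max(5, round(base_minutes * rng.uniform(0.8, 1.6)))
        rows.append({
            "age": rng.randint(18, 65),
            "transport_mode": mode,
            "distance_km": distance_km,
            "commute_time": commute_time,
        })
    return rows


commuters = generate_commuters()
print(commuters[0])
```

Drawing the transport mode first and deriving the time from distance and speed keeps the columns correlated in a plausible way, which is what makes per-mode PMF and CDF comparisons meaningful.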
understanding-pmf-cdf/
│
├── data/
│ └── lagos_commute.csv
│
├── notebooks/
│ └── PMF_&_CDF_Project.ipynb # Section 1 completed, Section 2 in progress
│
├── README.md
│
└── requirements.txt # (optional) numpy, pandas, matplotlib, empiricaldist
Section 1 focuses on building a solid understanding of empirical distributions:
- PMF of transport modes
- PMF of commute times (rounded bins)
- Comparative PMFs of bus vs car
- CDF of commute times
- Tail probabilities and percentiles
- Mode and spread of commute times
- CDF comparisons between okada and BRT
- Conditional PMF for long-distance commuters
- Complementary probabilities using CDF logic
- Conceptual analysis of policy impact on distributions
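
To make the quantities in this list concrete, here is a dependency-free sketch of what a PMF, a CDF, a tail probability, a percentile, and a conditional PMF compute. The notebook itself relies on empiricaldist; the helper functions and the ten sample records below are hypothetical and exist only to show the mechanics.

```python
import math
from bisect import bisect_right
from collections import Counter


def pmf(values):
    """Map each distinct value to its relative frequency."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def cdf(values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)


def percentile(values, p):
    """Smallest observed value whose CDF is at least p (0 < p <= 1)."""
    ordered = sorted(values)
    index = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[index]


# Hypothetical sample: (transport_mode, distance_km, commute_time in minutes).
rows = [
    ("bus", 12.0, 55), ("bus", 20.5, 90), ("car", 8.0, 40), ("okada", 5.0, 20),
    ("BRT", 22.0, 60), ("bus", 6.5, 35), ("car", 25.0, 75), ("okada", 9.0, 30),
    ("ferry", 15.0, 45), ("BRT", 18.0, 50),
]
modes = [mode for mode, _, _ in rows]
times = [minutes for _, _, minutes in rows]

mode_pmf = pmf(modes)                    # PMF of transport modes
p_within_45 = cdf(times, 45)             # P(commute_time <= 45)
p_over_60 = 1 - cdf(times, 60)           # tail probability via the complement rule
median_time = percentile(times, 0.5)     # 50th percentile
p90_time = percentile(times, 0.9)        # 90th percentile

# Conditional PMF: transport mode given a long commute (>= 15 km).
long_pmf = pmf([mode for mode, km, _ in rows if km >= 15])

print(mode_pmf, p_within_45, p_over_60, median_time, p90_time, long_pmf)
```

Conditioning is just filtering before counting: the conditional PMF is an ordinary PMF computed on the subset of rows that satisfy the condition.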
Section 2 is an applied analysis exploring how PMFs and CDFs can help tell a story about Lagos commuting behavior, integrating data visualization, insights, and real-world interpretation.
This section demonstrates:
- CDF-based comparative analysis across transport modes
- Interpretation of percentile thresholds and policy implications
- Visual storytelling with overlapping probability plots
- Clear probability-based conclusions suitable for decision-making
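
A CDF-based comparison across modes can be reduced to evaluating each mode's empirical CDF at the same time thresholds. The sketch below does that with the standard library. The function name `ecdf_table` and the sample records are hypothetical, chosen for illustration rather than taken from the notebook.

```python
from bisect import bisect_right
from collections import defaultdict


def ecdf_table(records, thresholds):
    """For each mode, return P(commute_time <= t) at every threshold t."""
    times_by_mode = defaultdict(list)
    for mode, minutes in records:
        times_by_mode[mode].append(minutes)

    table = {}
    for mode, times in times_by_mode.items():
        ordered = sorted(times)
        table[mode] = [bisect_right(ordered, t) / len(ordered) for t in thresholds]
    return table


# Hypothetical (transport_mode, commute_time in minutes) pairs.
records = [
    ("okada", 20), ("okada", 30), ("okada", 25), ("okada", 45),
    ("BRT", 50), ("BRT", 60), ("BRT", 40), ("BRT", 75),
]
thresholds = [30, 45, 60]
table = ecdf_table(records, thresholds)

for mode, probabilities in table.items():
    print(mode, dict(zip(thresholds, probabilities)))
```

Reading the result: the mode with the higher value at a given threshold has the larger share of commuters arriving within that time, which is the same comparison an overlaid CDF plot shows visually.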

Learning objectives:

- Understand how a PMF quantifies the probability of discrete outcomes
- Understand how CDF captures cumulative probability and percentiles
- Practice probability interpretation using empirical distributions
- Develop data storytelling skills through visual probability analysis
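
For reference, the standard definitions behind these ideas can be written compactly. The symbols below (a discrete random variable X, its PMF p, its CDF F, and the q-th quantile x_q) are notation introduced here for illustration and are not necessarily the notation used in the notebook.

```latex
\begin{align}
p(x) &= P(X = x), \qquad \sum_{x} p(x) = 1 \\
F(x) &= P(X \le x) = \sum_{x_i \le x} p(x_i) \\
P(X > x) &= 1 - F(x) \\
x_q &= \min \{\, x : F(x) \ge q \,\}
\end{align}
```

The third line is the complement rule used for tail probabilities, and the fourth is how a percentile is read off a CDF: the smallest value at which the cumulative probability reaches q.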
- pandas – data wrangling
- numpy – numerical computation
- matplotlib – data visualization
- empiricaldist – PMF and CDF construction (from Think Stats)
- Clone the repository

      git clone https://github.com/<your-username>/understanding-pmf-cdf.git
      cd understanding-pmf-cdf

- Install dependencies

      pip install -r requirements.txt

- Open the notebook (the path is quoted because the shell would otherwise treat `&` as a control operator)

      jupyter notebook "notebooks/PMF_&_CDF_Project.ipynb"
Akintomiwa
📍 Ibadan, Nigeria
This project is released under the MIT License.
You are free to fork, modify, and learn from it.
Inspired by Allen B. Downey’s Think Stats, an excellent open-access book on statistical thinking for programmers.