arpitmathur/regression_trees

Insurance Cost Prediction Interactive Website

An interactive web application that demonstrates regression trees on the insurance dataset, with a slider that controls the cost complexity pruning (CCP) alpha parameter. The app renders the decision tree as an interactive network of node bubbles joined by edges, alongside a set of static analysis graphs.

Features

  • Interactive Decision Tree Visualization: Network-style tree with bubbles for nodes and lines for edges
  • CCP Alpha Parameter Control: Adjust the cost complexity parameter (50K to 1M) to see how pruning affects model performance
  • Optimal Tree Marker: Red star marker at the optimal alpha value (249,500) for best performance
  • Static Analysis Graphs:
    • Actual vs Predicted charges scatter plot with R² annotation
    • Individual predictor scatter plots (Age, BMI, Children vs Charges)
    • Feature importance bar chart
    • Residual analysis with Q-Q plot and residuals vs fitted
  • Full Tree Example: Static image showing the complete tree at alpha=0 (2,669 nodes, 20 levels)
  • Clean, Responsive Design: Modern blog-style layout with Google Fonts
  • Educational Content: Explains regression trees and alpha parameter concepts

Installation

  1. Install required packages:
       pip install -r requirements.txt
  2. Launch the application:
       python insurance_tree_app.py
  3. Open your browser and navigate to: http://127.0.0.1:8051

Usage

Interactive Controls

  • CCP Alpha Slider: Adjust the cost complexity parameter (50K to 1M)
    • Lower values = more complex trees (may overfit)
    • Higher values = simpler trees (may underfit)
    • Optimal value = 249,500 (marked with red star)
    • The slider starts at the optimal value by default

Visualizations

  1. Interactive Decision Tree: Network-style visualization with:

    • Bubbles for nodes (green=root, blue=left splits, red=right splits)
    • Clean lines connecting parent and child nodes
    • Node labels showing split conditions and predicted values
    • Tree statistics (nodes, leaves, depth, R², RMSE)
    • Full depth display (up to 20 levels)
  2. Static Analysis Graphs:

    • Actual vs Predicted: Scatter plot with R² score in corner
    • Predictor Scatter Plots: Age, BMI, Children vs Charges with trend lines
    • Feature Importance: Bar chart showing which features matter most
    • Residual Analysis: Q-Q plot and residuals vs fitted values
  3. Full Tree Example: Static image showing complete unpruned tree (alpha=0)
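The statistics behind these graphs (R², RMSE, residuals, feature importance) can all be read off a fitted scikit-learn tree. The following is a minimal sketch rather than the app's own code: the data is synthetic, and the names `feature_names`, `model`, and `importance` are chosen here for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; the app reads resources/insurance.csv).
rng = np.random.default_rng(1)
feature_names = ["age", "bmi", "children"]
X = np.column_stack([
    rng.integers(18, 65, 300),   # age
    rng.normal(30, 6, 300),      # bmi
    rng.integers(0, 5, 300),     # children (has no effect on y below)
]).astype(float)
y = 250.0 * X[:, 0] + 100.0 * X[:, 1] + rng.normal(0, 500, 300)

model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
pred = model.predict(X)

# Numbers shown in the "Actual vs Predicted" annotation.
r2 = r2_score(y, pred)
rmse = float(np.sqrt(mean_squared_error(y, pred)))

# Inputs for the residuals-vs-fitted and Q-Q plots.
residuals = y - pred

# Input for the feature importance bar chart (values sum to 1).
importance = dict(zip(feature_names, model.feature_importances_))

print(f"R2={r2:.3f}  RMSE={rmse:.1f}")
print(importance)
```

These scores are computed on the training data, so they are optimistic; a held-out split would give a fairer picture.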

Understanding CCP Alpha Parameter

The CCP (Cost Complexity Pruning) alpha parameter controls tree pruning:

  • Alpha = 0: No pruning (full tree, 2,669 nodes, may overfit)
  • Alpha = 249,500: Optimal balance (25 nodes, best performance)
  • Higher Alpha: More aggressive pruning (simpler tree, may underfit)

As you adjust the alpha slider, observe how:

  • Tree complexity changes (number of nodes, leaves, depth)
  • Node colors and positions shift in the network visualization
  • Model performance varies (R², RMSE displayed in annotations)
  • Tree structure becomes simpler or more complex
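The effect of alpha on tree size can be reproduced outside the app with scikit-learn's `ccp_alpha` parameter. This is a minimal sketch on synthetic data, not the app's implementation; the helper `fit_pruned_tree` and the alpha values are assumptions chosen to suit this toy target. Because alpha is compared against squared-error impurity, its useful range scales with the square of the target's units, which is consistent with alphas in the hundreds of thousands for charges measured in dollars.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; the app uses the insurance dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 3))
y = 5.0 * X[:, 0] + 20.0 * (X[:, 1] > 5) + rng.normal(0, 2.0, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def fit_pruned_tree(alpha):
    """Fit a regression tree pruned with the given ccp_alpha (hypothetical helper)."""
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0)
    return tree.fit(X_train, y_train)


# Candidate alphas at which the pruned tree actually changes.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
print(f"{len(path.ccp_alphas)} candidate alphas on the pruning path")

# Larger alpha -> fewer nodes; alpha=0 keeps the full tree.
results = []
for alpha in (0.0, 1.0, 10.0, 1000.0):
    tree = fit_pruned_tree(alpha)
    results.append((alpha, tree.tree_.node_count, tree.get_n_leaves(),
                    tree.get_depth(), tree.score(X_test, y_test)))
    print("alpha=%g nodes=%d leaves=%d depth=%d test_R2=%.3f" % results[-1])
```

At alpha=0 the tree keeps growing until its leaves are pure, while a large enough alpha collapses it to a single root node that predicts the mean.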

Dataset

The app uses the insurance dataset, which contains 1,338 records:

  • Target: Medical charges (in USD) - Range: $1,136 to $63,770
  • Features: Age, BMI, Children, Sex, Smoker status, Region
  • Categorical variables are automatically one-hot encoded for the regression tree
  • Feature columns: age, bmi, children, sex_male, smoker_yes, region_northwest, region_southeast, region_southwest
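The feature columns listed above are what pandas produces when the first level of each categorical variable is dropped. The sketch below assumes `pd.get_dummies(..., drop_first=True)` and the raw column names `sex`, `smoker`, `region`, and `charges`; the app may encode its data differently, and the four rows are made up for illustration.

```python
import pandas as pd

# Four made-up rows in the shape of the insurance data (illustrative values only).
raw = pd.DataFrame({
    "age": [19, 45, 33, 52],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 30.1, 22.7, 35.2],
    "children": [0, 2, 1, 3],
    "smoker": ["yes", "no", "no", "no"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
    "charges": [16000.0, 8000.0, 4500.0, 12000.0],
})

# One-hot encode the categoricals; drop_first removes one reference level
# per variable (female, no, northeast) to avoid redundant columns.
X = pd.get_dummies(
    raw.drop(columns="charges"),
    columns=["sex", "smoker", "region"],
    drop_first=True,
)
y = raw["charges"]

print(list(X.columns))
```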

Technical Details

  • Framework: Dash (Python web framework)
  • Visualizations: Plotly (interactive charts) + Matplotlib (static tree image)
  • ML Library: Scikit-learn (regression trees)
  • Styling: Custom CSS with Google Fonts (Lora + Montserrat)
  • Tree Visualization: Custom network layout algorithm with recursive positioning
  • Performance: Optimized for large trees with dynamic depth calculation
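One simple way to position nodes recursively is to give each node a horizontal band and split that band in half for its two children. The sketch below illustrates that idea only and is not the app's layout algorithm; `layout_tree` is a hypothetical function that reads the `children_left` and `children_right` arrays of a fitted scikit-learn tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def layout_tree(tree, node=0, depth=0, x_min=0.0, x_max=1.0, positions=None):
    """Return {node_id: (x, y)}; x is the midpoint of the node's band, y is -depth."""
    if positions is None:
        positions = {}
    x_mid = (x_min + x_max) / 2.0
    positions[node] = (x_mid, -depth)
    left = int(tree.children_left[node])
    right = int(tree.children_right[node])
    if left != -1:  # scikit-learn marks leaves with -1
        layout_tree(tree, left, depth + 1, x_min, x_mid, positions)
        layout_tree(tree, right, depth + 1, x_mid, x_max, positions)
    return positions


# Small synthetic example (an assumption, not the insurance data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(0, 1.0, size=200)
model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

positions = layout_tree(model.tree_)
for node_id, (x, y_pos) in sorted(positions.items()):
    print(node_id, round(x, 3), y_pos)
```

Halving the band at every level crowds deep nodes together, which is one reason a layout for trees up to 20 levels deep needs to adjust spacing by depth.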

Files

  • insurance_tree_app.py: Main application file
  • test_app.py: Test suite to verify functionality
  • requirements.txt: Python dependencies
  • resources/insurance.csv: Dataset
  • assets/full_tree_alpha_0.png: Static full tree image
  • regression_tree_analysis.py: Original analysis script
  • generate_full_tree_fast.py: Script to generate full tree visualization

Educational Value

This app demonstrates key machine learning concepts:

  • Overfitting vs Underfitting: See how alpha affects model complexity
  • Bias-Variance Tradeoff: Observe the performance curves
  • Feature Importance: Understand which features drive predictions
  • Model Interpretability: Explore the decision tree structure visually
  • Hyperparameter Tuning: Learn how to find optimal alpha values
  • Tree Pruning: Visual understanding of cost complexity pruning
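For the hyperparameter tuning point, one common way to choose alpha is a cross-validated grid search. This sketch uses synthetic data and a hand-picked `alpha_grid`, both assumptions; it does not claim to show how the app's optimal value of 249,500 was obtained.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; not the insurance dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 5.0 * X[:, 0] + 20.0 * (X[:, 1] > 5) + rng.normal(0, 2.0, size=300)

# Candidate alphas, scaled to this toy target rather than to dollar charges.
alpha_grid = [0.0, 0.1, 0.5, 1.0, 5.0, 50.0]

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": alpha_grid},
    scoring="neg_root_mean_squared_error",  # higher is better, so RMSE is negated
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

best_alpha = search.best_params_["ccp_alpha"]
best_rmse = -search.best_score_
print(f"best alpha={best_alpha}  cross-validated RMSE={best_rmse:.2f}")
```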

Performance Notes

  • Interactive Tree: Limited to reasonable alpha values (50K-1M) for performance
  • Full Tree Image: Static image shows complete tree at alpha=0 for reference
  • Optimal Default: App starts at optimal alpha (249,500) for best user experience
  • Dynamic Layout: Tree visualization automatically adjusts spacing for different depths

Customization

You can easily modify the app to:

  • Use different datasets
  • Add more visualization types
  • Change the alpha range
  • Add additional model parameters
  • Customize the styling and layout
  • Modify the tree visualization algorithm

Troubleshooting

If you encounter issues:

  1. Run python test_app.py to check for problems
  2. Ensure all dependencies are installed: pip install -r requirements.txt
  3. Check that resources/insurance.csv is in the correct directory
  4. Verify Python version compatibility (3.7+)
  5. For deployment, ensure assets/ directory contains the full tree image

Deployment

The app is ready for deployment on:

  • Railway: Auto-detects Python and requirements.txt
  • Local: Direct Python execution

Future Enhancements

Potential improvements:

  • Add cross-validation for more robust performance estimates
  • Include other tree-based models (Random Forest, XGBoost)
  • Add model comparison features
  • Implement tree pruning visualization
  • Add export functionality for results
  • Interactive node selection and highlighting
  • Animation between different alpha values
