⚡️ Speed up function `histogram_equalization` by 28,290% #110

codeflash-ai · 2025-09-18T21:58:18Z

📄 28,290% (282.90x) speedup for `histogram_equalization` in `src/numpy_pandas/signal_processing.py`

⏱️ Runtime : 6.91 seconds → 24.3 milliseconds (best of 165 runs)

📝 Explanation and details

The optimized code achieves a 284x speedup by replacing nested Python loops with vectorized NumPy operations, eliminating the primary performance bottlenecks.
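The headline figures (28,290%, 282.90x, and 284x) can be reconciled with a little arithmetic. The sketch below assumes the percentage is computed as `(original - optimized) / optimized`, which is inferred from the per-test numbers in this report rather than from Codeflash documentation; the function names are hypothetical.

```python
def speedup_percent(original_s: float, optimized_s: float) -> float:
    """Percent speedup, assuming the convention (old - new) / new * 100."""
    return (original_s - optimized_s) / optimized_s * 100.0


def speedup_ratio(original_s: float, optimized_s: float) -> float:
    """Plain runtime ratio old / new, i.e. 'N times faster'."""
    return original_s / optimized_s


# Headline runtimes from this report: 6.91 s -> 24.3 ms.
headline_pct = speedup_percent(6.91, 24.3e-3)
headline_ratio = speedup_ratio(6.91, 24.3e-3)

# Single-pixel test from this report: 63.5 us -> 65.0 us (a slowdown).
single_pixel_pct = speedup_percent(63.5e-6, 65.0e-6)

print(f"headline: {headline_pct:,.0f}% faster, ratio {headline_ratio:.1f}x")
print(f"single pixel: {single_pixel_pct:.2f}%")
```

Under this convention the ratio is always the percentage divided by 100, plus one, which is why a 282.90x percentage figure and a roughly 284x runtime ratio describe the same measurement. Small differences from the reported 28,290% come from the runtimes being rounded for display.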

**Key optimizations:**

1. **Histogram computation**: Replaced nested loops with `np.add.at(histogram, image.ravel(), 1)`. This single vectorized operation eliminates ~4 million loop iterations that were consuming 14.4% of runtime.
2. **Pixel mapping**: Replaced the second set of nested loops with `np.round(cdf[image] * 255)`, which uses advanced NumPy indexing to apply the CDF transformation to the entire image at once. This eliminates another ~4 million loop iterations that were consuming 79.2% of runtime.
3. **Memory allocation**: Removed the `np.zeros_like(image)` allocation since the result is computed directly from the CDF operation.
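The three changes above can be sketched side by side. The PR diff is not reproduced in this report, so both functions below are reconstructions from the description: the loop-based baseline, the variable names, and the final `astype(image.dtype)` cast are assumptions, not the repository's actual code.

```python
import numpy as np


def histogram_equalization_loops(image):
    """Loop-based baseline, reconstructed from the description (assumed)."""
    height, width = image.shape
    total_pixels = height * width

    # Histogram: one Python-level iteration per pixel.
    histogram = np.zeros(256, dtype=int)
    for y in range(height):
        for x in range(width):
            histogram[image[y, x]] += 1

    # CDF: 256 iterations, cheap either way.
    cdf = np.zeros(256, dtype=float)
    cdf[0] = histogram[0] / total_pixels
    for i in range(1, 256):
        cdf[i] = cdf[i - 1] + histogram[i] / total_pixels

    # Pixel mapping: a second per-pixel Python loop.
    equalized = np.zeros_like(image)
    for y in range(height):
        for x in range(width):
            equalized[y, x] = np.round(cdf[image[y, x]] * 255)
    return equalized


def histogram_equalization_vectorized(image):
    """Vectorized variant following the three optimizations listed above."""
    histogram = np.zeros(256, dtype=int)
    np.add.at(histogram, image.ravel(), 1)  # replaces the first nested loop

    cdf = np.zeros(256, dtype=float)
    cdf[0] = histogram[0] / image.size
    for i in range(1, 256):  # CDF loop left as-is
        cdf[i] = cdf[i - 1] + histogram[i] / image.size

    # cdf[image] looks up every pixel at once; the cast back to the input
    # dtype is an assumption made so the output matches the baseline.
    return np.round(cdf[image] * 255).astype(image.dtype)


rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
baseline = histogram_equalization_loops(sample)
optimized = histogram_equalization_vectorized(sample)
print(np.array_equal(baseline, optimized))
```

Comparing the two outputs on the same input mirrors how the regression tests in this report check that the optimized function returns the same result as the original.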

**Why this works so well:**

- Python loops have significant per-iteration overhead (~500-600ns per iteration based on profiler data)
- NumPy's vectorized operations run in optimized C code with minimal Python overhead
- Advanced indexing (`cdf[image]`) uses each pixel value as an index into the CDF, performing the lookup for the entire array in one call
- `np.add.at` handles the histogram binning in a single pass without Python loop overhead
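A tiny worked example may make the two NumPy primitives named above more concrete. It uses a 2x3 image with four bins and an arbitrary lookup table in place of the real 256-entry CDF; all values are illustrative only.

```python
import numpy as np

image = np.array([[0, 2, 2],
                  [1, 2, 0]], dtype=np.uint8)

# np.add.at is unbuffered: repeated indices are each counted.
histogram = np.zeros(4, dtype=int)
np.add.at(histogram, image.ravel(), 1)

# A plain fancy-index increment is buffered, so repeated indices
# are only applied once. This is why np.add.at is needed here.
naive = np.zeros(4, dtype=int)
naive[image.ravel()] += 1

# Advanced indexing as a lookup table: each pixel value selects
# the entry at that position, and the image shape is preserved.
lut = np.array([10, 20, 30, 40])
mapped = lut[image]

print(histogram)  # [2 1 3 0]
print(naive)      # [1 1 1 0]
print(mapped)     # [[10 30 30] [20 30 10]]
```

The `cdf[image]` expression in the optimized function is the same lookup pattern as `lut[image]`, with the CDF acting as the table.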

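**Test case performance patterns** (from the per-test timings in the regression tests):

- Tiny images (1 to 4 pixels): between 2.31% slower and 12.2% faster, because the optimized version takes roughly 65μs regardless of input size at this scale
- Small images (9 to 30 pixels): about 24-80% faster due to reduced loop overhead
- Medium images (1K to 2K pixels): about 2,500-4,700% faster as vectorization benefits compound
- Large images (about 1M pixels): 24,000-34,000% faster, where the optimization truly shines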

The CDF calculation loop remains unchanged since it's only 256 iterations and represents minimal runtime impact.
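For reference, the 256-iteration loop could also be written with `np.cumsum`. This is not part of the PR, which deliberately leaves the loop alone; the sketch below only illustrates that the two forms agree, using made-up histogram counts and an assumed loop shape.

```python
import numpy as np

# Illustrative histogram counts (not taken from the PR).
rng = np.random.default_rng(0)
histogram = rng.integers(1, 50, size=256)
total_pixels = histogram.sum()

# Loop form, as assumed from the description: a running sum of
# normalized bin counts.
cdf_loop = np.zeros(256, dtype=float)
cdf_loop[0] = histogram[0] / total_pixels
for i in range(1, 256):
    cdf_loop[i] = cdf_loop[i - 1] + histogram[i] / total_pixels

# Alternative form: integer cumulative sum, then one division.
cdf_cumsum = np.cumsum(histogram) / total_pixels

print(np.allclose(cdf_loop, cdf_cumsum))
```

The two arrays can differ in the last few floating-point bits because the loop accumulates rounded fractions while the `cumsum` form divides exact integer totals, so a tolerance-based comparison is used rather than exact equality.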

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 32 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import numpy as np
# imports
import pytest  # used for our unit tests
from src.numpy_pandas.signal_processing import histogram_equalization

# ------------------- Unit Tests -------------------

# 1. Basic Test Cases

def test_uniform_image():
    # All pixels have the same value (e.g., 128)
    img = np.full((4, 4), 128, dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 94.5μs -> 65.1μs (45.3% faster)

def test_binary_image():
    # Image with only two values (0 and 255)
    img = np.array([[0, 0, 255, 255],
                    [0, 0, 255, 255],
                    [0, 0, 255, 255],
                    [0, 0, 255, 255]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 94.0μs -> 64.9μs (45.0% faster)
    # The two values should map to 127 and 255 (or close, depending on rounding)
    unique = np.unique(result)

def test_gradient_image():
    # Simple gradient image
    img = np.arange(16, dtype=np.uint8).reshape((4, 4))
    codeflash_output = histogram_equalization(img); result = codeflash_output # 93.4μs -> 64.6μs (44.6% faster)

def test_small_random_image():
    # Small random image, check shape and dtype
    rng = np.random.default_rng(42)
    img = rng.integers(0, 256, size=(3, 3), dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 80.5μs -> 64.9μs (24.1% faster)

# 2. Edge Test Cases

def test_all_zero_image():
    # All zeros
    img = np.zeros((5, 5), dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 110μs -> 65.6μs (68.0% faster)

def test_all_max_image():
    # All 255s
    img = np.full((5, 5), 255, dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 109μs -> 64.5μs (69.9% faster)

def test_single_pixel_image():
    # 1x1 image, value 42
    img = np.array([[42]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 63.5μs -> 65.0μs (2.31% slower)

def test_two_pixel_image_diff_values():
    # 1x2 image, values 10 and 200
    img = np.array([[10, 200]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 68.1μs -> 65.0μs (4.81% faster)
    # Lower value should map to 127/128, higher to 255
    vals = np.sort(result.flatten())

def test_image_with_missing_bins():
    # Image with only a few unique values (e.g., 0, 128, 255)
    img = np.array([[0, 0, 128, 255],
                    [0, 128, 128, 255],
                    [255, 255, 128, 0],
                    [128, 255, 0, 128]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 93.1μs -> 64.8μs (43.6% faster)
    # Only three unique output values
    unique = np.unique(result)

def test_non_square_image():
    # Non-square image
    img = np.arange(30, dtype=np.uint8).reshape((5, 6))
    codeflash_output = histogram_equalization(img); result = codeflash_output # 117μs -> 65.5μs (79.8% faster)

def test_image_with_large_gaps():
    # Image with values far apart (e.g., 0, 100, 200)
    img = np.array([[0, 0, 100, 200],
                    [0, 100, 100, 200],
                    [200, 200, 100, 0],
                    [100, 200, 0, 100]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 93.2μs -> 64.7μs (44.0% faster)
    unique = np.unique(result)

def test_image_with_non_uint8_dtype():
    # Should still work if input is e.g. np.int32 (values in 0-255)
    img = np.array([[0, 128], [255, 64]], dtype=np.int32)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 71.8μs -> 65.7μs (9.26% faster)



def test_large_uniform_image():
    # Large image, all pixels same value
    img = np.full((1000, 1000), 50, dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 1.75s -> 5.45ms (32070% faster)

def test_large_gradient_image():
    # Large image with a gradient
    img = np.tile(np.linspace(0, 255, 1000, dtype=np.uint8), (1000, 1))
    codeflash_output = histogram_equalization(img); result = codeflash_output # 1.71s -> 7.02ms (24227% faster)

def test_large_random_image():
    # Large random image
    rng = np.random.default_rng(123)
    img = rng.integers(0, 256, size=(999, 999), dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 1.71s -> 5.00ms (34005% faster)

def test_large_image_performance():
    # This test is to ensure function does not crash or hang on large input
    img = np.random.randint(0, 256, size=(999, 999), dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 1.71s -> 4.94ms (34398% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import numpy as np
# imports
import pytest  # used for our unit tests
from src.numpy_pandas.signal_processing import histogram_equalization

# unit tests

# 1. BASIC TEST CASES

def test_uniform_image():
    # All pixels are the same value; output should be all 255 (since CDF jumps to 1 at that value)
    img = np.full((4, 4), 128, dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 105μs -> 66.7μs (58.5% faster)

def test_two_level_image():
    # Image with two values, half 0, half 255
    img = np.array([[0, 0, 255, 255],
                    [0, 0, 255, 255],
                    [0, 0, 255, 255],
                    [0, 0, 255, 255]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 95.8μs -> 64.8μs (47.9% faster)

def test_linear_gradient():
    # 4x4 image with increasing values from 0 to 15
    img = np.arange(16, dtype=np.uint8).reshape((4,4))
    codeflash_output = histogram_equalization(img); result = codeflash_output # 93.7μs -> 64.7μs (44.8% faster)
    # All values should be spread out between 0 and 255
    expected = np.round(np.linspace(0, 255, 16)).astype(np.uint8).reshape((4,4))

def test_small_image():
    # 2x2 image with different values
    img = np.array([[0, 64],
                    [128, 255]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 72.9μs -> 65.0μs (12.2% faster)
    # Each value should be mapped to a quarter step
    expected = np.array([[ 63, 127],
                         [191, 255]], dtype=np.uint8)

# 2. EDGE TEST CASES


def test_single_pixel():
    # Single pixel image should return 255
    img = np.array([[42]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 68.2μs -> 66.7μs (2.31% faster)

def test_max_min_values():
    # Image with only 0 and 255
    img = np.array([[0, 255], [255, 0]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 71.4μs -> 65.2μs (9.46% faster)

def test_non_contiguous_values():
    # Image with values 0, 50, 200, 255
    img = np.array([[0, 50], [200, 255]], dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 70.5μs -> 64.9μs (8.54% faster)
    # Mapping: 0->63, 50->127, 200->191, 255->255
    expected = np.array([[ 63, 127], [191, 255]], dtype=np.uint8)

def test_all_possible_values():
    # 16x16 image with all values from 0 to 255
    img = np.arange(256, dtype=np.uint8).reshape((16,16))
    codeflash_output = histogram_equalization(img); result = codeflash_output # 523μs -> 66.6μs (685% faster)

def test_non_square_image():
    # Test with a 2x8 image
    img = np.arange(16, dtype=np.uint8).reshape((2,8))
    codeflash_output = histogram_equalization(img); result = codeflash_output # 93.8μs -> 64.4μs (45.5% faster)
    expected = np.round(np.linspace(0, 255, 16)).astype(np.uint8).reshape((2,8))

def test_dtype_preserved():
    # Output dtype should match input dtype
    img = np.arange(16, dtype=np.uint8).reshape((4,4))
    codeflash_output = histogram_equalization(img); result = codeflash_output # 93.9μs -> 64.8μs (45.0% faster)

def test_large_single_value():
    # Large image with all pixels the same
    img = np.full((100, 100), 77, dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 17.9ms -> 110μs (16204% faster)

# 3. LARGE SCALE TEST CASES

def test_large_gradient_image():
    # 32x32 image with gradient from 0 to 255
    img = np.linspace(0, 255, 1024).astype(np.uint8).reshape((32,32))
    codeflash_output = histogram_equalization(img); result = codeflash_output # 1.84ms -> 71.1μs (2489% faster)

def test_large_random_image():
    # 100x10 image with random values
    rng = np.random.default_rng(42)
    img = rng.integers(0, 256, size=(100,10), dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 1.81ms -> 69.3μs (2516% faster)
    # Output should have at least as many unique values as input (unless input is uniform)
    if len(np.unique(img)) > 1:
        pass

def test_performance_large_image():
    # 500x2 image with random values (to check performance and correctness)
    rng = np.random.default_rng(123)
    img = rng.integers(0, 256, size=(500,2), dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 1.87ms -> 68.8μs (2623% faster)

def test_large_image_all_one_value():
    # 1000x1 image with single value
    img = np.full((1000,1), 13, dtype=np.uint8)
    codeflash_output = histogram_equalization(img); result = codeflash_output # 1.96ms -> 72.1μs (2624% faster)

def test_large_image_two_values():
    # 500x2 image, half zeros, half 255s
    img = np.vstack([np.zeros((500,2), dtype=np.uint8), np.full((500,2), 255, dtype=np.uint8)])
    codeflash_output = histogram_equalization(img); result = codeflash_output # 3.72ms -> 77.0μs (4730% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from src.numpy_pandas.signal_processing import histogram_equalization

To edit these changes, `git checkout codeflash/optimize-histogram_equalization-mfpyckbx` and push.


codeflash-ai bot requested a review from KRRT7 September 18, 2025 21:58

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Sep 18, 2025
