⚡️ Speed up function histogram_equalization by 28,290% #110
📄 28,290% (282.90x) speedup for `histogram_equalization` in `src/numpy_pandas/signal_processing.py`
⏱️ Runtime: 6.91 seconds → 24.3 milliseconds (best of 165 runs)
📝 Explanation and details
The optimized code achieves a 284x speedup by replacing nested Python loops with vectorized NumPy operations, eliminating the primary performance bottlenecks.
Key optimizations:
- Histogram computation: Replaced nested loops with `np.add.at(histogram, image.ravel(), 1)`. This single vectorized operation eliminates ~4 million loop iterations that were consuming 14.4% of runtime.
- Pixel mapping: Replaced the second set of nested loops with `np.round(cdf[image] * 255)`, which uses advanced NumPy indexing to apply the CDF transformation to the entire image at once. This eliminates another ~4 million loop iterations that were consuming 79.2% of runtime.
- Memory allocation: Removed the `np.zeros_like(image)` allocation, since the result is computed directly from the CDF operation.

Why this works so well:
- Advanced indexing (`cdf[image]`) efficiently broadcasts the CDF lookup across the entire array
- `np.add.at` handles the histogram binning in a single pass without Python loop overhead

Test case performance patterns:
The CDF calculation loop remains unchanged since it's only 256 iterations and represents minimal runtime impact.
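For illustration, the change described above can be sketched roughly as follows. This is not the code from `signal_processing.py`. It is a minimal reconstruction that assumes an 8-bit grayscale image stored as a 2-D `uint8` array, and the function names `equalize_loops` and `equalize_vectorized` are hypothetical.

```python
import numpy as np


def _cdf_from_histogram(histogram, total_pixels):
    # The 256-iteration CDF loop is kept as a plain Python loop in both versions.
    cdf = np.zeros(256, dtype=float)
    cdf[0] = histogram[0] / total_pixels
    for i in range(1, 256):
        cdf[i] = cdf[i - 1] + histogram[i] / total_pixels
    return cdf


def equalize_loops(image):
    # Assumed shape of the slow version: two passes of nested Python loops.
    height, width = image.shape
    histogram = np.zeros(256, dtype=int)
    for y in range(height):
        for x in range(width):
            histogram[image[y, x]] += 1
    cdf = _cdf_from_histogram(histogram, height * width)
    equalized = np.zeros_like(image)
    for y in range(height):
        for x in range(width):
            equalized[y, x] = np.round(cdf[image[y, x]] * 255)
    return equalized


def equalize_vectorized(image):
    # np.add.at accumulates correctly even when the same bin index repeats.
    histogram = np.zeros(256, dtype=int)
    np.add.at(histogram, image.ravel(), 1)
    cdf = _cdf_from_histogram(histogram, image.size)
    # Advanced indexing looks up every pixel's CDF value in one operation.
    return np.round(cdf[image] * 255).astype(image.dtype)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    print(np.array_equal(equalize_loops(sample), equalize_vectorized(sample)))
```

The final cast back to the input dtype is an assumption made here so that both versions return the same array type; the actual PR may handle the output dtype differently.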
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-histogram_equalization-mfpyckbx` and push.