Description:
I have identified a severe performance regression in the pix.color_count function in PyMuPDF between versions 1.23.8 and 1.25.3. Execution time has increased by roughly 100x to 150x across these versions.
| PyMuPDF Version | Execution Time for 100 Calls | Performance Drop |
|---|---|---|
| 1.23.8 | 0.225 seconds | Baseline |
| 1.23.26 | 24.41 seconds | ~108x slower |
| 1.25.3 | 33.7 seconds | ~150x slower |
- In PyMuPDF 1.23.8, executing pix.color_count 100 times took 0.225 seconds.
- In PyMuPDF 1.23.26, the same function took 24.41 seconds.
- In PyMuPDF 1.25.3, the function took 33.7 seconds under identical conditions.
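The slowdown factors in the table can be re-derived from the raw timings. The snippet below is a small sketch of that arithmetic; the helper name `slowdown_factors` is introduced here for illustration only and is not part of PyMuPDF.

```python
# Measured wall-clock times (seconds) for 100 calls, copied from the table above.
timings = {"1.23.8": 0.225, "1.23.26": 24.41, "1.25.3": 33.7}


def slowdown_factors(timings, baseline):
    """Return each version's time divided by the baseline version's time."""
    base = timings[baseline]
    return {version: seconds / base for version, seconds in timings.items()}


factors = slowdown_factors(timings, baseline="1.23.8")
for version, factor in factors.items():
    print(f"{version}: {factor:.1f}x the baseline time")
```

Rounded to whole numbers, the two later releases come out at the ~108x and ~150x figures listed in the table.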
This regression severely impacts workflows that rely on pix.color_count, making it nearly unusable for high-performance image analysis tasks.
Steps to Reproduce:
1. Install PyMuPDF 1.23.8:

       pip install pymupdf==1.23.8

2. Run the following profiling script on a sample PDF file:

       import cProfile
       import io
       import pstats

       import click
       import fitz  # PyMuPDF


       def profile_function(func, iterations=100, *args, **kwargs):
           """Profiles a given function by running it multiple times and prints the stats."""
           pr = cProfile.Profile()
           pr.enable()
           for _ in range(iterations):
               func(*args, **kwargs)
           pr.disable()

           s = io.StringIO()
           ps = pstats.Stats(pr, stream=s).sort_stats("cumulative")
           ps.print_stats(10)  # Print top 10 time-consuming functions
           print(s.getvalue())


       @click.command()
       @click.argument("pdf_path", type=click.Path(exists=True, file_okay=True, dir_okay=False))
       def profile_color_count(pdf_path):
           """Loads a PDF, extracts the first page as an image, and counts colors 100 times using the given approach."""
           doc = fitz.open(pdf_path)
           p = doc[0]
           pix = p.get_pixmap(dpi=120)
           mat = p.rect.torect(pix.irect)  # maps page coordinates to pixmap coordinates
           rect = fitz.Rect(50.1, 50.1, 200.5, 200.3)

           print("Profiling pix.color_count (executed 100 times)...")
           profile_function(pix.color_count, iterations=100, colors=True, clip=rect * mat)


       if __name__ == "__main__":
           profile_color_count()

3. Run the script with a sample PDF:

       python script.py sample.pdf

4. Record the execution time.

5. Upgrade to PyMuPDF 1.23.26 and repeat the test:

       pip install pymupdf==1.23.26

6. Upgrade to PyMuPDF 1.25.3 and repeat the test:

       pip install pymupdf==1.25.3

7. Compare execution times across versions.
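Because cProfile adds its own per-call overhead, it may be worth cross-checking the numbers with a plain wall-clock measurement. The following is a minimal sketch under that assumption; `time_calls` and `time_color_count` are hypothetical helper names, and the page, DPI, and clip rectangle simply mirror the profiling script in the steps above.

```python
import time


def time_calls(func, *args, iterations=100, **kwargs):
    """Return total wall-clock seconds for `iterations` calls of func."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(*args, **kwargs)
    return time.perf_counter() - start


def time_color_count(pdf_path, iterations=100):
    """Time pix.color_count on the first page with the same clip as the profiling script."""
    import fitz  # PyMuPDF; imported lazily so time_calls stays usable without it

    doc = fitz.open(pdf_path)
    page = doc[0]
    pix = page.get_pixmap(dpi=120)
    mat = page.rect.torect(pix.irect)  # page coordinates -> pixmap coordinates
    clip = fitz.Rect(50.1, 50.1, 200.5, 200.3) * mat
    return time_calls(pix.color_count, iterations=iterations, colors=True, clip=clip)


# Example (requires PyMuPDF and a local file named sample.pdf):
# print(f"{time_color_count('sample.pdf'):.3f} s for 100 calls")
```

If the plain timings show the same ratio between versions as the profiler output, the slowdown is in the library call itself rather than an artifact of profiling.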
Observed Behavior:
- PyMuPDF 1.23.8: pix.color_count runs 100 times in 0.225 seconds (very fast).
- PyMuPDF 1.23.26: Execution time jumps to 24.41 seconds (~108x slower).
- PyMuPDF 1.25.3: Execution time increases further to 33.7 seconds (~150x slower).
This suggests that a change introduced after 1.23.8 (in 1.23.9 or a later release, no later than 1.23.26) caused a significant slowdown in pix.color_count. I have not yet tested the releases in between.
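One way to narrow down the offending release is to bisect over the versions between the last known fast one and the first known slow one. The sketch below shows only the search logic. `first_slow_version` is a hypothetical helper, the candidate list is a placeholder (not every patch number is necessarily published on PyPI), and the `is_slow` callback, which would install a given version in a fresh environment and time the script, is left to the reader.

```python
def first_slow_version(versions, is_slow):
    """Binary-search an ordered list of versions for the first slow one.

    Assumes versions[0] is fast, versions[-1] is slow, and that once a
    release is slow, every later release is slow as well.
    """
    lo, hi = 0, len(versions) - 1  # lo: known fast, hi: known slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Placeholder candidate list: replace with the releases actually available on PyPI.
candidates = [f"1.23.{patch}" for patch in range(8, 27)]

# Usage, given a user-supplied callback that installs and times one version:
# culprit = first_slow_version(candidates, is_slow=my_timing_callback)
```

With about 19 candidates this needs at most five install-and-time rounds instead of testing every release.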
Expected Behavior:
The pix.color_count function should maintain efficient performance across versions. A 100x performance drop severely impacts its usability in real-world applications.
System Information:
- Operating System: Windows 10
- Python Version: 3.12
- CPU: AMD Ryzen 5 3600 (6-Core, 3.60 GHz)
- RAM: 16 GB
- PyMuPDF Versions Tested:
  - ✅ 1.23.8 → Fast (0.225 sec for 100 calls)
  - ⚠️ 1.23.26 → Slow (24.41 sec for 100 calls, ~108x slower)
  - ❌ 1.25.3 → Very slow (33.7 sec for 100 calls, ~150x slower)
Additional Information:
This drastic performance drop makes pix.color_count nearly unusable for batch image processing. If a specific change in 1.23.9 or later caused this regression, it would be helpful to investigate whether the previous approach can be optimized or restored.
📌 Thank you for your time and for maintaining PyMuPDF! 🚀