
Major Performance Regression: pix.color_count is 150x slower in version 1.25.3 compared to 1.23.8 #4336

@mehdi-jamaseb


Description:

I have identified a severe performance regression in the pix.color_count function in PyMuPDF when comparing versions 1.23.8 and 1.25.3. The execution time has increased by roughly 150x between these versions, and the slowdown is already about 108x in 1.23.26.

PyMuPDF version   Execution time for 100 calls   Performance drop
1.23.8            0.225 s                        Baseline
1.23.26           24.41 s                        ~108x slower
1.25.3            33.7 s                         ~150x slower
  • In PyMuPDF 1.23.8, executing pix.color_count 100 times took 0.225 seconds.
  • In PyMuPDF 1.23.26, the same function took 24.41 seconds.
  • In PyMuPDF 1.25.3, the function took 33.7 seconds under identical conditions.
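Dividing the reported totals by the 100 calls gives the per-call cost and the slowdown relative to 1.23.8. A small sketch that does this arithmetic using only the figures above:

```python
# Totals reported above for 100 calls of pix.color_count, in seconds.
timings = {
    "1.23.8": 0.225,
    "1.23.26": 24.41,
    "1.25.3": 33.7,
}
CALLS = 100
baseline = timings["1.23.8"]

for version, total in timings.items():
    per_call_ms = total / CALLS * 1000  # average time per call in milliseconds
    factor = total / baseline           # slowdown relative to 1.23.8
    print(f"{version:>8}: {per_call_ms:7.2f} ms/call, {factor:6.1f}x")
```

This works out to about 2.25 ms per call on 1.23.8 versus roughly 244 ms on 1.23.26 and 337 ms on 1.25.3, i.e. factors of about 108x and 150x.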

This regression severely impacts workflows that rely on pix.color_count, making it nearly unusable for high-performance image analysis tasks.


Steps to Reproduce:

  1. Install PyMuPDF 1.23.8 and save the following profiling script as script.py:

    pip install pymupdf==1.23.8

    import click
    import fitz  # PyMuPDF
    import cProfile
    import pstats
    import io


    def profile_function(func, iterations=100, *args, **kwargs):
        """Profile func by calling it `iterations` times and print the stats."""
        pr = cProfile.Profile()
        pr.enable()

        for _ in range(iterations):
            func(*args, **kwargs)

        pr.disable()

        s = io.StringIO()
        ps = pstats.Stats(pr, stream=s).sort_stats("cumulative")
        ps.print_stats(10)  # print the 10 most time-consuming functions
        print(s.getvalue())


    @click.command()
    @click.argument("pdf_path", type=click.Path(exists=True, file_okay=True, dir_okay=False))
    def profile_color_count(pdf_path):
        """Render the first page of a PDF and profile 100 calls of pix.color_count on a clip."""
        doc = fitz.open(pdf_path)
        p = doc[0]
        pix = p.get_pixmap(dpi=120)
        # Matrix mapping page coordinates onto the pixmap's pixel rectangle.
        mat = p.rect.torect(pix.irect)
        # Clip region in page coordinates; it is transformed with mat below.
        rect = fitz.Rect(50.1, 50.1, 200.5, 200.3)

        print("Profiling pix.color_count (executed 100 times)...")
        profile_function(pix.color_count, iterations=100, colors=True, clip=rect * mat)


    if __name__ == "__main__":
        profile_color_count()

  2. Run the script with a sample PDF:

    python script.py sample.pdf
    
  3. Record the execution time.

  4. Upgrade to PyMuPDF 1.23.26 and repeat the test:

    pip install pymupdf==1.23.26
    
  5. Upgrade to PyMuPDF 1.25.3 and repeat the test:

    pip install pymupdf==1.25.3
    
  6. Compare execution times across versions.
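For recording the execution time, a plain wall-clock timer is a useful cross-check on the cProfile totals, because cProfile adds overhead to every profiled function call and can inflate code that makes many Python-level calls. A minimal sketch using only the standard library; `time_calls` is a hypothetical helper, not part of PyMuPDF:

```python
import time
from typing import Any, Callable


def time_calls(func: Callable[..., Any], iterations: int = 100, *args: Any, **kwargs: Any) -> float:
    """Call func(*args, **kwargs) `iterations` times and return total wall-clock seconds.

    No profiler is attached, so the measurement excludes cProfile's per-call overhead.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        func(*args, **kwargs)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload. In the reproduction script this would instead be, e.g.:
    #   total = time_calls(pix.color_count, 100, colors=True, clip=rect * mat)
    total = time_calls(sum, 100, range(10_000))
    print(f"total: {total:.3f} s, per call: {total / 100 * 1000:.3f} ms")
```

Running the same helper under each installed version keeps the comparison free of profiler effects, while the cProfile output still shows where the time goes.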

Observed Behavior:

  • PyMuPDF 1.23.8: pix.color_count runs 100 times in 0.225 seconds (very fast).
  • PyMuPDF 1.23.26: Execution time jumps to 24.41 seconds (~108x slower).
  • PyMuPDF 1.25.3: Execution time increases further to 33.7 seconds (~150x slower).

This suggests that a change made after 1.23.8 (possibly as early as 1.23.9) introduced the slowdown in pix.color_count; the releases between 1.23.8 and 1.23.26 were not tested individually.
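Only 1.23.8 and 1.23.26 (plus 1.25.3) were measured, so the exact release is not pinned down. Assuming the slowdown persists in every release once introduced, a binary search over the intermediate releases would find it in about log2(n) test runs for n candidates. In the sketch below, `first_slow_release` is a hypothetical helper, and `is_slow` stands for whatever check is used (for example, installing that release and timing 100 calls); nothing here queries PyMuPDF or PyPI.

```python
from typing import Callable, Sequence


def first_slow_release(releases: Sequence[str], is_slow: Callable[[str], bool]) -> str:
    """Return the earliest release for which is_slow(release) is True.

    Assumes releases are ordered oldest to newest, releases[0] is known fast,
    releases[-1] is known slow, and slowness persists once introduced.
    """
    lo, hi = 0, len(releases) - 1  # invariant: releases[lo] is fast, releases[hi] is slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(releases[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return releases[hi]


if __name__ == "__main__":
    # Hypothetical data for illustration only: pretend the slowdown starts at release-6.
    fake = [f"release-{i}" for i in range(10)]
    print(first_slow_release(fake, lambda r: int(r.split("-")[1]) >= 6))  # prints release-6
```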


Expected Behavior:

The pix.color_count function should maintain efficient performance across versions. A slowdown of more than 100x severely impacts its usability in real-world applications.


System Information:

  • Operating System: Windows 10
  • Python Version: 3.12
  • CPU: AMD Ryzen 5 3600 (6-Core, 3.60 GHz)
  • RAM: 16 GB
  • PyMuPDF Versions Tested:
    • 1.23.8 → Fast (0.225 sec for 100 calls)
    • ⚠️ 1.23.26 → Slow (24.41 sec for 100 calls, ~108x slower)
    • 1.25.3 → Very slow (33.7 sec for 100 calls, ~150x slower)

Additional Information:

This drastic performance drop makes pix.color_count nearly unusable for batch image processing. If the regression stems from a specific change in 1.23.9 or later, it would help to investigate whether the previous implementation can be restored or the new one optimized.
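To check whether newer releases return the same result, and to have an independent baseline while investigating, one option is to count colors directly from the raw samples. The sketch below is a hypothetical helper (`count_colors`, not part of PyMuPDF). It assumes the buffer layout of `pix.samples` (row-major, `pix.n` bytes per pixel, no row padding, which can be checked with `pix.stride == pix.width * pix.n`), a pixmap origin of (0, 0), and an integer clip box in pixel coordinates; how PyMuPDF rounds a fractional clip rectangle is deliberately not modeled.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def count_colors(samples: bytes, width: int, height: int, n: int,
                 clip: Optional[Tuple[int, int, int, int]] = None) -> Dict[bytes, int]:
    """Count occurrences of each n-byte pixel value inside an integer clip box.

    clip is (x0, y0, x1, y1) in pixel coordinates with x1/y1 exclusive;
    None means the whole image. Assumes stride == width * n (no row padding).
    """
    x0, y0, x1, y1 = clip if clip is not None else (0, 0, width, height)
    # Clamp the box to the image so an oversized clip never reads past the buffer.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    stride = width * n
    counts = Counter()
    for y in range(y0, y1):
        # One slice per row, then split it into n-byte pixel values.
        row = samples[y * stride + x0 * n : y * stride + x1 * n]
        counts.update(row[i : i + n] for i in range(0, len(row), n))
    return dict(counts)
```

With a pixmap from the reproduction script it could be called as `count_colors(pix.samples, pix.width, pix.height, pix.n, (x0, y0, x1, y1))` and compared against `pix.color_count(colors=True, clip=...)`. Being pure Python it is not meant to be fast; slicing whole rows keeps the per-pixel work to one small slice and one counter update.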


📌 Thank you for your time and for maintaining PyMuPDF! 🚀
