Description of the bug
Page.get_text raises an AssertionError for every option except "blocks" and "words" when called on a page of an epub file. However, calling the corresponding TextPage methods directly works fine.
I think this only happens in 1.24.7. My distribution's package of 1.23.7 does not raise this error.
How to reproduce the bug
- Download an epub file. For context, I was able to reproduce the bug with https://www.gutenberg.org/ebooks/73987.
- Run the following code.
import pymupdf
doc = pymupdf.open("/home/arun-mani-j/Downloads/test.epub")
p = doc[0]
p.get_text("text")
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
----> 1 p.get_text("text")
~/Projects/aayra/lib/python3.12/site-packages/pymupdf/utils.py in ?(page, option, clip, flags, textpage, sort, delimiters)
794 if clip is not None:
795 clip = pymupdf.Rect(clip)
796 cb = None
797 elif type(page) is pymupdf.Page:
--> 798 cb = page.cropbox
799 # pymupdf.TextPage with or without images
800 tp = textpage
801 #pymupdf.exception_info()
~/Projects/aayra/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self)
8531 @property
8532 def cropbox(self):
8533 """The CropBox."""
8534 CheckParent(self)
-> 8535 page = self._pdf_page()
8536 if not page.m_internal:
8537 val = mupdf.fz_bound_page(self.this)
8538 else:
~/Projects/aayra/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self)
8050 def _pdf_page(self):
-> 8051 return _as_pdf_page(self.this)
~/Projects/aayra/lib/python3.12/site-packages/pymupdf/__init__.py in ?(page, required)
333 return page
334 elif isinstance(page, mupdf.FzPage):
335 ret = mupdf.pdf_page_from_fz_page(page)
336 if required:
--> 337 assert ret.m_internal
338 return ret
339 elif page is None:
340 assert 0, f'page is None'
AssertionError:
- Using TextPage methods directly works fine.
tp = p.get_textpage()
tp.extractText()  # No errors raised
- Using "words" or "blocks" works fine.
p.get_text("words")
p.get_text("blocks")
PyMuPDF version
1.24.7
Operating system
Linux
Python version
3.12