Closed
Labels
fix developed (release schedule to be determined), upstream bug (bug outside this package)
Description of the bug
OS: Linux, Ubuntu 22.04 LTS
Python: 3.10.2
When I upload a PDF file, the program hangs for several hours without exiting while calling the get_text() method.
How to reproduce the bug
>>> import fitz as pymupdf
>>> pdf_path = '/data/dataset/book/pdf/bad/e4a0626f933941c6db3257f5cea4f3e5.pdf'
>>> def parse_test(pdf_path) -> str:
...     pymu_doc = pymupdf.open(pdf_path, filetype="pdf")
...     contents = []
...     try:
...         # An empty document is falsy; return an empty string to match the annotation
...         if not pymu_doc:
...             return ''
...         for page in pymu_doc:
...             # get_text() is the call that hangs for this file
...             content = page.get_text("text")
...             contents.append(content.replace('\n', ' '))
...     except Exception:
...         contents = []
...     return '\n'.join(contents)
...
>>> a = parse_test(pdf_path)
Attached file: e4a0626f933941c6db3257f5cea4f3e5.pdf
PyMuPDF version
1.24.0
Operating system
Linux
Python version
3.10
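Possible workaround (a sketch, not a fix for the underlying bug): because the hang happens inside get_text() and never raises, the `except Exception` in the reproduction cannot catch it. One way to contain it is to run the extraction in a child interpreter and kill it after a deadline. The helper names `run_script_with_timeout` and `parse_with_timeout`, the `EXTRACT_SCRIPT` constant, and the 60-second default are assumptions made for illustration; only `open()` and `get_text("text")` come from the report above.

```python
import subprocess
import sys

# Child script: extracts text page by page and writes it to stdout.
# It uses only the PyMuPDF calls that appear in the report.
EXTRACT_SCRIPT = """
import sys
import fitz as pymupdf
sys.stdout.reconfigure(encoding="utf-8")
doc = pymupdf.open(sys.argv[1], filetype="pdf")
print("\\n".join(page.get_text("text").replace("\\n", " ") for page in doc))
"""


def run_script_with_timeout(script, arg, timeout_s):
    """Run `script` in a child interpreter with one argument.

    Returns the child's stdout as text, or None if the child timed out
    or exited with a non-zero status.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, arg],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging
        return None
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")


def parse_with_timeout(pdf_path, timeout_s=60.0):
    """Hypothetical wrapper: extract text from a PDF, giving up after timeout_s seconds."""
    return run_script_with_timeout(EXTRACT_SCRIPT, pdf_path, timeout_s)
```

With this in place, `parse_with_timeout(pdf_path)` returns None for a file that hangs instead of blocking the caller for hours. A separate process is used rather than a thread because a thread stuck inside native code cannot be interrupted from Python.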
