extractText() extracts broken text from pdf

### Description of the bug

Hi, 

I noticed a bug in PyMuPDF versions >= 1.23.9 when using `get_text()` to extract text from PDF documents.

To reproduce the bug:
- Consider the attached PDF file: [test_file.pdf](https://github.com/pymupdf/PyMuPDF/files/14343604/test_file.pdf)

- Extract text using the code below (see "How to reproduce the bug")

- To reproduce the correct behavior, install a PyMuPDF version < 1.23.9 (e.g., 1.23.8). We obtain the following **complete** text: [doc_text_1238.txt](https://github.com/pymupdf/PyMuPDF/files/14343570/doc_text_1238.txt)

- To reproduce the buggy behavior, install a PyMuPDF version >= 1.23.9 (e.g., 1.23.24). We obtain the following **broken** text: [doc_text_12324.txt](https://github.com/pymupdf/PyMuPDF/files/14343674/doc_text_12324.txt)
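
To see exactly where the two outputs diverge, the attached text files can be diffed. The following is only a rough sketch using the standard `difflib` module; the helper name `diff_outputs` and the labels are ours, and it assumes both files have been downloaded next to the script.

```python
import difflib


def diff_outputs(old_text, new_text, old_label="1.23.8", new_label="1.23.24"):
    """Return a unified diff (as a list of lines) between two extracted texts."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


def diff_files(old_path, new_path):
    """Read two text files (assumed UTF-8) and diff their contents."""
    with open(old_path, encoding="utf-8") as f_old:
        old_text = f_old.read()
    with open(new_path, encoding="utf-8") as f_new:
        new_text = f_new.read()
    return diff_outputs(old_text, new_text)
```

For example, `print("\n".join(diff_files("doc_text_1238.txt", "doc_text_12324.txt")))` would list the lines that differ between the two versions.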

**ADDITIONAL NOTES**
- The buggy behavior can only be observed on certain documents (e.g., the one attached above).
- extractBLOCKS, extractWORDS and extractDICT work fine; the bug seems to affect only extractTEXT.
- We tested on both Windows and Linux, and the bug occurs on both.
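
Since the word-level output looks fine, one way to quantify the problem is to check which words from `get_text("words")` are missing from the plain-text output. This is a minimal sketch, not a definitive test: the helper names are ours, and it assumes that splitting the plain text on whitespace yields roughly the same tokens as the "words" extraction.

```python
from collections import Counter


def missing_words(plain_text, words):
    """Return words found in the "words" output but absent from the plain text.

    `words` is a list of tuples as returned by page.get_text("words"),
    where index 4 holds the word string.
    """
    expected = Counter(w[4] for w in words)
    found = Counter(plain_text.split())
    # Counter subtraction keeps only words that occur more often in `expected`
    return sorted((expected - found).elements())


def report(pdf_path):
    """Print, per page, how many words the plain-text extraction lost."""
    import fitz  # PyMuPDF; imported here so the helper above stays dependency-free

    doc = fitz.open(pdf_path)
    for page in doc:
        lost = missing_words(page.get_text("text"), page.get_text("words"))
        print(f"page {page.number}: {len(lost)} word(s) missing from plain text")
```

Calling `report("test_file.pdf")` under 1.23.8 and again under 1.23.24 should make the difference between the two versions visible page by page.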

Thank you for your help 

### How to reproduce the bug

Extract text using the following code:
```python
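import fitz  # PyMuPDF

pdf_path = "test_file.pdf"  # adjust to the local path of the attached PDF
fitz_doc = fitz.open(pdf_path)

# Collect the plain-text output of every page
doc_text = list()
for page in fitz_doc:
    doc_text.append(page.get_text())

doc_text = ' '.join(doc_text)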
```

To reproduce the buggy behavior, install a PyMuPDF version >= 1.23.9 (e.g., 1.23.24).

### PyMuPDF version

1.23.24

### Operating system

Windows

### Python version

3.10

extractText() extracts broken text from pdf #3186

Description of the bug

How to reproduce the bug

PyMuPDF version

Operating system

Python version

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

extractText() extracts broken text from pdf #3186

Description

Description of the bug

How to reproduce the bug

PyMuPDF version

Operating system

Python version

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions