Problems with unreadable characters #4716

### Description of the bug

I found a bug in doc[0].get_text('words') that occurs when the page contains an unreadable character.

The PDF contains a table. Whenever a "cell" contains this special character, get_text('words') does not return the character, and the next word gets the wrong bbox values: it is assigned the coordinates of the unreadable character instead of its own.

<img width="819" height="50" alt="Image" src="https://github.com/user-attachments/assets/01ee9b2c-b787-40ce-859b-69a8ebbff0be" />

In this example, the strange symbol (a vertical bar with an x on top) is not read at all, and -4.00 is assigned the x0, y0, x1, y1 coordinates of that unread character.

When there are several consecutive such characters, only \u200d (zero-width joiner) is read (probably from between those characters). In one case (see picture below) with four such characters, two \u200d's are read (as separate "words"), but +200.00 gets the correct coordinates, probably because there are multiple such characters.

<img width="767" height="45" alt="Image" src="https://github.com/user-attachments/assets/f6129c89-22f8-46a4-8255-e750a8a95c34" />
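As a stopgap for the stray \u200d "words", something like the following could separate them from the visible words. This is only a sketch: `is_invisible_word` and `split_words` are hypothetical helper names, and the only assumption about PyMuPDF is that each entry returned by get_text('words') is a tuple whose fifth item is the word text. It does not correct the wrong bbox of the following word.

```python
import unicodedata


def is_invisible_word(text):
    """True if text is non-empty and made only of format characters (Unicode category Cf)."""
    return bool(text) and all(unicodedata.category(ch) == "Cf" for ch in text)


def split_words(words):
    """Split get_text('words') tuples into (visible, invisible) lists.

    Assumes each tuple looks like (x0, y0, x1, y1, text, ...).
    """
    visible, invisible = [], []
    for word in words:
        (invisible if is_invisible_word(word[4]) else visible).append(word)
    return visible, invisible
```

The bboxes of the invisible entries may still be useful when reporting, since they show where the joiners sit relative to the unread symbols.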

However, in all cases with a single such character in the "cell", the coordinates of the following word get mixed up.
So far I have noticed this only with this particular character (it tends to come up several times in such files).

Using the latest version of PyMuPDF.

P.S. I cannot share the full file because it contains sensitive data (I could probably try to extract the part of the PDF with the problematic characters).

### How to reproduce the bug

Have a PDF that contains the problematic character and call doc[0].get_text('words') on that page.
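Since the file itself cannot be shared, a diagnostic dump along these lines might help narrow the problem down. It is a rough sketch, not a confirmed reproduction: `describe_char`, `iter_raw_chars` and `dump_page` are hypothetical helper names, and the file path passed to `dump_page` is a placeholder. It prints the output of get_text('words') next to the per-character data from get_text('rawdict'), so the bbox of -4.00 can be compared with the bboxes of the individual characters, and the code point of the strange symbol can be identified if rawdict reports it.

```python
import unicodedata


def describe_char(ch):
    """Return a label such as 'U+200D ZERO WIDTH JOINER' for one character."""
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name}"


def iter_raw_chars(rawdict):
    """Yield (char, bbox) for every character in a get_text('rawdict') result."""
    for block in rawdict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines" key
            for span in line.get("spans", []):
                for char in span.get("chars", []):
                    yield char["c"], tuple(char["bbox"])


def dump_page(path, page_no=0):
    """Print words and raw characters of one page for side-by-side comparison."""
    import pymupdf  # imported lazily so the helpers above work without it

    doc = pymupdf.open(path)
    page = doc[page_no]
    print("words:")
    for x0, y0, x1, y1, text, *_ in page.get_text("words"):
        print(f"  {text!r} bbox=({x0:.1f}, {y0:.1f}, {x1:.1f}, {y1:.1f})")
    print("raw characters:")
    for ch, bbox in iter_raw_chars(page.get_text("rawdict")):
        rounded = tuple(round(v, 1) for v in bbox)
        print(f"  {describe_char(ch)} bbox={rounded}")
    doc.close()
```

Calling `dump_page("problem.pdf")` on the affected file (the name is a placeholder) and pasting only the lines around the affected cell would show the mismatch without exposing the sensitive content.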

### PyMuPDF version

1.26.5

### Operating system

Windows

### Python version

3.11
