page.getText('html') returns empty string

## Describe the bug (mandatory)
`page.getText('html')` returns an empty string for some files, even though `page.getText('text')` returns content for the same page. It is unclear why the HTML extraction fails.

## To Reproduce (mandatory)
Code:
```python
import fitz  # import pymupdf by importing fitz
from io import BytesIO
import requests


# Working file (swap with the line below to compare)
# url = 'https://miraiz.chuden.co.jp/home/electric/contract/fuelcost/unitprice/__icsFiles/afieldfile/2020/09/30/nen_price_202011.pdf'

# Broken file
url = 'https://miraiz.chuden.co.jp/home/electric/contract/fuelcost/unitprice/__icsFiles/afieldfile/2020/06/29/nen_price_202008.pdf'

# Download the PDF into memory
res = requests.get(url)
data = BytesIO(res.content)
doc = fitz.open(stream=data, filetype="pdf")
page = doc[0]
text = page.getText('text')
html = page.getText('html')
```
With the URL tagged `# Working file`, everything works as expected. With the URL tagged `# Broken file`, `html` is an empty string while `text` has content.

## Expected behavior (optional)
I expected the page to be converted to HTML or, if there was a problem parsing it, some sort of error message.
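
Until the cause is known, the behavior I expected can be approximated on the caller's side. The following is only a sketch of a guard, not a fix: `get_html_or_raise` is a hypothetical helper name of my own, and it assumes `page` is the PyMuPDF page object from the reproduction code (newer PyMuPDF releases name this method `get_text`).

```python
def get_html_or_raise(page):
    """Return the page's HTML output, or raise if it is silently empty.

    Hypothetical helper: `page` is assumed to expose getText(mode),
    as the PyMuPDF 1.18.3 page object in the reproduction code does.
    """
    text = page.getText('text')
    html = page.getText('html')
    # Plain-text extraction found content but HTML extraction did not:
    # treat this as an error instead of passing on an empty string.
    if text.strip() and not html.strip():
        raise ValueError(
            "getText('html') returned an empty string "
            "although getText('text') returned content"
        )
    return html
```

Calling `get_html_or_raise(doc[0])` in place of `page.getText('html')` would at least turn the silent empty result into a visible error for the broken file.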

## Your configuration (mandatory)
 - Ubuntu 20.04.1 LTS 
 - Python 3.8.5
 - PyMuPDF version 1.18.3 installed via pip

