Please provide all mandatory information!
Describe the bug (mandatory)
This bug was fixed in version 1.14.15, but the newest fitz version shows a similar issue. It is the same problem that was reported in issue #290.
To Reproduce (mandatory)
I used the same script as in issue #290.
I tested two versions: 1.16.1 (newest) and the wheel PyMuPDF-1.14.15-cp27-cp27mu-manylinux1_x86_64.whl from this page: https://github.com/pymupdf/PyMuPDF/releases/tag/1.14.15
import os
import psutil
import gc
import fitz  # PyMuPDF

file_path = 'any_pdf_document.pdf'

# Loop
niter = 300
memory_usage_start = [0] * niter
memory_usage_before_gc = [0] * niter
memory_usage_stop = [0] * niter

for i in range(niter):
    # Record initial memory usage (RSS in MiB)
    process = psutil.Process(os.getpid())
    memory_usage_start[i] = process.memory_info().rss / 2 ** 20

    # Load file
    doc = fitz.open(file_path)
    page_count = doc.pageCount
    page = 0
    text = ''

    # Extract text
    while (page < page_count):
        p = doc.loadPage(page)
        page += 1
        # words = p.getTextWords()
        rawdict = p.getText('rawDict')
        # text += p.getText()

    # Record memory usage before cleanup
    memory_usage_before_gc[i] = process.memory_info().rss / 2 ** 20

    # Cleanup attempt
    # del text
    doc.close()
    gc.collect()

    # Record final memory usage
    memory_usage_stop[i] = process.memory_info().rss / 2 ** 20

# Plot memory usage vs. iterations (the first 20 iterations are skipped as warm-up)
from pylab import *
from matplotlib import pyplot as plt

iteration = list(range(0, niter))
plot(iteration[20:], memory_usage_start[20:])
plot(iteration[20:], memory_usage_before_gc[20:], 'r+')
plot(iteration[20:], memory_usage_stop[20:], 'g')
xlabel('iteration')
ylabel('memory usage (Mb)')
title('Memory usage over getText("rawDict")')
grid(True)
plt.savefig('memory_graph.png')
Expected behavior (optional)
The graphs for getText() look similar for both versions. With getText('rawDict'), however, the two versions produce different memory graphs. It seems the fix made in 1.14.15 is missing from the newest versions.
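To compare the two versions with a number rather than by eye, the recorded samples could be reduced to a growth rate. The following is only a sketch: `growth_per_iteration` is a hypothetical helper (not part of PyMuPDF or of the script from #290), and it assumes the input is a list of RSS values in MiB such as `memory_usage_stop`, with the first 20 iterations skipped as in the plots.

```python
def growth_per_iteration(samples, warmup=20):
    """Least-squares slope of the samples, in MiB per iteration.

    Hypothetical helper for illustration: `samples` is a list of RSS
    readings (one per loop iteration); the first `warmup` readings are
    ignored, matching the [20:] slicing used for the plots.
    """
    ys = samples[warmup:]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples after warm-up")
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / float(n)
    # Covariance of (index, sample) divided by variance of the index
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var
```

Calling `growth_per_iteration(memory_usage_stop)` after the loop under each PyMuPDF version would give one value per version. A slope close to zero would suggest stable memory, while a clearly positive slope would be consistent with a leak.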
Screenshots (optional)
Attached 4 screenshots
wheels 1.14.15 using getText()

wheels 1.14.15 using getText('rawDict')

1.16.1 using getText('rawDict')

Your configuration (mandatory)
- Ubuntu 16.04 x64
- Python 2.7.12
- PyMuPDF versions: 1.14.15 (from the wheel PyMuPDF-1.14.15-cp27-cp27mu-manylinux1_x86_64.whl) and 1.16.1 (from pip install).
Additional context (optional)
As I understand it, the memory usage graphs should look similar for both versions, but getText('rawDict') seems to consume much more memory than it should.
How long would it take to release a new version with this bug fixed?
