get_drawings's items is missing line from h path operator #3207

@Rodrigodd

Description of the bug

I have a PDF file with the following two squares, each one formed by two triangles:

q 1 0 0 1 300 100 cm
0 0 m
100 0 l
0 100 l
h
100 0 m
0 100 l
100 100 l
h
f
Q

q 1 0 0 1 100 100 cm
0 0 m
100 0 l
0 100 l
0 0 l
h
100 0 m
0 100 l
100 100 l
100 0 l
h
f
Q

The only difference in the second shape is the presence of an extra l operator before each h operator, making the h unnecessary. The h operator is described in section 8.5.2.1 of PDF 32000-1:2008; it closes the current subpath by appending a straight line segment from the current point to the subpath's starting point.
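To make the expected behaviour concrete, here is a minimal sketch of how the m, l and h operators expand into line segments. It is not MuPDF's or PyMuPDF's implementation; `path_segments` and the tuple encoding of the operators are made up for this illustration, and a zero-length closing segment is skipped as a simplification.

```python
def path_segments(ops):
    """Expand m/l/h path operators into explicit line segments.

    `ops` is a list of tuples such as ("m", 0, 0), ("l", 100, 0) or ("h",).
    This is a hypothetical helper for illustration only.
    """
    segments = []
    start = current = None
    for op in ops:
        name, args = op[0], op[1:]
        if name == "m":
            # Begin a new subpath at the given point.
            start = current = args
        elif name == "l":
            # Straight line from the current point to the given point.
            segments.append((current, args))
            current = args
        elif name == "h":
            # Close the subpath: line back to its starting point.
            # Simplification: skip the segment if we are already there.
            if current != start:
                segments.append((current, start))
            current = start
    return segments


# First shape: each triangle relies on h for its third side.
first = [("m", 0, 0), ("l", 100, 0), ("l", 0, 100), ("h",),
         ("m", 100, 0), ("l", 0, 100), ("l", 100, 100), ("h",)]

# Second shape: the third side is drawn explicitly, then h is a no-op.
second = [("m", 0, 0), ("l", 100, 0), ("l", 0, 100), ("l", 0, 0), ("h",),
          ("m", 100, 0), ("l", 0, 100), ("l", 100, 100), ("l", 100, 0), ("h",)]

first_segments = path_segments(first)
second_segments = path_segments(second)
```

Under this reading both streams produce the same six segments, which is what I would expect the items of both drawings to contain.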

When rendered, the two shapes are equal:

[image: the two shapes as rendered]

But when running Page.get_drawings on this document, the items of the first drawing are missing the closing line drawn by each h operator (four lines are reported instead of six):

{'closePath': True,
 'color': None,
 'dashes': None,
 'even_odd': False,
 'fill': (0.9176470041275024, 0.8980389833450317, 0.8235290050506592),
 'fill_opacity': 1.0,
 'items': [('l', Point(400.0, 200.0), Point(300.0, 200.0)),
           ('l', Point(300.0, 200.0), Point(400.0, 100.0)),
           ('l', Point(300.0, 200.0), Point(400.0, 100.0)),
           ('l', Point(400.0, 100.0), Point(300.0, 100.0))],
 'layer': '',
 'lineCap': None,
 'lineJoin': None,
 'rect': Rect(300.0, 100.0, 400.0, 200.0),
 'seqno': 0,
 'stroke_opacity': None,
 'type': 'f',
 'width': None}
{'closePath': True,
 'color': None,
 'dashes': None,
 'even_odd': False,
 'fill': (0.9176470041275024, 0.8980389833450317, 0.8235290050506592),
 'fill_opacity': 1.0,
 'items': [('l', Point(200.0, 200.0), Point(100.0, 200.0)),
           ('l', Point(100.0, 200.0), Point(200.0, 100.0)),
           ('l', Point(200.0, 100.0), Point(200.0, 200.0)),
           ('l', Point(100.0, 200.0), Point(200.0, 100.0)),
           ('l', Point(200.0, 100.0), Point(100.0, 100.0)),
           ('l', Point(100.0, 100.0), Point(100.0, 200.0))],
 'layer': '',
 'lineCap': None,
 'lineJoin': None,
 'rect': Rect(100.0, 100.0, 200.0, 200.0),
 'seqno': 1,
 'stroke_opacity': None,
 'type': 'f',
 'width': None}
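A quick way to see the difference between the two outputs is to count how many line segments touch each endpoint: in a set of closed polygons every vertex is touched an even number of times. The sketch below is only a diagnostic idea, not part of PyMuPDF; `open_endpoints` is a made-up helper, and the points are written as plain (x, y) tuples copied from the output above (with real Point objects I assume they would need to be converted, e.g. with tuple(p)).

```python
from collections import Counter


def open_endpoints(items):
    """Return endpoints touched by an odd number of 'l' segments."""
    degree = Counter()
    for item in items:
        if item[0] != "l":
            continue  # this sketch only looks at straight lines
        _, p1, p2 = item
        degree[p1] += 1
        degree[p2] += 1
    return sorted(p for p, n in degree.items() if n % 2)


# Items of the first drawing, as reported above.
first_items = [
    ("l", (400.0, 200.0), (300.0, 200.0)),
    ("l", (300.0, 200.0), (400.0, 100.0)),
    ("l", (300.0, 200.0), (400.0, 100.0)),
    ("l", (400.0, 100.0), (300.0, 100.0)),
]

# Items of the second drawing, as reported above.
second_items = [
    ("l", (200.0, 200.0), (100.0, 200.0)),
    ("l", (100.0, 200.0), (200.0, 100.0)),
    ("l", (200.0, 100.0), (200.0, 200.0)),
    ("l", (100.0, 200.0), (200.0, 100.0)),
    ("l", (200.0, 100.0), (100.0, 100.0)),
    ("l", (100.0, 100.0), (100.0, 200.0)),
]

first_open = open_endpoints(first_items)
second_open = open_endpoints(second_items)
```

For the second drawing the result is empty, while for the first drawing all four corners are reported as open, which matches the two missing closing lines.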

How to reproduce the bug

Run the following script:

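import sys
from pprint import pprint

import fitz  # PyMuPDF

# Path of the PDF to inspect, given as the first command-line argument.
pdf_path = sys.argv[1]

with fitz.open(pdf_path) as pdf_document:
    for page in pdf_document:
        # Print every vector drawing extracted from the page.
        for draw in page.get_drawings():
            pprint(draw)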

with the following sample file:

testtriangles.pdf

I created the sample file by manually editing another file and fixing the stream length with qpdf, so don't be surprised if there is anything else wrong with it.

PyMuPDF version

1.23.25

Operating system

Windows

Python version

3.12
