Error on .find_tables() #3191

@JorjMcKie

Discussed in #3190

Originally posted by bjmvercelli February 20, 2024
Hello, hope you guys are doing great.

I'm getting an error in version 1.23.24 (latest) when using the find_tables() method, more specifically in the extract_text() call.

The following code is taken from table.py (lines 606 and 607). The error happens when extract_words(chars) returns an empty list, because words[0] then raises an IndexError.

words = extractor.extract_words(chars)
rotation = words[0]["rotation"]  # rotation cannot change within a cell

I do not believe there is a problem in extract_words() itself; I think this is an edge case triggered by my PDF. If that's the case, we could fix it by checking the length of words before indexing into it:

words = extractor.extract_words(chars)
if len(words) == 0:
  return ""
rotation = words[0]["rotation"]  # rotation cannot change within a cell
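
For anyone who wants to try the guard outside the library, here is a minimal, self-contained sketch. StubExtractor and extract_cell_text are hypothetical names made up for this illustration; they are not part of table.py, and the stub only imitates an extractor that returns no words for an empty or whitespace-only cell.

```python
class StubExtractor:
    """Hypothetical stand-in for the word extractor used in table.py."""

    def extract_words(self, chars):
        # Join the characters into one word; a cell holding nothing
        # but whitespace (or no characters at all) yields no words.
        text = "".join(c["text"] for c in chars).strip()
        if not text:
            return []
        return [{"text": text, "rotation": chars[0].get("rotation", 0)}]


def extract_cell_text(extractor, chars):
    """Return the cell text, or "" when the extractor finds no words."""
    words = extractor.extract_words(chars)
    if not words:  # the proposed guard: nothing to index into
        return ""
    # Safe now: words has at least one element. The sketch only reads
    # the value to show that the indexing no longer fails.
    rotation = words[0]["rotation"]
    return " ".join(w["text"] for w in words)


if __name__ == "__main__":
    extractor = StubExtractor()
    print(repr(extract_cell_text(extractor, [])))
    print(repr(extract_cell_text(extractor, [{"text": "a"}, {"text": "b"}])))
```

Without the guard, the first call would fail on words[0] with an IndexError, which is the error described above.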

You can reproduce here
