Set fallback encoding for message strings #95

@pudo

I propose returning a fallback encoding when a message does not specify one, and logging a warning whenever the fallback is used. Here is the stack trace we're seeing:

  File "/ingestors/ingestors/manager.py", line 120, in ingest_entity
    self.ingest(file_path, entity)
  File "/ingestors/ingestors/manager.py", line 133, in ingest
    self.delegate(ingestor_class, file_path, entity)
  File "/ingestors/ingestors/manager.py", line 147, in delegate
    ingestor_class(self).ingest(file_path, entity)
  File "/ingestors/ingestors/email/outlookmsg.py", line 37, in ingest
    msg = FieldMessage(file_path)
  File "/usr/local/lib/python3.7/dist-packages/extract_msg/message.py", line 87, in __init__
    self.header
  File "/usr/local/lib/python3.7/dist-packages/extract_msg/message.py", line 212, in header
    headerText = self._getStringStream('__substg1.0_007D')
  File "/usr/local/lib/python3.7/dist-packages/extract_msg/message.py", line 166, in _getStringStream
    return None if tmp is None else tmp.decode(self.stringEncoding)
  File "/usr/local/lib/python3.7/dist-packages/extract_msg/message.py", line 292, in stringEncoding
    raise Exception('Encoding property not found')
Exception: Encoding property not found

Unfortunately, I cannot share the files that trigger this, as they are subject to an active journalistic investigation. It is reasonable to assume, however, that they either were not produced by Microsoft Outlook (but by, e.g., an eDiscovery tool) or have been tampered with.
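
To make the proposal concrete, here is a minimal sketch of the behaviour I have in mind. It is not a patch against extract_msg: `resolve_encoding` and `decode_stream` are hypothetical names, and `cp1252` is only a placeholder default, since which fallback is actually appropriate is open for discussion.

```python
import logging
from typing import Optional

log = logging.getLogger(__name__)

# Assumption: cp1252 is a placeholder; the right default is up for discussion.
FALLBACK_ENCODING = "cp1252"


def resolve_encoding(declared: Optional[str],
                     fallback: str = FALLBACK_ENCODING) -> str:
    """Return the declared encoding, or the fallback (with a warning) if unset."""
    if declared:
        return declared
    log.warning("Message does not specify an encoding; falling back to %s",
                fallback)
    return fallback


def decode_stream(raw: Optional[bytes],
                  declared: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for a string-stream decoder (not extract_msg's API)."""
    if raw is None:
        return None
    # errors="replace" keeps a wrong guess from raising on undecodable bytes.
    return raw.decode(resolve_encoding(declared), errors="replace")
```

With something along these lines, a message without an encoding property would still be ingested, and the warning in the log would flag it as a file whose decoded text may need checking.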
