Bug Metadata
- Version of extract_msg: 0.54.0
- Your python version: Python 3.11.6
- How did you launch extract_msg? I used the extract_msg package.
Describe the bug
For some .msg files, opening them fails with `UnicodeDecodeError: 'XXX' codec can't decode byte (...): illegal multibyte sequence`.
Codecs that fail include:
- windows-950
- shift_jis
- charmap
- gb2312
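The failure mode can be illustrated with a stdlib-only sketch that reproduces the same class of error. It assumes (a guess, not confirmed by the report) that the offending bytes were actually written in Windows-1252, where 0xFC is "ü", while the message declares a different code page. The filename and the helper `first_bad_byte` are hypothetical.

```python
from typing import Optional


def first_bad_byte(raw: bytes, encoding: str) -> Optional[int]:
    """Return the offset of the first byte `encoding` rejects, or None if it decodes.

    Hypothetical helper for illustration; it is not part of extract_msg.
    """
    try:
        raw.decode(encoding)  # strict decoding, like the failing call in the traceback
    except UnicodeDecodeError as err:
        return err.start  # index of the first offending byte
    return None


# Hypothetical attachment filename stored as Windows-1252 bytes ("ü" becomes 0xFC).
raw = "Prüfbericht.pdf".encode("cp1252")
offset = first_bad_byte(raw, "shift_jis")
print(offset, hex(raw[offset]))  # the rejected byte is 0xfc, as in the traceback
```

Plain ASCII bytes decode fine under shift_jis, so only names containing non-ASCII characters would trip this.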
Traceback
```
  File "src/doc_parser/loaders.py", line 183, in get_msg_content
    with extract_msg.openMsg(path) as msg:
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/extract_msg/open_msg.py", line 124, in openMsg
    return Message(path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/extract_msg/msg_classes/message_base.py", line 83, in __init__
    super().__init__(path, **kwargs)
  File ".venv/lib/python3.11/site-packages/extract_msg/msg_classes/msg.py", line 221, in __init__
    self.attachments
  File "/Users/kacperwlodarczyk/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/extract_msg/msg_classes/msg.py", line 862, in attachments
    attachments.append(self.initAttachmentFunc(self, attachmentDir))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/extract_msg/attachments/__init__.py", line 108, in initStandardAttachment
    return EmbeddedMsgAttachment(msg, dir_, propStore)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/extract_msg/attachments/emb_msg_att.py", line 38, in __init__
    self.__data = openMsg(self.msg.path, prefix = self.__prefix, parentMsg = self.msg, treePath = self.treePath, **self.msg.kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/extract_msg/open_msg.py", line 90, in openMsg
    msg = MSGFile(path, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/extract_msg/msg_classes/msg.py", line 206, in __init__
    filename = self.getStringStream(prefixl[:-1] + ['__substg1.0_3001'], prefix = False)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/extract_msg/msg_classes/msg.py", line 738, in getStringStream
    return None if tmp is None else tmp.decode(self.stringEncoding)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'shift_jis' codec can't decode byte 0xfc in position 74: illegal multibyte sequence
```
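The last frame shows where it breaks: `getStringStream` decodes the raw bytes of the `__substg1.0_3001` string stream strictly with `self.stringEncoding`, so one byte that is invalid in that code page aborts opening the embedded message and, through `self.attachments`, its parent as well. A more forgiving decode might look like the sketch below. It is hypothetical and not extract_msg's API; the helper name and the fallback list are assumptions.

```python
from typing import Iterable


def decode_string_stream(raw: bytes, declared: str,
                         fallbacks: Iterable[str] = ("utf-8", "cp1252")) -> str:
    """Decode `raw` with `declared`, then each fallback, then lossily.

    Hypothetical helper, not part of extract_msg. It mirrors the strict
    `tmp.decode(self.stringEncoding)` call from the traceback but never raises
    UnicodeDecodeError (an unknown encoding name still raises LookupError).
    """
    for encoding in (declared, *fallbacks):
        try:
            return raw.decode(encoding)  # strict: raises on any invalid byte
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: decode with the declared code page, replacing bad bytes with U+FFFD.
    return raw.decode(declared, errors="replace")


print(decode_string_stream(b"Pr\xfcfbericht.pdf", "shift_jis"))  # prints Prüfbericht.pdf (cp1252 fallback)
```

Because cp1252 accepts almost every byte, the order of `fallbacks` matters, and a strict decode that happens to succeed can still produce mojibake. The final `errors="replace"` step trades fidelity for never raising.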