Cannot decode using CP950 (and possibly others) due to the Python implementation differing from the Microsoft implementation #373

@TheElementalOfDestruction

Description

Microsoft's implementation of CP950 fills a large portion of the user-defined characters area. Because these characters are not universal to the encoding, Python's implementation does not accept them when encoding or decoding data, which can lead to encoding issues (this is apparent with RTFDE failing to decode some data using CP950).
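A quick way to check this locally is sketched below. The helper name `can_decode_cp950` and the sample bytes are illustrative assumptions, not anything from extract_msg or RTFDE; `0xFA40` is used here only as an example of a sequence that is assumed to fall in the user-defined area.

```python
def can_decode_cp950(data: bytes) -> bool:
    """Return True if Python's built-in cp950 codec accepts `data`."""
    try:
        data.decode("cp950")
    except UnicodeDecodeError:
        return False
    return True


# An ordinary Big5 character that both implementations agree on.
ORDINARY = "中".encode("cp950")

# Assumption: a two-byte sequence in the user-defined area, which the
# Microsoft implementation is expected to map to a private use code point.
USER_DEFINED = b"\xfa\x40"

print("ordinary:", can_decode_cp950(ORDINARY))
print("user-defined:", can_decode_cp950(USER_DEFINED))
```

If the second line prints `False`, the installed Python codec rejects that sequence, which is the behaviour this issue describes.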

The fix is to create an encoding submodule that handles a number of encoding-related jobs, including providing the Microsoft versions of encodings that are either unavailable in Python or differ from Python's implementation.
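One possible shape for such a fallback is sketched here; it is not the code that ships in the submodule. It registers a custom `codecs` error handler that consults an extra lookup table when the built-in codec gives up. The handler name `cp950_fallback`, the function `decode_cp950`, and the one-entry table `EXTRA_CP950` are all hypothetical, and the single mapping shown (`0xFA40` to U+E000) is an assumption about Microsoft's private use layout that should be verified against Microsoft's own tables before being relied on.

```python
import codecs

# Hypothetical, deliberately tiny table. A real implementation would need
# the complete set of Microsoft-specific mappings.
EXTRA_CP950 = {
    b"\xfa\x40": "\ue000",  # assumed mapping into the Private Use Area
}


def _cp950_fallback(exc):
    """Error handler: substitute known extra mappings, otherwise re-raise."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # Look at the two bytes starting where the built-in codec failed.
    pair = exc.object[exc.start:exc.start + 2]
    if pair in EXTRA_CP950:
        # Return the replacement text and the position to resume decoding.
        return EXTRA_CP950[pair], exc.start + 2
    raise exc


codecs.register_error("cp950_fallback", _cp950_fallback)


def decode_cp950(data: bytes) -> str:
    """Decode with the built-in codec, falling back to EXTRA_CP950."""
    return data.decode("cp950", errors="cp950_fallback")


print(repr(decode_cp950("中".encode("cp950") + b"\xfa\x40" + b"A")))
```

An error handler keeps the built-in codec in charge of everything it already understands, so only the sequences it rejects need a table entry. A full standalone codec built from the complete Microsoft table is the other obvious design, and it would also cover encoding.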

Version 0.42.0, currently available for preview on the next-release branch, will definitely support the plain text body using CP950, as I have implemented code to handle decoding and encoding. If I discover other encodings that differ from the Python implementation, I plan to add them to that release as well and will note them here. Most of the encodings added will likely only find use on significantly older MSG files, if they find any use at all. However, I decided it would be better to have support for them and not need it than to lack support and end up needing it.
