Cannot decode using CP950 (and possibly others) due to the Python implementation differing from the Microsoft implementation #373

Microsoft's implementation of CP950 fills a large portion of the user-defined character area. Because these characters are not universal to the encoding, Python's implementation rejects them when encoding or decoding data, which can cause failures (for example, `RTFDE` fails to decode some data using CP950).

One fix is to create an `encoding` submodule that handles a number of encoding-related jobs, including providing the Microsoft versions of encodings when they are unavailable in Python or differ from the Python implementation.
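As a rough illustration of the idea (not the code in the `encoding` submodule), the sketch below decodes with Python's built-in `cp950` codec and falls back to a caller-supplied override table for two-byte sequences the codec rejects. The function name `decode_with_overrides` and the table `EXAMPLE_OVERRIDES` are hypothetical, and the single mapping of `0xFA40` to `U+E000` is an assumption about Microsoft's user-defined area that should be checked against Microsoft's own tables.

```python
def decode_with_overrides(data, overrides, encoding="cp950"):
    """Decode bytes, substituting entries from `overrides` for two-byte
    sequences that the built-in codec rejects.

    `overrides` maps 2-byte sequences to replacement strings. Sequences
    that are rejected and not in the table re-raise the original
    UnicodeDecodeError (its offsets are relative to the remaining slice).
    """
    parts = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything that is left in one go.
            parts.append(data[pos:].decode(encoding))
            break
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice we just tried to decode.
            bad = pos + exc.start
            # Everything before the failure decoded cleanly.
            parts.append(data[pos:bad].decode(encoding))
            pair = bytes(data[bad:bad + 2])
            if pair not in overrides:
                raise
            parts.append(overrides[pair])
            pos = bad + 2
    return "".join(parts)


# Hypothetical table with one illustrative entry (an assumption, not a
# verified copy of Microsoft's mapping).
EXAMPLE_OVERRIDES = {b"\xfa\x40": "\ue000"}

text = decode_with_overrides(b"\xa4\xa4\xfa\x40", EXAMPLE_OVERRIDES)
```

A full replacement codec would also need the encoding direction and a complete table; this only shows the fallback pattern for decoding.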

Version 0.42.0, currently available for preview on the next-release branch, will definitely support the plain text body using CP950, as I have implemented code to handle decoding and encoding. If I discover other encodings that differ from the Python implementation, I plan to add them to that release as well and will note them here. Most of the added encodings will likely only be used by significantly older MSG files, if they are used at all. However, I decided it would be better to have support for them and not need it than to lack support and end up needing it.
