
Commit 732414b

Merge pull request #397 from TeamMsgExtractor/next-release
Version 0.47.0
2 parents 9d8ea67 + ffef919 commit 732414b


84 files changed: +6881 −5184 lines

CHANGELOG.md

Lines changed: 42 additions & 0 deletions
@@ -1,3 +1,45 @@
**v0.47.0**
* Changed the public API for `PropertiesStore` to improve the quality of its code. The properties type is now mandatory, and the intelligence field (and the related enum) has been removed.
* Additionally, the `toBytes` and `__bytes__` methods both generate their output from the current contents of the class, allowing new properties to be created and existing properties to be modified or removed if the instance is set to writable on creation.
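
  A minimal sketch of that workflow; the constructor arguments shown here (positional data and type, plus a `writable` flag) are assumptions for illustration, not a confirmed signature:

  ```python
  from extract_msg.enums import PropertiesType
  from extract_msg.properties import PropertiesStore

  def rebuildPropertyStream(rawData: bytes) -> bytes:
      # The properties type is now a mandatory argument; writable=True is
      # what enables adding, modifying, and removing properties.
      store = PropertiesStore(rawData, PropertiesType.MESSAGE, writable=True)
      # ... add, modify, or remove properties here ...
      return bytes(store)  # Regenerated from current contents, like toBytes().
  ```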
* Added new method `Named.getPropNameByStreamID`. This method takes the ID of a stream (or property stream entry) and returns the property name (as a tuple of the property name/ID and the property set) that is stored there. Returns `None` if the stream is not used to store a named property. This name can be directly used (if it is not `None`) to get the associated `NamedPropertyBase` instance. This method is most useful for people looking at the raw data of a stream and trying to figure out what named property it refers to.
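
  A usage sketch; the file name and the stream ID value are placeholders, and lookup by indexing the `Named` instance is an assumption:

  ```python
  from extract_msg import openMsg

  msg = openMsg('example.msg')
  name = msg.named.getPropNameByStreamID(0x8004)  # Hypothetical stream ID.
  if name is not None:
      # `name` is a (property name/ID, property set) tuple; per the entry
      # above, it identifies the associated NamedPropertyBase instance.
      print(name, msg.named[name])
  ```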
* Fixed a mistake in struct definitions that caused a float to require 8 bytes to unpack.
* Added tests for `extract_msg.properties.props`.
* Added basic tests for `extract_msg.attachments`.
* Added validation tests for the `enums` and `constants` submodules.
* Removed unneeded structs.
* Fixed an issue where `PtypGuid` was being parsed using the wrong property type. Despite having a fixed size, it is still a variable length property.
* Fixed *all* of the setters not working. I didn't know they needed to use the same name as the getter, and I *swear* they were working at some point with the current setup. Some sources online suggested the original form should work, which is even stranger.
* Unified all line endings to LF instead of a mix of CRLF and LF.
* Changed the enums `BCTextFormat` and `BCLabelFormat` to `IntFlag` enums. The values that exist are for the individual flags, not for groups of flags.
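
  An illustration of the semantic change, using a stand-in `IntFlag` rather than the real members (check the enum definitions for the actual flag names and values):

  ```python
  import enum

  class ExampleFlags(enum.IntFlag):  # Stand-in for BCTextFormat and the like.
      UNDERLINE = 0b001
      ITALIC = 0b010
      BOLD = 0b100

  # Stored values are combinations of individual flags, so members are
  # combined with | and tested with `in` rather than compared for equality.
  value = ExampleFlags.BOLD | ExampleFlags.ITALIC
  assert ExampleFlags.BOLD in value
  assert value != ExampleFlags.BOLD
  ```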
* Made `FieldInfo` writable; however, it can no longer be directly converted to bytes, since doing so requires additional information from outside the class. It still retains a `toBytes` method, but that method requires an argument carrying the additional data.
* Fixed `UnsupportedAttachment` inverting the `skipNotImplemented` keyword argument.
* Fixed `DMPaperSize` not being a subclass of `Enum`.
* Extended the values for `DMPaperSize`.
* Fixed exports for `extract_msg.constants.st`.
* Updated various parts of the documentation to improve clarity and consistency.
* Fixed `Recipient.existsTypedProperty` and `AttachmentBase.existsTypedProperty` having the wrong return type.
* Removed "TODO" markers from `OleStreamStruct` and finalized it to only handle the OLEStream for embedded objects.
* Fixed type annotations for `extract_msg.utils.fromTimeStamp`.
* Added new function `extract_msg.properties.prop.createNewProp`. This function allows a new property to be created with default data based on the name. The name MUST be an 8-character hex string, the first 4 characters being the value for the property ID and the last 4 being the value for the type.
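
  A minimal example of the naming scheme (the property ID and type here are standard MAPI values, chosen only for illustration):

  ```python
  from extract_msg.properties.prop import createNewProp

  # '3001001F' = property ID 0x3001 (PidTagDisplayName) followed by type
  # 0x001F (PtypString); the name must be exactly 8 hex characters.
  prop = createNewProp('3001001F')
  ```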
* Fixed `VariableLengthProp.reservedFlags` returning the wrong type.
* Adjusted many of the property structs to make them easier to use for packing.
* Renamed some of the property structs to be more informative.
* Fixed `ServerID` using the wrong struct (it would have thrown an exception for not having enough data).
* Unified signing for integer properties. All integer properties are considered unsigned by default when read directly from the MSG file; however, specific property accessors may convert to signed if the property is specifically intended to be so. This does not necessarily include properties that are *stored* as integers but have a value that is not an integer, like `PtypCurrency`.
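
  A sketch of the convention (this helper is illustrative, not library code); an accessor for a property documented as a signed 32-bit integer would reinterpret the raw unsigned value like this:

  ```python
  def toSigned32(value: int) -> int:
      # Values at or above 0x80000000 represent negative numbers in
      # two's complement, so shift them down by 2**32.
      return value - 0x1_0000_0000 if value >= 0x8000_0000 else value

  assert toSigned32(0xFFFFFFFF) == -1
  assert toSigned32(42) == 42
  ```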
* Changed `extract_msg.constants.NULL_DATE` to be an instance of a `datetime.datetime` subclass instead of just a fixed value. Functions that return null dates may end up returning distinct instances of `NullDate` (the newly created subclass); however, all instances of `NullDate` will register as equal to each other. Existing equality comparisons to the `NULL_DATE` constant will all function as intended, but `is` checks may fail for some null dates.
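
  A short sketch of the comparison behavior this implies:

  ```python
  import datetime

  from extract_msg.constants import NULL_DATE

  # NULL_DATE is now an instance of a datetime.datetime subclass.
  assert isinstance(NULL_DATE, datetime.datetime)

  def isNullDate(value: datetime.datetime) -> bool:
      # Compare with == rather than `is`: functions may return distinct
      # NullDate instances, all of which compare equal to NULL_DATE.
      return value == NULL_DATE
  ```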
* Changed the currency property type to return a `Decimal` instance for higher precision.
* Made `FixedLengthProp` and `VariableLengthProp` writable (from `PropBase` itself, only the property `flags` is writable). This also includes the ability to convert them to bytes based on their values.
* Fixed many issues in `VariableLengthProp` regarding its calculation of sizes.
* Removed the `hasLen` function.
* Changed code style to remove the space between a variable name and the colon that separates it from its type annotation.
* Corrected `InvaildPropertyIdError` to `InvalidPropertyIdError`.
* Updated to `olefile` version 0.47.
* Updated the `RTFDE` minimum version to 0.1.1.
* Changed the dependency name for `compressed-rtf` to remove a minor typo (it *should* forward to the correct place regardless, but just to be safe).
* Fixed issues with the implementation of `OleWriter` when it comes to large sets of data. Most issues would fail silently and only be noticeable when trying to open the file; if your file is less than 2 GB, you would likely not have noticed any issues at all. This includes adding a new exception, `TooManySectorsError`, which is thrown from the `write` method.
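
  A sketch of guarding a write against the new exception (the import location for `TooManySectorsError` is an assumption):

  ```python
  from extract_msg.exceptions import TooManySectorsError
  from extract_msg.ole_writer import OleWriter

  def safeWrite(writer: OleWriter, path: str) -> bool:
      # write() now raises instead of silently producing a corrupt file
      # when the data cannot fit in the OLE sector tables.
      try:
          writer.write(path)
          return True
      except TooManySectorsError:
          return False
  ```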
* Fixed some issues with `OleWriter` that could cause entries to end up partially added if there was an issue with the data.
**v0.46.2**
* Adjusted typing information on regular expressions. They were using a subscript that was only added in Python 3.9 (apparently something the type checker doesn't check for), which made the module incompatible with Python 3.8. If you are using Python 3.9 or higher, a version check will switch to the more specific typing.
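
  A sketch of the version-gated pattern described (not the module's exact code):

  ```python
  import re
  import sys

  # re.Pattern only supports subscripting at runtime from Python 3.9.
  if sys.version_info >= (3, 9):
      PatternType = re.Pattern[str]
  else:
      PatternType = re.Pattern
  ```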

README.rst

Lines changed: 3 additions & 3 deletions
@@ -259,11 +259,11 @@ your access to the newest major version of extract-msg.
 .. |License: GPL v3| image:: https://img.shields.io/badge/License-GPLv3-blue.svg
    :target: LICENSE.txt

-.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.46.2-blue.svg
-   :target: https://pypi.org/project/extract-msg/0.46.2/
+.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.47.0-blue.svg
+   :target: https://pypi.org/project/extract-msg/0.47.0/

 .. |PyPI2| image:: https://img.shields.io/badge/python-3.8+-brightgreen.svg
-   :target: https://www.python.org/downloads/release/python-3816/
+   :target: https://www.python.org/downloads/release/python-3810/

 .. |Read the Docs| image:: https://readthedocs.org/projects/msg-extractor/badge/?version=latest
    :target: https://msg-extractor.readthedocs.io/en/stable/?badge=latest

docs/_gen.py

Lines changed: 6 additions & 6 deletions
@@ -15,8 +15,8 @@ class Package(NamedTuple):
     """
     A class representing one of the subpackages of a module.
     """
-    modules : List[str]
-    packages : List[str]
+    modules: List[str]
+    packages: List[str]



@@ -65,7 +65,7 @@ def _readProject(root) -> Dict[str, Dict[str, bool]]:
     return ret


-def _makePackage(name : str, data : Dict[str, bool]) -> Package:
+def _makePackage(name: str, data: Dict[str, bool]) -> Package:
     return Package([f'{name}.{x}' for x in data if not data[x]], [f'{name}.{x}' for x in data if data[x]])


@@ -79,7 +79,7 @@ def run():
     writeAutoGenerated((x + '.rst' for x in project))


-def generateFile(name : str, package : Package):
+def generateFile(name: str, package: Package):
     with open(DIRECTORY / (name + '.rst'), 'w') as f:
         # Header.
         temp = name.replace('_', '\\_') + ' package'
@@ -125,10 +125,10 @@ def readProject(root) -> Dict[str, Package]:
     Returns a dictionary of package names to Package instances for a project.
     """
     initialRead = _readProject(root)
-    return {x : _makePackage(x, y) for x, y in initialRead.items()}
+    return {x: _makePackage(x, y) for x, y in initialRead.items()}


-def writeAutoGenerated(files : List[str]) -> None:
+def writeAutoGenerated(files: List[str]) -> None:
     """
     Writes the _autogen.txt file.
     """

extract_msg/__init__.py

Lines changed: 4 additions & 3 deletions
@@ -27,8 +27,8 @@
 # along with this program. If not, see <http://www.gnu.org/licenses/>.

 __author__ = 'Destiny Peterson & Matthew Walker'
-__date__ = '2023-11-11'
-__version__ = '0.46.2'
+__date__ = '2023-12-09'
+__version__ = '0.47.0'

 __all__ = [
     # Modules:
@@ -37,6 +37,7 @@
     'enums',
     'exceptions',
     'msg_classes',
+    'null_date',
     'properties',
     'structures',

@@ -61,7 +62,7 @@
 # Ensure these are imported before anything else.
 from . import constants, enums, exceptions

-from . import attachments, msg_classes, properties, structures
+from . import attachments, msg_classes, null_date, properties, structures
 from .attachments import Attachment, AttachmentBase, SignedAttachment
 from .msg_classes import Message, MSGFile
 from .ole_writer import OleWriter

extract_msg/_rtf/create_doc.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@



-def createDocument(tokens : Iterable[Token]) -> bytes:
+def createDocument(tokens: Iterable[Token]) -> bytes:
     """
     Combines the tokenized data into bytes and returns the document.
     """

extract_msg/_rtf/inject_rtf.py

Lines changed: 15 additions & 11 deletions
@@ -47,11 +47,13 @@
 )


-def _listInsertMult(dest : List[_T], source : Iterable[_T], index : int = -1):
+def _listInsertMult(dest: List[_T], source: Iterable[_T], index: int = -1):
     """
     Inserts into :param dest: all the items in :param source: at the index
-    specified. :param dest: can be any mutable sequence with :method insert:,
-    :method __len__:, and :method extend:.
+    specified.
+
+    :param dest: Any mutable sequence with methods ``insert``, ``__len__``, and
+        ``extend``.

     If :param index: is not specified, the default position is the end of the
     list. This is also where things will be inserted if index is greater than or
@@ -64,33 +66,35 @@ def _listInsertMult(dest : List[_T], source : Iterable[_T], index : int = -1):
         dest.insert(index + offset, item)


-def injectStartRTF(document : bytes, injectTokens : Union[bytes, List[Token]]) -> List[Token]:
+def injectStartRTF(document: bytes, injectTokens: Union[bytes, List[Token]]) -> List[Token]:
     """
     Injects the specified tokens into the document, returning a new copy of the
-    document as a list of Tokens. Injects the data just before the first
-    rendered character.
+    document as a list of Tokens.
+
+    Injects the data just before the first rendered character.

     :param document: The bytes representing the RTF document.
     :param injectTokens: The tokens to inject into the document. Can either be
-        a list of Tokens or bytes to be tokenized.
+        a list of ``Token``\\s or ``bytes`` to be tokenized.

     :raises TypeError: The data is not recognized as RTF.
     :raises ValueError: An issue with basic parsing occured.
     """
     return injectStartRTFTokenized(tokenizeRTF(document), injectTokens)


-def injectStartRTFTokenized(document : List[Token], injectTokens : Union[bytes, Iterable[Token]]) -> List[Token]:
+def injectStartRTFTokenized(document: List[Token], injectTokens: Union[bytes, Iterable[Token]]) -> List[Token]:
     """
     Like :function injectStartRTF:, injects the specified tokens into the
     document, returning a reference to the document, except that it accepts a
-    document in the form of a list of tokens. Injects the data just before the
-    first rendered character.
+    document in the form of a list of tokens.
+
+    Injects the data just before the first rendered character.

     :param document: The list of tokens representing the RTF document. Will only
         be modified if the function is successful.
     :param injectTokens: The tokens to inject into the document. Can either be
-        a list of Tokens or bytes to be tokenized.
+        a list of ``Token``\\s or ``bytes`` to be tokenized.

     :raises TypeError: The data is not recognized as RTF.
     :raises ValueError: An issue with basic parsing occured.

extract_msg/_rtf/token.py

Lines changed: 4 additions & 4 deletions
@@ -24,12 +24,12 @@ class TokenType(enum.Enum):

 class Token(NamedTuple):
     # The raw bytes for the token, used to recreate the document.
-    raw : bytes
+    raw: bytes
     # The type of the token.
-    type : TokenType
+    type: TokenType
     ## The following are optional as they only apply for certain types of tokens.
     # The name of the token, if it is a control or destination.
-    name : Optional[bytes] = None
+    name: Optional[bytes] = None
     # The parameter of the token, if it has one. If the token is a `\'hh` token,
     # this will be the decimal equivelent of the hex value.
-    parameter : Optional[int] = None
+    parameter: Optional[int] = None

extract_msg/_rtf/tokenize_rtf.py

Lines changed: 12 additions & 12 deletions
@@ -51,11 +51,13 @@
 )


-def _finishTag(startText : bytes, reader : io.BytesIO) -> Tuple[bytes, Optional[bytes], Optional[int], bytes]:
+def _finishTag(startText: bytes, reader: io.BytesIO) -> Tuple[bytes, Optional[bytes], Optional[int], bytes]:
     """
     Finishes reading a tag, returning the needed parameters to make it a
-    token. The return is a 4 tuple of the raw token bytes, the name field,
-    the parameter field (as an int), and the next character after the tag.
+    token.
+
+    The return is a 4 tuple of the raw token bytes, the name field, the
+    parameter field (as an int), and the next character after the tag.
     """
     # Very simple rules here. Anything other than a letter and we change
     # state. If the next character is a hypen, check if the character after
@@ -99,7 +101,7 @@ def _finishTag(startText : bytes, reader : io.BytesIO) -> Tuple[bytes, Optional[
     return startText, name, param, nextChar


-def _readControl(startChar : bytes, reader : io.BytesIO) -> Tuple[Tuple[Token, ...], bytes]:
+def _readControl(startChar: bytes, reader: io.BytesIO) -> Tuple[Tuple[Token, ...], bytes]:
     """
     Attempts to read the next data as a control, returning as many tokens
     as necessary.
@@ -163,7 +165,7 @@ def _readControl(startChar : bytes, reader : io.BytesIO) -> Tuple[Tuple[Token, .
     return (Token(startChar, TokenType.SYMBOL),), reader.read(1)


-def _readText(startChar : bytes, reader : io.BytesIO) -> Tuple[Tuple[Token, ...], bytes]:
+def _readText(startChar: bytes, reader: io.BytesIO) -> Tuple[Tuple[Token, ...], bytes]:
     """
     Attempts to read the next data as text.
     """
@@ -182,17 +184,15 @@ def _readText(startChar : bytes, reader : io.BytesIO) -> Tuple[Tuple[Token, ...]
     return tuple(Token(x, TokenType.TEXT) for x in chars), nextChar


-def tokenizeRTF(data : bytes, validateStart : bool = True) -> List[Token]:
+def tokenizeRTF(data: bytes, validateStart: bool = True) -> List[Token]:
     """
     Reads in the bytes and sets the tokens list to the contents after
-    tokenizing. If tokenizing fails, the current tokens list will not be
-    changed.
+    tokenizing.

-    Direct references to the previous tokens list will only point to the
-    previous and not to the current one.
+    If tokenizing fails, the current tokens list will not be changed.

-    :param validateStart: If False, does not check the first few tags. Useful
-        when tokenizing a snippet rather than a document.
+    :param validateStart: If ``False``, does not check the first few tags.
+        Useful when tokenizing a snippet rather than a document.

     :raises TypeError: The data is not recognized as RTF.
     :raises ValueError: An issue with basic parsing occured.
