
Commit 732414b

Merge pull request #397 from TeamMsgExtractor/next-release
Version 0.47.0
2 parents 9d8ea67 + ffef919 commit 732414b


84 files changed: +6881 −5184 lines

CHANGELOG.md

Lines changed: 42 additions & 0 deletions
@@ -1,3 +1,45 @@
**v0.47.0**
* Changed the public API for `PropertiesStore` to improve the quality of its code. The properties type is now mandatory, and the intelligence field (and the related enum) has been removed.
* Additionally, the `toBytes` and `__bytes__` methods both generate their output from the current contents of the class, allowing new properties to be created and existing properties to be modified or removed if the instance is set to writable on creation.
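
  A minimal sketch of that workflow; the constructor arguments shown here (positional data and type, plus a `writable` flag) are assumptions for illustration, not a confirmed signature:

  ```python
  from extract_msg.enums import PropertiesType
  from extract_msg.properties import PropertiesStore

  def rebuildPropertyStream(rawData: bytes) -> bytes:
      # The properties type is now a mandatory argument; writable=True is
      # what enables adding, modifying, and removing properties.
      store = PropertiesStore(rawData, PropertiesType.MESSAGE, writable=True)
      # ... add, modify, or remove properties here ...
      return bytes(store)  # Regenerated from current contents, like toBytes().
  ```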
* Added new method `Named.getPropNameByStreamID`. This method takes the ID of a stream (or property stream entry) and returns the property name (as a tuple of the property name/ID and the property set) that is stored there. Returns `None` if the stream is not used to store a named property. This name can be directly used (if it is not `None`) to get the associated `NamedPropertyBase` instance. This method is most useful for people looking at the raw data of a stream and trying to figure out what named property it refers to.
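
  A usage sketch; the file name and the stream ID value are placeholders, and lookup by indexing the `Named` instance is an assumption:

  ```python
  from extract_msg import openMsg

  msg = openMsg('example.msg')
  name = msg.named.getPropNameByStreamID(0x8004)  # Hypothetical stream ID.
  if name is not None:
      # `name` is a (property name/ID, property set) tuple; per the entry
      # above, it identifies the associated NamedPropertyBase instance.
      print(name, msg.named[name])
  ```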
* Fixed a mistake in struct definitions that caused a float to require 8 bytes to unpack.
* Added tests for `extract_msg.properties.props`.
* Added basic tests for `extract_msg.attachments`.
* Added validation tests for the `enums` and `constants` submodules.
* Removed unneeded structs.
* Fixed an issue where `PtypGuid` was being parsed using the wrong property type. Despite having a fixed size, it is still a variable length property.
* Fixed *all* of the setters not working. I didn't know they needed to use the same name as the getter, and I *swear* they were working at some point with the current setup. Some sources online suggested the original form should work, which is even stranger.
* Unified all line endings to LF instead of a mix of CRLF and LF.
* Changed the enums `BCTextFormat` and `BCLabelFormat` to `IntFlag` enums. The values that exist are for the individual flags, not for groups of flags.
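
  An illustration of the semantic change, using a stand-in `IntFlag` rather than the real members (check the enum definitions for the actual flag names and values):

  ```python
  import enum

  class ExampleFlags(enum.IntFlag):  # Stand-in for BCTextFormat and the like.
      UNDERLINE = 0b001
      ITALIC = 0b010
      BOLD = 0b100

  # Stored values are combinations of individual flags, so members are
  # combined with | and tested with `in` rather than compared for equality.
  value = ExampleFlags.BOLD | ExampleFlags.ITALIC
  assert ExampleFlags.BOLD in value
  assert value != ExampleFlags.BOLD
  ```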
* Made `FieldInfo` writable; however, it can no longer be directly converted to bytes, since doing so requires additional information from outside the class. It still retains a `toBytes` method, but that method requires an argument carrying the additional data.
* Fixed `UnsupportedAttachment` inverting the `skipNotImplemented` keyword argument.
* Fixed `DMPaperSize` not being a subclass of `Enum`.
* Extended the values for `DMPaperSize`.
* Fixed exports for `extract_msg.constants.st`.
* Updated various parts of the documentation to improve clarity and consistency.
* Fixed `Recipient.existsTypedProperty` and `AttachmentBase.existsTypedProperty` having the wrong return type.
* Removed "TODO" markers from `OleStreamStruct` and finalized it to only handle the OLEStream for embedded objects.
* Fixed type annotations for `extract_msg.utils.fromTimeStamp`.
* Added new function `extract_msg.properties.prop.createNewProp`. This function allows a new property to be created with default data based on the name. The name MUST be an 8-character hex string, the first 4 characters being the value for the property ID and the last 4 being the value for the type.
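
  A minimal example of the naming scheme (the property ID and type here are standard MAPI values, chosen only for illustration):

  ```python
  from extract_msg.properties.prop import createNewProp

  # '3001001F' = property ID 0x3001 (PidTagDisplayName) followed by type
  # 0x001F (PtypString); the name must be exactly 8 hex characters.
  prop = createNewProp('3001001F')
  ```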
* Fixed `VariableLengthProp.reservedFlags` returning the wrong type.
* Adjusted many of the property structs to make them easier to use for packing.
* Renamed some of the property structs to be more informative.
* Fixed `ServerID` using the wrong struct (it would have thrown an exception for not having enough data).
* Unified signing for integer properties. All integer properties are considered unsigned by default when read directly from the MSG file; however, specific property accessors may convert to signed if the property is specifically intended to be so. This does not necessarily include properties that are *stored* as integers but have a value that is not an integer, like `PtypCurrency`.
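
  A sketch of the convention (this helper is illustrative, not library code); an accessor for a property documented as a signed 32-bit integer would reinterpret the raw unsigned value like this:

  ```python
  def toSigned32(value: int) -> int:
      # Values at or above 0x80000000 represent negative numbers in
      # two's complement, so shift them down by 2**32.
      return value - 0x1_0000_0000 if value >= 0x8000_0000 else value

  assert toSigned32(0xFFFFFFFF) == -1
  assert toSigned32(42) == 42
  ```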
* Changed `extract_msg.constants.NULL_DATE` to be an instance of a `datetime.datetime` subclass instead of just a fixed value. Functions that return null dates may end up returning distinct instances of `NullDate` (the newly created subclass); however, all instances of `NullDate` will register as equal to each other. Existing equality comparisons to the `NULL_DATE` constant will all function as intended, but `is` checks may fail for some null dates.
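
  A short sketch of the comparison behavior this implies:

  ```python
  import datetime

  from extract_msg.constants import NULL_DATE

  # NULL_DATE is now an instance of a datetime.datetime subclass.
  assert isinstance(NULL_DATE, datetime.datetime)

  def isNullDate(value: datetime.datetime) -> bool:
      # Compare with == rather than `is`: functions may return distinct
      # NullDate instances, all of which compare equal to NULL_DATE.
      return value == NULL_DATE
  ```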
* Changed the currency property type to return a `Decimal` instance for higher precision.
* Made `FixedLengthProp` and `VariableLengthProp` writable (from `PropBase` itself, only the property `flags` is writable). This also includes the ability to convert them to bytes based on their values.
* Fixed many issues in `VariableLengthProp` regarding its calculation of sizes.
* Removed the `hasLen` function.
* Changed code style to remove the space between a variable name and the colon that separates it from its type annotation.
* Corrected `InvaildPropertyIdError` to `InvalidPropertyIdError`.
* Updated to `olefile` version 0.47.
* Updated the `RTFDE` minimum version to 0.1.1.
* Changed the dependency name for `compressed-rtf` to remove a minor typo (it *should* forward to the correct place regardless, but just to be safe).
* Fixed issues with the implementation of `OleWriter` when it comes to large sets of data. Most issues would fail silently and only be noticeable when trying to open the file; if your file is less than 2 GB, you would likely not have noticed any issues at all. This includes adding a new exception, `TooManySectorsError`, which is thrown from the `write` method.
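
  A sketch of guarding a write against the new exception (the import location for `TooManySectorsError` is an assumption):

  ```python
  from extract_msg.exceptions import TooManySectorsError
  from extract_msg.ole_writer import OleWriter

  def safeWrite(writer: OleWriter, path: str) -> bool:
      # write() now raises instead of silently producing a corrupt file
      # when the data cannot fit in the OLE sector tables.
      try:
          writer.write(path)
          return True
      except TooManySectorsError:
          return False
  ```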
* Fixed some issues with `OleWriter` that could cause entries to end up partially added if there was an issue with the data.
**v0.46.2**
* Adjusted typing information on regular expressions. They were using a subscript that was only added in Python 3.9 (apparently something the type checker doesn't check for), which made the module incompatible with Python 3.8. If you are using Python 3.9 or higher, a version check will switch to the more specific typing.
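
  A sketch of the version-gated pattern described (not the module's exact code):

  ```python
  import re
  import sys

  # re.Pattern only supports subscripting at runtime from Python 3.9.
  if sys.version_info >= (3, 9):
      PatternType = re.Pattern[str]
  else:
      PatternType = re.Pattern
  ```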

README.rst

Lines changed: 3 additions & 3 deletions
@@ -259,11 +259,11 @@ your access to the newest major version of extract-msg.
 .. |License: GPL v3| image:: https://img.shields.io/badge/License-GPLv3-blue.svg
    :target: LICENSE.txt

-.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.46.2-blue.svg
-   :target: https://pypi.org/project/extract-msg/0.46.2/
+.. |PyPI3| image:: https://img.shields.io/badge/pypi-0.47.0-blue.svg
+   :target: https://pypi.org/project/extract-msg/0.47.0/

 .. |PyPI2| image:: https://img.shields.io/badge/python-3.8+-brightgreen.svg
-   :target: https://www.python.org/downloads/release/python-3816/
+   :target: https://www.python.org/downloads/release/python-3810/

 .. |Read the Docs| image:: https://readthedocs.org/projects/msg-extractor/badge/?version=latest
    :target: https://msg-extractor.readthedocs.io/en/stable/?badge=latest

docs/_gen.py

Lines changed: 6 additions & 6 deletions
@@ -15,8 +15,8 @@ class Package(NamedTuple):
     """
     A class representing one of the subpackages of a module.
     """
-    modules : List[str]
-    packages : List[str]
+    modules: List[str]
+    packages: List[str]



@@ -65,7 +65,7 @@ def _readProject(root) -> Dict[str, Dict[str, bool]]:
     return ret


-def _makePackage(name : str, data : Dict[str, bool]) -> Package:
+def _makePackage(name: str, data: Dict[str, bool]) -> Package:
     return Package([f'{name}.{x}' for x in data if not data[x]], [f'{name}.{x}' for x in data if data[x]])


@@ -79,7 +79,7 @@ def run():
     writeAutoGenerated((x + '.rst' for x in project))


-def generateFile(name : str, package : Package):
+def generateFile(name: str, package: Package):
     with open(DIRECTORY / (name + '.rst'), 'w') as f:
         # Header.
         temp = name.replace('_', '\\_') + ' package'
@@ -125,10 +125,10 @@ def readProject(root) -> Dict[str, Package]:
     Returns a dictionary of package names to Package instances for a project.
     """
     initialRead = _readProject(root)
-    return {x : _makePackage(x, y) for x, y in initialRead.items()}
+    return {x: _makePackage(x, y) for x, y in initialRead.items()}


-def writeAutoGenerated(files : List[str]) -> None:
+def writeAutoGenerated(files: List[str]) -> None:
     """
     Writes the _autogen.txt file.
     """

extract_msg/__init__.py

Lines changed: 4 additions & 3 deletions
@@ -27,8 +27,8 @@
 # along with this program. If not, see <http://www.gnu.org/licenses/>.

 __author__ = 'Destiny Peterson & Matthew Walker'
-__date__ = '2023-11-11'
-__version__ = '0.46.2'
+__date__ = '2023-12-09'
+__version__ = '0.47.0'

 __all__ = [
     # Modules:
@@ -37,6 +37,7 @@
     'enums',
     'exceptions',
     'msg_classes',
+    'null_date',
     'properties',
     'structures',

@@ -61,7 +62,7 @@
 # Ensure these are imported before anything else.
 from . import constants, enums, exceptions

-from . import attachments, msg_classes, properties, structures
+from . import attachments, msg_classes, null_date, properties, structures
 from .attachments import Attachment, AttachmentBase, SignedAttachment
 from .msg_classes import Message, MSGFile
 from .ole_writer import OleWriter

extract_msg/_rtf/create_doc.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@



-def createDocument(tokens : Iterable[Token]) -> bytes:
+def createDocument(tokens: Iterable[Token]) -> bytes:
     """
     Combines the tokenized data into bytes and returns the document.
     """

extract_msg/_rtf/inject_rtf.py

Lines changed: 15 additions & 11 deletions
@@ -47,11 +47,13 @@
 )


-def _listInsertMult(dest : List[_T], source : Iterable[_T], index : int = -1):
+def _listInsertMult(dest: List[_T], source: Iterable[_T], index: int = -1):
     """
     Inserts into :param dest: all the items in :param source: at the index
-    specified. :param dest: can be any mutable sequence with :method insert:,
-    :method __len__:, and :method extend:.
+    specified.
+
+    :param dest: Any mutable sequence with methods ``insert``, ``__len__``, and
+        ``extend``.

     If :param index: is not specified, the default position is the end of the
     list. This is also where things will be inserted if index is greater than or
@@ -64,33 +66,35 @@ def _listInsertMult(dest : List[_T], source : Iterable[_T], index : int = -1):
         dest.insert(index + offset, item)


-def injectStartRTF(document : bytes, injectTokens : Union[bytes, List[Token]]) -> List[Token]:
+def injectStartRTF(document: bytes, injectTokens: Union[bytes, List[Token]]) -> List[Token]:
     """
     Injects the specified tokens into the document, returning a new copy of the
-    document as a list of Tokens. Injects the data just before the first
-    rendered character.
+    document as a list of Tokens.
+
+    Injects the data just before the first rendered character.

     :param document: The bytes representing the RTF document.
     :param injectTokens: The tokens to inject into the document. Can either be
-        a list of Tokens or bytes to be tokenized.
+        a list of ``Token``\\s or ``bytes`` to be tokenized.

     :raises TypeError: The data is not recognized as RTF.
     :raises ValueError: An issue with basic parsing occured.
     """
     return injectStartRTFTokenized(tokenizeRTF(document), injectTokens)


-def injectStartRTFTokenized(document : List[Token], injectTokens : Union[bytes, Iterable[Token]]) -> List[Token]:
+def injectStartRTFTokenized(document: List[Token], injectTokens: Union[bytes, Iterable[Token]]) -> List[Token]:
     """
     Like :function injectStartRTF:, injects the specified tokens into the
     document, returning a reference to the document, except that it accepts a
-    document in the form of a list of tokens. Injects the data just before the
-    first rendered character.
+    document in the form of a list of tokens.
+
+    Injects the data just before the first rendered character.

     :param document: The list of tokens representing the RTF document. Will only
         be modified if the function is successful.
     :param injectTokens: The tokens to inject into the document. Can either be
-        a list of Tokens or bytes to be tokenized.
+        a list of ``Token``\\s or ``bytes`` to be tokenized.

     :raises TypeError: The data is not recognized as RTF.
     :raises ValueError: An issue with basic parsing occured.

extract_msg/_rtf/token.py

Lines changed: 4 additions & 4 deletions
@@ -24,12 +24,12 @@ class TokenType(enum.Enum):

 class Token(NamedTuple):
     # The raw bytes for the token, used to recreate the document.
-    raw : bytes
+    raw: bytes
     # The type of the token.
-    type : TokenType
+    type: TokenType
     ## The following are optional as they only apply for certain types of tokens.
     # The name of the token, if it is a control or destination.
-    name : Optional[bytes] = None
+    name: Optional[bytes] = None
     # The parameter of the token, if it has one. If the token is a `\'hh` token,
     # this will be the decimal equivelent of the hex value.
-    parameter : Optional[int] = None
+    parameter: Optional[int] = None

extract_msg/_rtf/tokenize_rtf.py

Lines changed: 12 additions & 12 deletions
@@ -51,11 +51,13 @@
 )


-def _finishTag(startText : bytes, reader : io.BytesIO) -> Tuple[bytes, Optional[bytes], Optional[int], bytes]:
+def _finishTag(startText: bytes, reader: io.BytesIO) -> Tuple[bytes, Optional[bytes], Optional[int], bytes]:
     """
     Finishes reading a tag, returning the needed parameters to make it a
-    token. The return is a 4 tuple of the raw token bytes, the name field,
-    the parameter field (as an int), and the next character after the tag.
+    token.
+
+    The return is a 4 tuple of the raw token bytes, the name field, the
+    parameter field (as an int), and the next character after the tag.
     """
     # Very simple rules here. Anything other than a letter and we change
     # state. If the next character is a hypen, check if the character after
@@ -99,7 +101,7 @@ def _finishTag(startText : bytes, reader : io.BytesIO) -> Tuple[bytes, Optional[
     return startText, name, param, nextChar


-def _readControl(startChar : bytes, reader : io.BytesIO) -> Tuple[Tuple[Token, ...], bytes]:
+def _readControl(startChar: bytes, reader: io.BytesIO) -> Tuple[Tuple[Token, ...], bytes]:
     """
     Attempts to read the next data as a control, returning as many tokens
     as necessary.
@@ -163,7 +165,7 @@ def _readControl(startChar : bytes, reader : io.BytesIO) -> Tuple[Tuple[Token, .
     return (Token(startChar, TokenType.SYMBOL),), reader.read(1)


-def _readText(startChar : bytes, reader : io.BytesIO) -> Tuple[Tuple[Token, ...], bytes]:
+def _readText(startChar: bytes, reader: io.BytesIO) -> Tuple[Tuple[Token, ...], bytes]:
     """
     Attempts to read the next data as text.
     """
@@ -182,17 +184,15 @@ def _readText(startChar : bytes, reader : io.BytesIO) -> Tuple[Tuple[Token, ...]
     return tuple(Token(x, TokenType.TEXT) for x in chars), nextChar


-def tokenizeRTF(data : bytes, validateStart : bool = True) -> List[Token]:
+def tokenizeRTF(data: bytes, validateStart: bool = True) -> List[Token]:
     """
     Reads in the bytes and sets the tokens list to the contents after
-    tokenizing. If tokenizing fails, the current tokens list will not be
-    changed.
+    tokenizing.

-    Direct references to the previous tokens list will only point to the
-    previous and not to the current one.
+    If tokenizing fails, the current tokens list will not be changed.

-    :param validateStart: If False, does not check the first few tags. Useful
-        when tokenizing a snippet rather than a document.
+    :param validateStart: If ``False``, does not check the first few tags.
+        Useful when tokenizing a snippet rather than a document.

     :raises TypeError: The data is not recognized as RTF.
     :raises ValueError: An issue with basic parsing occured.
