Flexibility and cleanup #207
lzma (known from .xz files) can compress better than zlib, but takes significantly more time to do so. Note: the compressor is set up in attic.key.KeyBase and currently still uses ZlibCompressor. This changeset is primarily for experimenting with lzma and for keeping changesets clean. Selection, auto-detection and parametrization of the compression method are still TODO.
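To get a feeling for the size trade-off described above, here is a minimal sketch using only the Python standard library. It is not attic's compressor code: the `compare` helper and the sample data are made up for illustration, and real repository chunks will behave differently.

```python
import lzma
import zlib


def compare(data: bytes) -> dict:
    """Compress data with zlib and lzma and report the resulting sizes."""
    z = zlib.compress(data, 6)         # zlib at its usual default level
    x = lzma.compress(data, preset=6)  # lzma (.xz container) at preset 6
    # Both results must decompress back to the original input.
    assert zlib.decompress(z) == data
    assert lzma.decompress(x) == data
    return {"raw": len(data), "zlib": len(z), "lzma": len(x)}


# Hypothetical, highly repetitive sample input.
sample = b"attic chunk data " * 4096
sizes = compare(sample)
print(sizes)
```

Wrapping the two `compress` calls with `time.perf_counter()` would show the speed side of the trade-off on your own data.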
…|lzma) I had to do a bit of bit-fiddling to preserve backwards compatibility. The previous code used just 1 byte to determine the encryption type, and compression was hardcoded to zlib. It used the type bytes 0x00, 0x01 and 0x02 for that. The record layout was rather fixed, with no variable-length part where a compression type byte could be added. So I split that type byte: the upper 4 bits are the compression type (0 means zlib, as before), the lower 4 bits are the encryption type.
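The split of the type byte can be sketched as follows. The function names and the value chosen for lzma are hypothetical and only illustrate the nibble layout described above, not the actual code in this changeset.

```python
COMPR_ZLIB = 0x0  # 0 means zlib, as before
COMPR_LZMA = 0x1  # hypothetical value, chosen for illustration only


def pack_type(compression: int, encryption: int) -> int:
    """Combine two 4-bit fields into one type byte."""
    if not (0 <= compression <= 0xF and 0 <= encryption <= 0xF):
        raise ValueError("each field must fit into 4 bits")
    return (compression << 4) | encryption


def unpack_type(type_byte: int) -> tuple:
    """Split a type byte into (compression, encryption)."""
    return type_byte >> 4, type_byte & 0x0F


# Old type bytes still decode as "zlib + same encryption type as before".
for legacy in (0x00, 0x01, 0x02):
    print(hex(legacy), unpack_type(legacy))
```

Because the old type bytes all have a zero upper nibble, existing records decode as zlib-compressed without any migration.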
now compression stuff is at top, encryption/key stuff at bottom
e.g.: attic init --encryption=none --compression 0 --mac 1 repo.attic
Note: Numeric --compression and --mac values are a bit simplistic, but even if one used lots of string choices there, one would still have to look them up in the docs.
Some measurements with PR #207 code - all tests on local SSD filesystems:
--compression=6 --mac=0 (zlib default level 6 + sha256)
--compression=6 --mac=1 (zlib default level 6 + sha512-256)
--compression=1 --mac=1 (fastest zlib compression + sha512-256)
--compression=0 --mac=1 (no compression + sha512-256)
a MAC can be seen as a digital signature, but that is not what the comment meant: it refers to the parameters of the __init__ method, i.e. its "signature".
… generator to create format from meta and data
it was nice for fixed-sized overheads, but the next changeset makes them variable size, so a correct guess in the unit tests works better.
this is much easier to maintain (and change, if needed) than all those hardcoded offsets. note: when using packb to store a namedtuple's data, just the tuple's elements get stored, in order (not the names). thus, the overhead is rather small. but we can just recreate the namedtuple from the tuple returned by unpackb. namedtuples are very efficient and prettier to deal with than tuples. alternatively, a dictionary could be used, but packb would create more overhead for it as key names and values would be stored.
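The round trip described above can be sketched like this. To keep the example runnable with only the standard library, `json` is used as a stand-in for msgpack's packb/unpackb; the principle (a namedtuple is stored as a plain sequence of values, a dict also stores the key names) is the same. The `Meta` fields are hypothetical and not the ones used in the changeset.

```python
import json
from collections import namedtuple

# Hypothetical header fields, for illustration only.
Meta = namedtuple("Meta", "compr_type crypt_type mac_type iv")

meta = Meta(compr_type=0, crypt_type=1, mac_type=1, iv=42)

# A namedtuple serializes like a plain tuple: only the values, in order.
as_tuple = json.dumps(meta)
# A dict carries the key names as well, so it is larger.
as_dict = json.dumps(meta._asdict())

# Recreate the namedtuple from the plain sequence that comes back.
restored = Meta(*json.loads(as_tuple))

print(as_tuple, len(as_tuple))
print(as_dict, len(as_dict))
print(restored.mac_type)
```

Accessing `restored.mac_type` by name is what replaces the hardcoded offsets into the raw header.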
opinions? suggestions? code review?
note: maybe we will also add the full (not truncated) sha512 hash and hmac-sha512, so keep the numbers free for them.
@ThomasWaldmann is your merge-all branch working now? seems too cool to try it.
@seqizz don't use the merge-all stuff, it is superseded by https://github.com/borgbackup/borg . The flexible compression stuff is in there (but implemented in a more compatible and also more practical way, plus lz4 compression). The crypto changes are not in master branch there yet.
closing this pull request, seems unwanted.
Flexibility
These changes bring the flexibility needed to support multiple compression algorithms and levels (e.g. all zlib levels, not just 6, and also all lzma levels). It is now very easy to add other compression algorithms / levels.
For better performance, I also added the sha512-256 hashing algorithm, which is faster than sha256 on 64-bit CPUs.
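A minimal sketch of what sha512-256 could look like with the standard library, assuming it means sha512 output truncated to 256 bits (the thread's mention of a "full (not truncated) sha512" suggests this). Note that this assumption differs from the SHA-512/256 variant standardized in FIPS 180-4, which uses different initial values. The function names are hypothetical.

```python
import hashlib
import hmac


def sha512_256(data: bytes) -> bytes:
    """sha512 digest truncated to its first 32 bytes (assumed definition)."""
    return hashlib.sha512(data).digest()[:32]


def hmac_sha512_256(key: bytes, data: bytes) -> bytes:
    """HMAC-sha512 truncated to 32 bytes, same length as an hmac-sha256."""
    return hmac.new(key, data, hashlib.sha512).digest()[:32]


digest = sha512_256(b"some chunk data")
mac = hmac_sha512_256(b"secret key", b"some chunk data")
print(len(digest), len(mac))
```

The truncated result has the same 32-byte length as sha256, so it fits wherever a 32-byte id or MAC was stored before.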
I implemented the changes in a backwards compatible way, so old repos still work.
usage: attic init --compression=6 --mac=0 # defaults (as before)
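How such numeric options could be parsed is sketched below with `argparse`. This is a hypothetical stand-in, not attic's actual argument parser: only the option names and the defaults 6 and 0 come from this pull request, while the default for --encryption is an assumption.

```python
import argparse

# Hypothetical parser, for illustration only.
parser = argparse.ArgumentParser(prog="attic init")
parser.add_argument("repository")
parser.add_argument("--encryption", default="none")        # assumed default
parser.add_argument("--compression", type=int, default=6)  # default as before
parser.add_argument("--mac", type=int, default=0)          # default as before

# "--opt value" and "--opt=value" are parsed the same way.
args = parser.parse_args(
    ["--encryption=none", "--compression", "0", "--mac", "1", "repo.attic"]
)
defaults = parser.parse_args(["repo.attic"])

print(args.compression, args.mac)
print(defaults.compression, defaults.mac)
```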
The header format itself is now much more flexible due to usage of msgpack there.
E.g. the hmac length is no longer required to be 32 bytes (see aes-gcm, which can only yield up to 16 bytes, and sha512, which could give 64 bytes). Also, the stored IV could now be the full 16 bytes. (None of this is done yet.)
Cleanup
A lot of hardcoded offsets and ranges were replaced by more readable and less fragile meta namedtuple elements.
See also issue #210.