@ThomasWaldmann
Contributor

Flexibility

These changes bring the flexibility needed to support multiple compression algorithms and levels (e.g. all zlib levels, not just 6, and all lzma levels). It is now very easy to add other compression algorithms or levels.

For better performance, I also added the sha512-256 hashing algorithm, which is faster than sha256 on 64-bit CPUs.
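As a rough illustration of the two 32-byte digests being compared, here is a minimal sketch using only `hashlib`. It assumes that "sha512-256" means a SHA-512 digest truncated to 256 bits; that is an assumption for illustration and is not the same as the standardized SHA-512/256, which uses different initial values. The function names are hypothetical and not attic's API.

```python
import hashlib


def id_hash_sha256(data: bytes) -> bytes:
    """Plain SHA-256: 32-byte digest, computed with 32-bit word operations."""
    return hashlib.sha256(data).digest()


def id_hash_sha512_256(data: bytes) -> bytes:
    """Assumption: SHA-512 truncated to its first 32 bytes.

    SHA-512 works on 64-bit words, which is why it tends to be
    faster than SHA-256 on 64-bit CPUs for larger inputs.
    """
    return hashlib.sha512(data).digest()[:32]


chunk = b"some chunk data" * 1000
print(id_hash_sha256(chunk).hex())
print(id_hash_sha512_256(chunk).hex())
```

Both functions return 32 bytes, so a caller that only stores the digest does not need to care which one was used.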

I implemented the changes in a backwards compatible way, so old repos still work.

usage: attic init --compression=6 --mac=0 # defaults (as before)

The header format itself is now much more flexible because it uses msgpack.
For example, the HMAC no longer has to be 32 bytes long (aes-gcm can yield at most 16 bytes; sha512 could give 64 bytes). The stored IV could also be the full 16 bytes now. (None of this is done yet.)

Cleanup

Many hardcoded offsets and ranges were replaced by meta namedtuple elements, which are more readable and less fragile.

See also issue #210.

lzma (known from .xz files) can compress better than zlib, but takes significantly more time to do so.
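The size side of this trade-off can be tried out with the standard library alone. The following is a small sketch, not attic code: `compressed_sizes` and the sample data are made up for illustration, and real backup data will behave differently from this highly repetitive input.

```python
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes per (algorithm, level)."""
    sizes = {}
    for level in (0, 1, 6, 9):  # zlib level 0 means "store, no compression"
        sizes[("zlib", level)] = len(zlib.compress(data, level))
    for preset in (0, 6):  # lzma presets range from 0 to 9
        sizes[("lzma", preset)] = len(lzma.compress(data, preset=preset))
    return sizes


sample = b"attic backup chunk " * 4096
for (name, level), size in sorted(compressed_sizes(sample).items()):
    print(f"{name} level {level}: {len(sample)} -> {size} bytes")
```

Wrapping each call in `time.perf_counter()` measurements would show the time side of the trade-off as well.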

Note:
The compressor is set up in attic.key.KeyBase and currently still uses ZlibCompressor.

This changeset is primarily for experimenting with lzma and also to keep changesets clean.
Selection, auto-detection and parametrization of the compression method are still TODO.
…|lzma)

I had to do some bit-fiddling to preserve backwards compatibility.
The previous code used a single byte to determine the encryption type, and compression was hardcoded to zlib.
It used the type bytes 0x00, 0x01 and 0x02 for that.
The record layout was rather fixed, with no variable-length part where a compression type byte could be added.
So I split that type byte: the upper 4 bits are for compression (0 means zlib, as before), the lower 4 bits are for encryption.
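The split of the type byte described above can be sketched as follows. The helper names are hypothetical and the code is only an illustration of the nibble layout, not the code from this changeset.

```python
def pack_type_byte(compression: int, encryption: int) -> int:
    """Combine two 4-bit ids into one byte: compression high, encryption low."""
    if not 0 <= compression <= 0x0F or not 0 <= encryption <= 0x0F:
        raise ValueError("both ids must fit into 4 bits")
    return (compression << 4) | encryption


def unpack_type_byte(type_byte: int) -> tuple:
    """Split a type byte into (compression, encryption)."""
    return type_byte >> 4, type_byte & 0x0F


# Old repos wrote 0x00, 0x01 or 0x02: the upper nibble is 0, so they
# still decode as "compression 0 (zlib)" plus the old encryption type.
for old in (0x00, 0x01, 0x02):
    print(hex(old), unpack_type_byte(old))
```

Because the old type bytes all have a zero upper nibble, no migration of existing data is needed for them to be read correctly.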
now compression stuff is at top, encryption/key stuff at bottom
e.g.:
 attic init --encryption=none --compression 0 --mac 1 repo.attic

Note: Numeric --compression and --mac values are a bit simplistic, but even if one
used lots of string choices there, one would still have to look them up in the docs.
@ThomasWaldmann
Contributor Author

Some measurements with PR #207 code - all tests on local SSD filesystems:

--compression=6 --mac=0 (zlib default level 6 + sha256)
Duration: 6 minutes 53.29 seconds
Number of files: 247725 6.03 GB 2.36 GB 2.15 GB

--compression=6 --mac=1 ("" + sha512-256)
Duration: 6 minutes 42.46 seconds
Number of files: 247725 6.03 GB 2.36 GB 2.15 GB

--compression=1 --mac=1 (fastest zlib compression + "")
Duration: 4 minutes 29.36 seconds
Number of files: 247725 6.03 GB 2.53 GB 2.31 GB

--compression=0 --mac=1 (no compression + "")
Duration: 4 minutes 15.61 seconds
Number of files: 247725 6.03 GB 6.04 GB 5.49 GB

a MAC can be seen as a digital signature, but that is not what the comment meant: it referred to the parameters of the __init__ method, i.e. its "signature"
… generator to create format from meta and data
it was nice for fixed-sized overheads, but the next changeset makes them variable size, so a correct guess in the unit tests works better.
this is much easier to maintain (and change, if needed) than all those hardcoded offsets.

note:

When using packb to store a namedtuple's data, only the tuple's elements get stored, in order (not the names).
Thus, the overhead is rather small, and we can simply recreate the namedtuple from the tuple returned by unpackb.

namedtuples are very efficient and nicer to deal with than plain tuples. Alternatively, a dictionary could be used,
but packb would create more overhead for it, as both key names and values would be stored.
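The round trip and the size difference can be sketched like this. To keep the example runnable with the standard library only, `json` is used as a stand-in for msgpack's packb/unpackb; the byte counts therefore differ from msgpack's, but the principle (element values only versus key names plus values) is the same. The `Meta` fields are hypothetical and chosen for illustration.

```python
import json
from collections import namedtuple

# Hypothetical header fields, for illustration only.
Meta = namedtuple("Meta", "compr_type crypt_type mac_type iv")

meta = Meta(compr_type=0, crypt_type=2, mac_type=1, iv=42)

# Stand-in for packb: a (named)tuple is stored as a plain sequence
# of values, in field order, without the field names.
as_tuple = json.dumps(tuple(meta)).encode()

# A dict stores every key name next to its value.
as_dict = json.dumps(meta._asdict()).encode()

# Stand-in for unpackb: rebuild the namedtuple from the bare sequence.
restored = Meta(*json.loads(as_tuple))

print(len(as_tuple), len(as_dict), restored)
```

The field order in the namedtuple definition acts as the schema here, so fields can be appended later but should not be reordered.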
@ThomasWaldmann
Contributor Author

opinions? suggestions? code review?

@ThomasWaldmann changed the title from "Flexible compression" to "Flexibility and cleanup" on Mar 7, 2015
@seqizz

seqizz commented Oct 13, 2015

@ThomasWaldmann is your merge-all branch working now? seems too cool to try it.

@ThomasWaldmann
Contributor Author

@seqizz don't use the merge-all stuff, it is superseded by https://github.com/borgbackup/borg .

The flexible compression stuff is in there (but implemented in a more compatible and also more practical way, plus lz4 compression). The crypto changes are not in master branch there yet.

@ThomasWaldmann
Contributor Author

closing this pull request, seems unwanted.
