Description
This was started in https://github.com/dotnet/corefx/issues/9657
There seems to be a growing desire/need for a forward-only API that accesses compressed file formats (e.g. zip, gzip, tar, etc.) in a streaming manner. This means very large files, as well as streams like network streams, can be read and decompressed on the fly. Basically, the API reads from Stream objects and never seeks on them. This is how the Reader/Writer API from SharpCompress works.
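To make "never seeks" concrete, here is a minimal sketch that uses only the existing System.IO.Compression types. NonSeekableStream and ForwardOnlyDemo are hypothetical names made up for this illustration; they are not SharpCompress or corefx types. The wrapper reports CanSeek as false and throws on any seek, so a decompressor that reads through it can only move forward:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical wrapper for illustration: hides seeking so that any consumer
// that tries to seek, or asks for Length/Position, fails loudly.
public sealed class NonSeekableStream : Stream
{
    private readonly Stream _inner;

    public NonSeekableStream(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class ForwardOnlyDemo
{
    // Builds a small gzip payload in memory so the demo needs no files.
    public static byte[] Compress(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(text);
                gzip.Write(bytes, 0, bytes.Length);
            }
            return buffer.ToArray();
        }
    }

    // Decompresses while reading the source strictly front to back.
    public static string DecompressForwardOnly(Stream source)
    {
        using (var gzip = new GZipStream(new NonSeekableStream(source), CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] compressed = Compress("hello, forward-only world");
        Console.WriteLine(DecompressForwardOnly(new MemoryStream(compressed)));
    }
}
```

The same wrapper could sit in front of a network stream or a very large file. GZipStream already copes with this for a single compressed stream; the proposal is about getting the same behaviour for multi-entry formats.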
Here's a sample from the unit tests:
using (Stream stream = new ForwardOnlyStream(File.OpenRead(path)))
using (IReader reader = ReaderFactory.Open(stream))
{
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory)
        {
            reader.WriteEntryToDirectory(test.SCRATCH_FILES_PATH, new ExtractionOptions()
            {
                ExtractFullPath = true,
                Overwrite = true
            });
        }
    }
}
public interface IReader : IDisposable
{
    event EventHandler<ReaderExtractionEventArgs<IEntry>> EntryExtractionProgress;
    event EventHandler<CompressedBytesReadEventArgs> CompressedBytesRead;
    event EventHandler<FilePartExtractionBeginEventArgs> FilePartExtractionBegin;

    ArchiveType ArchiveType { get; }
    IEntry Entry { get; }

    /// <summary>
    /// Decompresses the current entry to the stream. This cannot be called twice for the current entry.
    /// </summary>
    /// <param name="writableStream"></param>
    void WriteEntryTo(Stream writableStream);

    bool Cancelled { get; }
    void Cancel();

    /// <summary>
    /// Moves to the next entry by reading more data from the underlying stream. This skips if data has not been read.
    /// </summary>
    /// <returns></returns>
    bool MoveToNextEntry();

    /// <summary>
    /// Opens the current entry as a stream that will decompress as it is read.
    /// Read the entire stream or use SkipEntry on EntryStream.
    /// </summary>
    EntryStream OpenEntryStream();
}
WriteEntryToDirectory is an extension method that provides some shortcuts for dealing with file operations, but all it really does is grab the internal stream and decompress it. The actual entry point is just IReader.WriteEntryTo(Stream). If the entry isn't decompressed, the internal stream is simply moved forward past it without decompressing where possible (some formats still require decompression to skip an entry, since the compressed length can be unknown).
The Writer API works similarly.
There is also a generic API from ReaderFactory or WriterFactory that doesn't require knowledge of the format beforehand. A similar ArchiveFactory is writable (it uses WriterFactory internally to save) and could also be used for the current ZipArchive API and beyond.
As the author of SharpCompress, I'd like to push a lot of the ideas into the core library. Having native access to the compression streams (like the internal zlib) would also be a great performance benefit for me. I haven't ever written any compression algorithm implementations myself, so I'm sure my managed implementations need a lot of love.
I would start by creating ZipReader and ZipWriter (as well as starting the generic API), using a lot of the internal code already in the core library to prove out the API. This kind of relates to https://github.com/dotnet/corefx/issues/14853, but forward-only access is something most libraries don't account for, so I'm not sure. Other easy additions would be Reader/Writer support for GZip, BZip2 and LZip (with an LZMA compressor).
Tar support linked with the previous single-file formats would be a larger addition. SharpCompress deals with tar.gz, tar.bz2 and tar.lz auto-magically by first detecting the compression format and then the tar file inside. The above API works the same way.
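The two-stage detection described above can be sketched roughly as follows. This is not SharpCompress's actual detection code. FormatSniffer, DetectedFormat and SniffDemo are hypothetical names, and the check relies only on the well-known signatures: 1F 8B for gzip, "BZh" for bzip2, "LZIP" for lzip, "PK" 03 04 for a zip local file header, and "ustar" at offset 257 of a POSIX tar header. Older tar variants without the ustar magic would not be recognised by this sketch.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public enum DetectedFormat { Unknown, Zip, GZip, BZip2, LZip, Tar }

// Hypothetical helper for illustration only.
public static class FormatSniffer
{
    // Tar has no leading signature; POSIX ustar headers carry "ustar" at offset 257.
    public const int TarMagicOffset = 257;
    public const int HeaderSize = TarMagicOffset + 5;

    public static DetectedFormat Detect(byte[] header, int length)
    {
        if (Matches(header, length, 0, 0x1F, 0x8B)) return DetectedFormat.GZip;
        if (Matches(header, length, 0, (byte)'B', (byte)'Z', (byte)'h')) return DetectedFormat.BZip2;
        if (Matches(header, length, 0, (byte)'L', (byte)'Z', (byte)'I', (byte)'P')) return DetectedFormat.LZip;
        if (Matches(header, length, 0, (byte)'P', (byte)'K', 3, 4)) return DetectedFormat.Zip;
        if (Matches(header, length, TarMagicOffset,
                    (byte)'u', (byte)'s', (byte)'t', (byte)'a', (byte)'r')) return DetectedFormat.Tar;
        return DetectedFormat.Unknown;
    }

    private static bool Matches(byte[] data, int length, int offset, params byte[] magic)
    {
        if (length < offset + magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[offset + i] != magic[i]) return false;
        }
        return true;
    }

    // Fills the buffer without seeking; loops because Read may return fewer bytes than requested.
    public static int ReadHeader(Stream source, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = source.Read(buffer, total, buffer.Length - total);
            if (read == 0) break; // end of stream
            total += read;
        }
        return total;
    }
}

public static class SniffDemo
{
    public static void Main()
    {
        // Fake a single 512-byte tar header block carrying only the ustar magic.
        byte[] tarBlock = new byte[512];
        Encoding.ASCII.GetBytes("ustar").CopyTo(tarBlock, FormatSniffer.TarMagicOffset);

        byte[] compressed;
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(tarBlock, 0, tarBlock.Length);
            }
            compressed = buffer.ToArray();
        }

        // Stage 1: identify the outer compression format from its first bytes.
        DetectedFormat outer = FormatSniffer.Detect(compressed, compressed.Length);

        // Stage 2: decompress forward-only and inspect the start of the inner payload.
        using (var inner = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        {
            byte[] header = new byte[FormatSniffer.HeaderSize];
            int read = FormatSniffer.ReadHeader(inner, header);
            Console.WriteLine(outer + " containing " + FormatSniffer.Detect(header, read));
        }
    }
}
```

One design point the sketch glosses over: on a truly forward-only source the sniffed bytes are consumed, so a real reader would need to buffer them and replay them to whichever format reader it then picks.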
Thoughts?
Summoning a few people from the other issue
cc: @ianhays @karelz @qmfrederik