Background and motivation
Expose the position (offset) of a System.Formats.Tar.TarEntry's data within the enclosing archive stream as a public property.
The TarEntry objects returned by TarReader.GetNextEntryAsync() expose a DataStream property that wraps the section of the enclosing super stream containing the entry data. We need the offset into the enclosing super stream so that we can process the entry data without going through the DataStream property.
This would make tarballs more flexible to use in data stores that support features such as concurrent access, Azure Blob Storage being one example. It would remove a current limitation: entry data from tarballs stored in Azure Blob Storage cannot be uploaded concurrently.
The current implementation already tracks this value, but only in a non-public field of the internal stream type that backs DataStream. We need it to be exposed publicly:
```csharp
protected readonly long _startInSuperStream;
```
The motivating scenario is handling potentially very large tarballs in Azure Blob Storage. All entries returned by a TarReader share the same super stream, and multiple threads can't read from that shared stream concurrently, so with this structure we can't download the entries of a large tarball in parallel. We would like to use TarReader only to obtain each entry's name, size, and data offset in the super stream, and then use our own streams to download the entry data.
API Proposal
```diff
public abstract partial class TarEntry
{
+    public long StartPositionOfDataStream { get; }
}
```
This API could return -1 when a valid value cannot be determined. Alternatively, to make failures more explicit, it could throw an exception when:
- The data stream is null.
- The archive stream is unseekable.
- The entry represents a special/metadata entry, instead of a regular filesystem type: PAX extended attributes entry, PAX global extended attributes entry, GNU long link, GNU long path.
- Either the archive stream or the data stream is disposed.
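To make the proposed value concrete: for a plain ustar (or V7) entry, the data starts immediately after the entry's 512-byte header, and each entry's data is padded to a multiple of 512 bytes. The sketch below computes those offsets by walking the headers of a seekable archive directly. It is an illustration under stated assumptions, not a proposed implementation: `UstarOffsets` and `EntryLocation` are hypothetical names, it needs .NET 7+ (`Stream.ReadExactly`), it does not validate header checksums, and it deliberately rejects PAX, GNU long-name/long-link and other metadata entries, the same class of entries listed above for which the property would return -1 or throw.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical record: where a regular file's data lives inside the archive stream.
public record EntryLocation(string Name, long DataOffset, long Length);

public static class UstarOffsets
{
    private const int BlockSize = 512;

    // Walks the 512-byte headers of a seekable, plain ustar/V7 archive and returns
    // the data offset of every regular file. Illustration only: checksums are not
    // verified, and base-256 sizes and PAX/GNU metadata entries are rejected.
    public static List<EntryLocation> Scan(Stream archive)
    {
        var result = new List<EntryLocation>();
        var header = new byte[BlockSize];
        long position = 0;

        while (true)
        {
            archive.Seek(position, SeekOrigin.Begin);
            archive.ReadExactly(header);
            if (Array.TrueForAll(header, b => b == 0))
                break; // an all-zero block marks the end of the archive

            string name = ReadString(header, 0, 100);
            string prefix = ReadString(header, 345, 155); // ustar name prefix
            if (prefix.Length > 0) name = prefix + "/" + name;

            if ((header[124] & 0x80) != 0)
                throw new NotSupportedException("base-256 encoded size field");
            long size = Convert.ToInt64(ReadString(header, 124, 12).Trim(), 8);
            char type = (char)header[156];

            // The entry data begins right after its own header block.
            long dataOffset = position + BlockSize;
            if (type is '0' or '\0')
                result.Add(new EntryLocation(name, dataOffset, size));
            else if (type is 'x' or 'g' or 'L' or 'K')
                throw new NotSupportedException($"metadata entry type '{type}'");

            // Skip the data, which is padded up to a multiple of 512 bytes.
            position = dataOffset + (size + BlockSize - 1) / BlockSize * BlockSize;
        }
        return result;
    }

    // Reads a NUL-terminated ASCII field of at most `length` bytes.
    private static string ReadString(byte[] buffer, int offset, int length)
    {
        int end = Array.IndexOf(buffer, (byte)0, offset, length);
        int count = (end < 0 ? offset + length : end) - offset;
        return Encoding.ASCII.GetString(buffer, offset, count);
    }
}
```

For a ustar archive holding a 5-byte `a.txt` followed by a 600-byte `b.bin`, `Scan` reports data offsets 512 and 1536: the second header starts at 1024 because the 5 bytes of `a.txt` are padded to a full 512-byte block.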
API Usage
```csharp
using System.Formats.Tar;
using System.IO;
using Azure.Storage.Blobs;

// Create a stream for the tarball data in Azure Blob Storage
var blobClient = new BlobClient(...);
Stream blobClientStream = await blobClient.OpenReadAsync(...);

// Create a TarReader for the stream and get a TarEntry
var tarReader = new TarReader(blobClientStream);
TarEntry? tarEntry = await tarReader.GetNextEntryAsync();
if (tarEntry is null) return; // end of archive

// Get the position of the entry data in the blob stream (proposed API)
long entryOffsetInBlobStream = tarEntry.StartPositionOfDataStream;
long entryLength = tarEntry.Length;

// Open a separate stream over the same blob and seek to the entry data
Stream newBlobClientStream = await blobClient.OpenReadAsync(...);
newBlobClientStream.Seek(entryOffsetInBlobStream, SeekOrigin.Begin);

// Read the entry data from the separate stream; ReadExactlyAsync (.NET 7+)
// keeps reading until the buffer is full, unlike a single ReadAsync call
var bytes = new byte[entryLength];
await newBlobClientStream.ReadExactlyAsync(bytes);
```
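The motivating scenario (many entries downloaded concurrently, each through its own stream) could look roughly like the following. This is a sketch under assumptions: a local file stands in for the blob, the `(Name, Offset, Length)` tuples are assumed to have been collected beforehand from `TarReader` plus the proposed `StartPositionOfDataStream`, and `ConcurrentEntryReader`/`ReadAllAsync` are hypothetical names. With Azure Blob Storage, each worker would instead open its own blob stream, as in the usage example.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ConcurrentEntryReader
{
    // Reads each (offset, length) range through its own FileStream, so no stream
    // instance is shared between workers. Requires .NET 7+ for ReadExactlyAsync.
    public static async Task<IDictionary<string, byte[]>> ReadAllAsync(
        string archivePath,
        IEnumerable<(string Name, long Offset, long Length)> locations,
        int maxParallelism = 4)
    {
        var results = new ConcurrentDictionary<string, byte[]>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        await Parallel.ForEachAsync(locations, options, async (loc, ct) =>
        {
            // One independent handle per entry; FileShare.Read lets them coexist.
            await using var stream = new FileStream(
                archivePath, FileMode.Open, FileAccess.Read, FileShare.Read,
                bufferSize: 4096, useAsync: true);
            stream.Seek(loc.Offset, SeekOrigin.Begin);

            // Buffers the whole entry in memory (limited to ~2 GB per entry).
            var buffer = new byte[loc.Length];
            await stream.ReadExactlyAsync(buffer, ct);
            results[loc.Name] = buffer;
        });

        return results;
    }
}
```

Because every worker owns its stream, nothing depends on the single shared super stream behind `TarEntry.DataStream`. For very large entries, copying each range into a destination stream instead of a `byte[]` would avoid holding whole entries in memory.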
Alternative Designs
No response
Risks
No response