feat: s3 transfer manager v2 #3079
This is an initial phase of the S3 transfer manager v2, which includes:
- Progress Tracker with a default Console Progress Bar.
- Dedicated Multipart Download Listener for listening to events specific to multipart downloads.
- Generic Transfer Listener that is used in either a multipart upload or a multipart download. The progress tracker depends on the Generic Transfer Listener and, when enabled, it is provided through the same parameter as the listener. This matters because if there is a need to listen to transfer-specific events and also track progress, a custom implementation that combines both must be written; otherwise only one of the two can be used.
- Single Object Download
- Multipart Object Download
This initial implementation is missing test cases.
- Refactor: set a single argument, even when it does not exist, in the console progress bar.
- Add a specific parameter for where the progress is rendered, defaulting to STDOUT.
- Add test cases for ConsoleProgressBar.
- Add test cases for DefaultProgressTracker.
- Add test cases for ObjectProgressTracker.
- Add test cases for TransferListener.
- Add test cases for multipart download listener.
- Add a trait to the MultipartDownloader implementation to keep the main implementation cleaner.
- Add test cases for the multipart downloader, specifically for the part-get and range-get multipart downloaders.
Looking good on the first pass- there are some nits like function braces needing newlines, new files needing newlines, naming conventions, etc. also had some questions about design
Refactor:
- Move opening braces onto a new line.
- Make requestArgs an optional argument.
- Remove unnecessary traits.
- Use traditional declarations.
Adds:
- Download directory feature.
Refactor:
- Add a message placeholder for progress status, for example in case of errors.
Adds:
- Upload feature; multipart functionality is still missing.
- Add upload directory feature
- Add a dedicated multipart upload implementation.
- Add transfer progress to multipart upload.
- Add upload directory with the required options.
- Create specific response models for upload and upload directory.
- Add multipart upload test cases.
- Fix transfer listener completion evaluation.
Shorten namespace from `Aws\S3\Features\S3Transfer` to `Aws\S3\S3Transfer`.
A few more comments. I think a few from the last round were left unaddressed as well. I'd do another check for function opening braces (needing to be moved to a new line) and new files that are missing a newline at the end. More test classes are needed as well, but I'm assuming those are on the way.
- Implement progress tracker based on the SEP spec.
- Add a default progress bar implementation.
- Add different progress tracker formats:
-- Plain progress format: [|progress_bar|] |percent|%
-- Transfer progress format: [|progress_bar|] |percent|% |transferred|/|tobe_transferred| |unit|
-- Colored progress format: |object_name|:\n\033|color_code|[|progress_bar|] |percent|% |transferred|/|tobe_transferred| |unit| |message|\033[0m
- Add a default single progress tracker implementation.
- Add a default multi progress tracker implementation for tracking directory transfers.
- Include unit tests just for the console progress bar.
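To make the plain and transfer formats listed in this commit concrete, here is a minimal sketch of how such templates could be rendered. The function names, the `#` fill character, and the default bar width are assumptions for illustration only; they are not the classes added in this PR.

```php
<?php
/**
 * Renders the plain format: [|progress_bar|] |percent|%
 * Hypothetical helper, not part of the SDK.
 */
function renderPlainProgress(int $percent, int $width = 25): string
{
    // Clamp so out-of-range input cannot produce a negative repeat count.
    $percent = max(0, min(100, $percent));
    $filled = intdiv($percent * $width, 100);
    $bar = str_repeat('#', $filled) . str_repeat(' ', $width - $filled);

    return sprintf('[%s] %d%%', $bar, $percent);
}

/**
 * Renders the transfer format:
 * [|progress_bar|] |percent|% |transferred|/|tobe_transferred| |unit|
 * Hypothetical helper, not part of the SDK.
 */
function renderTransferProgress(
    int $transferred,
    int $toBeTransferred,
    string $unit = 'B',
    int $width = 25
): string {
    // Integer math avoids float rounding; an unknown total is shown as 0%.
    $percent = $toBeTransferred > 0
        ? intdiv($transferred * 100, $toBeTransferred)
        : 0;

    return sprintf(
        '%s %d/%d %s',
        renderPlainProgress($percent, $width),
        $transferred,
        $toBeTransferred,
        $unit
    );
}

echo renderTransferProgress(512, 1024, 'B', 10), PHP_EOL;
```

The colored format would wrap the same string in the ANSI escape sequences shown in the list (`\033[...` to set the color and `\033[0m` to reset it).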
Fixes current test cases for:
- MultipartUploader
- MultipartDownloader
- ProgressTracker
- Remove progress bar color enum since the colors were moved into the specific format that requires them.
TransferListener must be tested from the implementations that extend and use this abstract class.
Add nullable type to listenerNotifier property in the MultipartUploader implementation.
- Tests for MultiProgressTracker
- Tests for SingleProgressTracker
- Tests for ProgressBarFormat
- Tests for TransferProgressSnapshot
- Tests for TransferListenerNotifier
Looking better- still needing some more unit tests, along with integ tests. Left some comments and nits on formatting. It seems each new file is missing a newline so I'd check those as well
- Refactor code to address some styling-related feedback.
- Add upload and uploadDirectory unit tests.
- Fix MultipartUpload tests by increasing the part size from 1024 to 10240000 so it falls within the allowed part size range of 5 MB to 5 GB.
- Rename tobe to to_be in the progress formatting.
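As a quick illustration of why the test value had to change, the range check can be sketched as follows. The constant and function names are hypothetical, and the limits are the 5 MB and 5 GB bounds from the commit message expressed in bytes with binary units (an assumption about how the bounds are counted).

```php
<?php
// Hypothetical names; bounds come from the commit message above.
const MIN_PART_SIZE_BYTES = 5 * 1024 * 1024;        // 5 MB
const MAX_PART_SIZE_BYTES = 5 * 1024 * 1024 * 1024; // 5 GB

function isAllowedPartSize(int $partSizeBytes): bool
{
    return $partSizeBytes >= MIN_PART_SIZE_BYTES
        && $partSizeBytes <= MAX_PART_SIZE_BYTES;
}

var_dump(isAllowedPartSize(1024));     // the old test value
var_dump(isAllowedPartSize(10240000)); // the new test value
```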
- Add download tests
- Add download directory tests
- Minor naming refactor
Looking good. Just a few nits this time around. Still needing integ tests- will do another round of reviews once those are up.
- Add upload integ tests for:
-- Single uploads
-- Multipart uploads
-- Checksum in single uploads
-- Checksum in multipart uploads
- Add download integ tests for:
-- Single downloads
-- Multipart downloads
- Add integ tests for directory uploads
- Add integ tests for directory downloads
Mostly nits, but the most important requests are: adding `upload()` and `download()` methods on the uploader and downloader classes that call `promise()` (similar to the old implementation) and testing the `resolvesOutsideTargetDirectory` logic.
I would do an audit of hard-coded values that can be moved to classes, line length (max 85 char) and all new files that do not end with a newline.
- Move some fixed values out of the methods into consts.
- Address a line that exceeded 80 chars.
- Declare keys used across different implementations as consts.
- Fix keys declaration in TransferListener.php
- Make use of the DIRECTORY_SEPARATOR const instead of hardcoding `/`
- Some implementations using TransferListener were missing the import statement.
Work In Progress...
- Add download handlers for dealing with downloads.
- Add a new API for downloading files, "downloadFile".
- Create directories by default when downloading files.
- Override destination files if the flag to fail is not enabled.
- Wrap API parameters in a separate data class, for example UploadRequest.
- Make download and upload responses extend Result, which allows array access on the response.
- Other code refactoring, such as fixing tests based on the new updates.
- Remove model files.
- Update function to use lambda syntax.
- Remove old MultipartUploader and DownloaderImplementation, which were suffixed with [class]Initial.php
- Remove empty space.
When an S3 prefix is provided and the object key contains the delimiter, the prefix should be stripped from the object key.
- When the directory separator is different from the S3 delimiter, the delimiter in the object key is replaced with the OS directory separator.
- Update the tests to validate the object key destination correctly when using the download directory API.
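The two key-handling rules from the last two commits (prefix stripping and separator replacement) can be sketched together as below. This is an illustrative helper with a hypothetical name and signature, not the code in this PR; dropping a leftover leading delimiter after the prefix is removed is an extra assumption.

```php
<?php
/**
 * Maps an S3 object key to a relative local path.
 * Hypothetical helper mirroring the two rules described above.
 */
function objectKeyToRelativePath(
    string $objectKey,
    ?string $prefix,
    string $delimiter = '/',
    string $separator = DIRECTORY_SEPARATOR
): string {
    // Rule 1: strip the prefix when one is given and the key contains the delimiter.
    if ($prefix !== null && $prefix !== ''
        && str_contains($objectKey, $delimiter)
        && str_starts_with($objectKey, $prefix)
    ) {
        $objectKey = substr($objectKey, strlen($prefix));
    }

    // Assumption: a delimiter left at the start of the key is dropped too.
    while ($delimiter !== '' && str_starts_with($objectKey, $delimiter)) {
        $objectKey = substr($objectKey, strlen($delimiter));
    }

    // Rule 2: swap the S3 delimiter for the OS separator when they differ.
    if ($delimiter !== '' && $delimiter !== $separator) {
        $objectKey = str_replace($delimiter, $separator, $objectKey);
    }

    return $objectKey;
}

// Simulate a Windows-style separator regardless of the host OS.
echo objectKeyToRelativePath('photos/2024/a.jpg', 'photos/', '/', '\\'), PHP_EOL;
```

Passing the separator as a parameter (instead of reading `DIRECTORY_SEPARATOR` directly) keeps the rule testable on any platform.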
The delimiter for the list object request in a download directory must be defaulted to null unless explicitly provided in the list object v2 args.
The data provider for testResolvesOutsideTargetDirectory was renamed to `resolvesOutsideTargetDirectoryProvider`.
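For context on what a test like `testResolvesOutsideTargetDirectory` guards against, here is a minimal sketch of a path-traversal check. It is not the implementation from this PR: the function name and signature are hypothetical, and it works purely on the string, without touching the filesystem or resolving symlinks.

```php
<?php
/**
 * Returns true when a relative path would climb above its base directory.
 * Hypothetical helper for illustration only.
 */
function escapesTargetDirectory(string $relativePath, string $separator = '/'): bool
{
    $depth = 0;
    foreach (explode($separator, $relativePath) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // empty and current-directory segments do not move
        }
        if ($segment === '..') {
            $depth--;
            if ($depth < 0) {
                return true; // climbed above the target directory
            }
            continue;
        }
        $depth++;
    }

    return false;
}

var_dump(escapesTargetDirectory('a/b/../c'));
var_dump(escapesTargetDirectory('../etc/passwd'));
```

A data provider for such a test would typically pair keys like these with the expected boolean.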
private StreamInterface | string $source;

/** @var array */
private array $putObjectRequestArgs;
Related to my other comment in `MultipartUploader`: should we have this live on `S3TransferManager`? That's the only place we're using the `putObject` operation.
And if so, do we want something like `$createMultipartUploadArgs`, `$uploadPartArgs` and `$completeMultipartUploadArgs` for the multipart workflows?
Same comment as the one I made for the same param above: `putObjectRequestArgs` implies it will only be used for `putObject` requests, which are only performed for single uploads. From what I can see, this parameter is also used for multipart uploads.
Would it also be possible to rename this to `UploadDirectoryResult` and extend `Aws\Result` here with some overrides (i.e., keep some of this logic in place)?
What would be the use case for UploadDirectoryResult to extend Aws\Result? Right now this class just has two counting properties, one for successful uploads and one for failed uploads. Like:
class UploadDirectoryResult {
int $failedUploads;
int $successfulUploads;
}
In AbstractMultipartUploader the parameter used to be putObjectRequestArgs, but it makes more sense to name it requestArgs.
- Make config arguments optional
- Make getObjectRequest/putObjectRequest argument optional
- Delete the file if it exists when using the file download handler.
- Make getObjectRequestArgs optional in the download directory operation.
- Add config key in docs
Overall, this is looking great. Thanks for dealing with all of the churn. Mostly formatting comments and a few exception handling suggestions
I would make sure to check all new files for missing newlines at the end
Still seeing a few of these
namespace Aws\S3\S3Transfer\Models;

class DownloadDirectoryResponse
Can we change this to `DownloadDirectoryResult` and extend `Aws\Result`?
I am not sure what the use case for doing this would be right now. This class just holds how many uploads failed and how many succeeded, so it does not hold any other data.
Got it. I think the name change should still apply. Is there a way to expose (a group of) the individual `DownloadResult` objects here?
Yes, I think we should expose the download results here, or at least those for failed requests. I will think about the best approach, because I don't want a class holding so many results that it causes unnecessary memory consumption.
Agreed. Probably not ideal for successful downloads. But we should probably include all of the results for successful uploads on its corresponding class
 */
class PartGetMultipartDownloader extends MultipartDownloader
{

extra newline here
Still seeing this
$percentsSum = 0;
/**
 * @var $_
 * @var SingleProgressTracker $progressTracker
is this docblock needed?
It is not necessary. I removed it.
Still seeing this. I'm hoping this (and other comments I "un-resolved") are not just GitHub acting buggy. Let me know if you're seeing something different
public function theMultipartUploadShouldHaveBeenAbortedForFile($file): void
{
    $client = self::getSdk()->createS3();
    $inProgressMultipartUploads = $client->listMultipartUploads([
Any calls we're making should be wrapped in a try/catch, using `AssertFail()` if there's a failure.
`AssertFail()` does not seem to be a valid method. However, I worked around it by catching the exception and asserting that the message is the expected one.
`Assert::fail()` is what I actually meant. There are a few other examples of it in our integ tests.
Some of the prior comments I "un-resolved" may be related to the new diff view I've been using. let me know if you're seeing any discrepancies
 *
 * @return PromiseInterface
 */
public function download(
Probably the wrong entry point for the comment, but I saw we had recursive directory upload functionality. We should have that for downloads as well per the spec, but I don't think that's a hard requirement
- Make data model final
- Refactor test cases to use correct parameters for S3TransferManager APIs
- Fix S3TransferManagetContext.php to use the correct parameters expected by the different APIs exposed by S3TransferManager
- Refactor some formatting and styling.
- Add an empty line at the end of files.
- Add more descriptive documentation for $failsWhenDestinationExists.
- Remove an unnecessary line break.
/**
 * @return PromiseInterface
 */
protected function createMultipartUpload(): PromiseInterface
I saw there were method name changes in yenfryherrerafeliz#5. Are they going to be added here?
At some point, yes, they will.
We should get them changed before this ships so we don't break backward compatibility
];

/** @var StreamInterface|string */
private StreamInterface | string $source;
this one still needs to be addressed
 * @return StreamInterface
 */
private function parseBody(
    string | StreamInterface $source
This one still needs to be addressed
- Remove spaces between union type definitions
- Add documentation for config parameters in UploadDirectoryRequest.
- Add missing new lines in a few places.
public function __construct(
    S3ClientInterface $s3Client,
    array $requestArgs,
    array $config,
Should this be made optional since the specified config params are all optional?
{
    $createMultipartUploadArgs = $this->requestArgs;
    if ($this->requestChecksum !== null) {
        $createMultipartUploadArgs['ChecksumType'] = 'FULL_OBJECT';
There's been some chatter about `FULL_OBJECT` checksums that we need to account for: for MPUs, S3 only supports CRC-based algorithms for calculating full object checksums server-side. Something to look into.
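One way the concern above could be handled is to pick the checksum type from the algorithm instead of always sending `FULL_OBJECT`. The sketch below is only an illustration: the function name is hypothetical, and the list of CRC algorithm names and the `COMPOSITE` fallback are assumptions that should be checked against the S3 documentation before being relied on.

```php
<?php
/**
 * Picks a ChecksumType for a multipart upload based on the algorithm.
 * Hypothetical helper; the algorithm list is an assumption to verify.
 */
function checksumTypeForMultipartUpload(string $algorithm): string
{
    // CRC-based algorithms, the only family the comment above says S3
    // supports for full object checksums on multipart uploads.
    $crcAlgorithms = ['CRC32', 'CRC32C', 'CRC64NVME'];

    return in_array(strtoupper($algorithm), $crcAlgorithms, true)
        ? 'FULL_OBJECT'
        : 'COMPOSITE';
}

echo checksumTypeForMultipartUpload('crc32'), PHP_EOL;
echo checksumTypeForMultipartUpload('SHA256'), PHP_EOL;
```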
public function __construct(
    string $sourceDirectory,
    string $targetBucket,
    array $putObjectRequestArgs = [],
I think we'd be better off going with something more generic like `$requestArgs` or `$uploadRequestArgs`, since `putObject` isn't the only upload operation that will be called.
 * @return self
 */
public static function fromLegacyArgs(
    string $sourceDirectory,
nit: there's some weird spacing here- probably an auto-formatter
public function __construct(
    S3ClientInterface $s3Client,
    array $requestArgs,
    array $config,
Same comment as the other `$config` params: does this need to be required?
$destinationFile,
$config['fails_when_destination_exists'] ?? false,
new DownloadRequest(
    null,
This might be more readable if you use named params or something like the `fromArray` used for transforming the upload args for uploadDirectory.
    'Bucket' => 'FooBucket',
    ...$commandArgs
];
$cleanUpFns = [];
Is there a need to put these cleanup operations in functions? `$tempDir` and `$source` are both accessible in the `finally` block scope. Couldn't this logic be encapsulated in an if/else in the `finally` block? Functionally, it's fine, but it's a little easier to read and reason about if it's all in the `finally` block.
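A minimal sketch of the shape being suggested here, with the cleanup written inline in the `finally` block instead of being collected as closures in `$cleanUpFns`. The function name, the file name, and the temp-directory layout are made up for illustration and are not taken from the test under review.

```php
<?php
/**
 * Runs a callback against a temporary source file and always cleans up.
 * Hypothetical helper for illustration only.
 */
function runWithTemporarySource(callable $test): void
{
    $tempDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR
        . uniqid('s3-transfer-test-', true);
    mkdir($tempDir);
    $source = $tempDir . DIRECTORY_SEPARATOR . 'source.txt';
    file_put_contents($source, 'payload');

    try {
        $test($source);
    } finally {
        // Both variables are in scope here, so no cleanup closures are needed.
        if (is_file($source)) {
            unlink($source);
        }
        if (is_dir($tempDir)) {
            rmdir($tempDir);
        }
    }
}

runWithTemporarySource(function (string $source): void {
    echo strlen(file_get_contents($source)), PHP_EOL;
});
```

The cleanup runs whether the callback returns normally or throws, which is the same guarantee the closure-based version gives.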
    bool $expectError
): void
{
    $cleanUpFns = [];
same comment as above
    'Bucket' => 'FooBucket',
    ...$checksumConfig,
];
$cleanUpFns = [];
same comment as the first one
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.