gvfs-helper: auto-retry after network errors, resource throttling, split GET and POST semantics #208
Conversation
@wilbaker Thanks for the pointers.
See microsoft/git#208 for the Git changes. This should make the gvfs-helper more robust to intermittent failures when a single network call fails.
/azp run Microsoft.git
Azure Pipelines successfully started running 1 pipeline(s).
@jeffhostetler after the 503 update, I was unable to repro any network failures in the C# functional tests. (I got a different flaky failure in the C# code, but that's different ;) ) I'm happy to re-test and approve after you do the cleanup to make this not WIP.
Thanks for the confirmation! Almost finished...
/azp run Microsoft.git
Azure Pipelines successfully started running 1 pipeline(s).
Add robust-retry mechanism to automatically retry a request after network
errors. This includes retry after:
* transient network problems reported by curl
* HTTP 429 throttling (with associated Retry-After)
* HTTP 503 server unavailable (with associated Retry-After)
Add voluntary throttling using Azure X-RateLimit-* hints to avoid being
soft-throttled (tarpitted) or hard-throttled (429) on later requests.
Add global (outside of a single request) azure-throttle data to track the
rate limit hints from the cache-server and main Git server independently.
Add exponential retry backoff. This is used for transient network problems
when we don't have a Retry-After hint.
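For illustration, the retry decision roughly combines an explicit Retry-After hint (429/503) with exponential backoff when no hint is available. A minimal sketch, not the actual gvfs-helper code; MAX_RETRIES, MAX_DELAY_SEC, and the jitter scheme are illustrative assumptions:

```c
#include <stdlib.h>

#define MAX_RETRIES    6	/* illustrative, not the patch's value */
#define MAX_DELAY_SEC  300	/* illustrative cap on a single wait */

/* Should we try this request again? */
int should_retry(long http_status, int curl_transient_error, int attempt)
{
	if (attempt >= MAX_RETRIES)
		return 0;
	return curl_transient_error ||	/* e.g. connection reset, timeout */
	       http_status == 429 ||	/* throttled by the server */
	       http_status == 503;	/* server temporarily unavailable */
}

/* How long to sleep before the next attempt. */
int next_delay_sec(int attempt, long retry_after_sec)
{
	int delay;

	/* Honor an explicit Retry-After hint (429/503) when present. */
	if (retry_after_sec > 0)
		return retry_after_sec < MAX_DELAY_SEC ?
			(int)retry_after_sec : MAX_DELAY_SEC;

	/* Otherwise fall back to exponential backoff: 1, 2, 4, 8, ... */
	delay = 1 << attempt;
	if (delay > MAX_DELAY_SEC)
		delay = MAX_DELAY_SEC;

	/* Add up to ~50% jitter so parallel fetchers don't sync up. */
	return delay + rand() % (delay + 1) / 2;
}
```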
Move the call to index-pack earlier in the response/error-handling sequence
so that if we receive a 200 but the packfile is truncated or corrupted, we
can use the regular retry logic to get it again.
Refactor the way we create tempfiles for packfiles to use
<odb>/pack/tempPacks/ rather than working directly in the <odb>/pack/
directory.
Move the code that creates a new tempfile to the start of each request
attempt (initial and retry), rather than to the overall start of a request.
This gives us a fresh tempfile for each network request attempt, which
simplifies the retry mechanism, isolates us from the file-ownership issues
hidden within the tempfile class, and avoids the need to truncate previous
incomplete results. This was necessary because index-pack was pulled into
the retry loop.
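A minimal POSIX-flavored sketch of the tempfile-per-attempt idea; the real code uses git's tempfile API, and the directory layout is the only part taken from the patch, while the helper name and tmp_pack_XXXXXX pattern are illustrative:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create a fresh download target for THIS network attempt under
 * <odb>/pack/tempPacks/ so that partial results never land in
 * <odb>/pack/ and never need to be truncated on retry.
 */
int create_attempt_tempfile(const char *odb_path, char *out_path, size_t n)
{
	snprintf(out_path, n, "%s/pack/tempPacks", odb_path);
	mkdir(out_path, 0777);	/* ok if it already exists */

	snprintf(out_path, n, "%s/pack/tempPacks/tmp_pack_XXXXXX", odb_path);
	return mkstemp(out_path);	/* returns an open fd, or -1 */
}
```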
Minor: Add support for logging X-VSS-E2EID to telemetry on network errors.
Minor: rename variable:
params.b_no_cache_server --> params.b_permit_cache_server_if_defined.
This variable is used to indicate whether we should try to use the
cache-server when it is defined. Got rid of double-negative logic.
Minor: rename variable:
params.label --> params.tr2_label
Clarify that this variable is only used with trace2 logging.
Minor: Move the code that automatically maps a cache-server 400 response
to a normal 401 response earlier in the response/error-handling sequence
to simplify later retry logic.
Minor: Decorate trace2 messages with "(cs)" or "(main)" to identify the
server in log messages. Add params->server_type to simplify this.
Signed-off-by: Jeff Hostetler <[email protected]>
Force-pushed 9a844ff to 0519894.
gvfs-helper-client.c
Outdated
So, we are assuming that any queued objects will get a flush request eventually? That sounds reasonable.
Yeah, I've split the queued and immediate usage now. The dry-run/pre-scan loops already handle the queue and drain (for missing blobs usually). The main difference now is that any missing trees (or commits) during those loops will be immediately fetched in isolation, but the queue will remain.
Expose the differences in the semantics of GET and POST for
the "gvfs/objects" API:
HTTP GET: fetches a single loose object over the network.
When a commit object is requested, it just returns
the single object.
HTTP POST: fetches a batch of objects over the network.
When the oid-set contains a commit object, all
referenced trees are also included in the response.
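For illustration, the two request shapes might look roughly like the sketch below. The URL layout and the commitDepth/objectIds field names are my reading of the documented GVFS protocol, not something shown in this patch; the buffer-formatting helpers are hypothetical:

```c
#include <stdio.h>

/* GET <base>/gvfs/objects/<oid>  -- one loose object, no tree expansion. */
void format_get_url(char *buf, size_t n, const char *base, const char *oid_hex)
{
	snprintf(buf, n, "%s/gvfs/objects/%s", base, oid_hex);
}

/* POST <base>/gvfs/objects with a JSON body listing the batch;
 * commits in the set pull in their referenced trees as well. */
void format_post_body(char *buf, size_t n, const char **oids, int count)
{
	size_t len = snprintf(buf, n, "{\"commitDepth\":1,\"objectIds\":[");
	int i;

	for (i = 0; i < count && len < n; i++)
		len += snprintf(buf + len, n - len, "%s\"%s\"",
				i ? "," : "", oids[i]);
	if (len < n)
		snprintf(buf + len, n - len, "]}");
}
```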
gvfs-helper is updated to take "get" and "post" command-line options.
The gvfs-helper "server" mode is updated to take "objects.get" and
"objects.post" verbs.
For convenience, the "get" option and the "objects.get" verb
do allow more than one object to be requested. gvfs-helper will
automatically issue a series of (single-object) HTTP GET requests,
creating a series of loose objects.
The "post" option and the "objects.post" verb will perform bulk
object fetching using the batch-size chunking. Individual HTTP
POST requests containing more than one object will be created
as a packfile. A HTTP POST for a single object will create a
loose object.
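A sketch of the batch-size chunking on the POST path; BLOCK_SIZE and post_batch() are hypothetical stand-ins, not names from the patch:

```c
#include <stdio.h>

#define BLOCK_SIZE 4000	/* hypothetical chunk size */

/* Stand-in for issuing one HTTP POST for a chunk of OIDs. */
static void post_batch(const char **oids, int n)
{
	/* >1 object: the response is stored as a packfile;
	 * exactly 1 object: the response is stored as a loose object. */
	printf("POST %d object(s), starting with %s\n", n, oids[0]);
}

void post_in_batches(const char **oids, int total)
{
	int start;

	for (start = 0; start < total; start += BLOCK_SIZE) {
		int n = total - start;
		if (n > BLOCK_SIZE)
			n = BLOCK_SIZE;
		post_batch(&oids[start], n);
	}
}
```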
This commit also contains some refactoring to eliminate the
assumption that POST is always associated with packfiles.
In gvfs-helper-client.c, gh_client__get_immediate() now uses the
"objects.get" verb and ignores any currently queued objects.
In gvfs-helper-client.c, the OIDSET built by gh_client__queue_oid()
is only processed when gh_client__drain_queue() is called. The queue
is processed using the "objects.post" verb.
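A toy model of the queue/drain split described above; queue_oid(), drain_queue(), and the array-backed queue are hypothetical stand-ins for the gh_client__* functions and the OIDSET, shown only to illustrate the calling pattern:

```c
#include <stdio.h>

#define QUEUE_MAX 16000	/* hypothetical limit */

static const char *oid_queue[QUEUE_MAX];
static int queue_len;

/* Remember the OID; nothing goes over the wire yet. */
void queue_oid(const char *oid_hex)
{
	if (queue_len < QUEUE_MAX)
		oid_queue[queue_len++] = oid_hex;
}

/* Send everything queued so far in one "objects.post"-style batch. */
void drain_queue(void)
{
	if (!queue_len)
		return;
	printf("POST %d queued object(s)\n", queue_len);
	queue_len = 0;
}

/* An immediate single-object fetch would bypass this queue entirely
 * (the "objects.get" path) and leave queued OIDs for a later drain. */
```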
Signed-off-by: Jeff Hostetler <[email protected]>
Force-pushed a5d67f2 to 5c65e9a.
@derrickstolee I think I'm done tinkering with this one. 2.20191023.1 has already been through the functional tests and is good. I just did a final squash on my fixups and am running 2.20191023.2 through its paces.
derrickstolee left a comment:
I'll rebase these commits onto tentative/features/sparse-checkout-2.24.0 for #214 after you merge.
Thanks for all your help!
…out-2.23.0 Upgrade to 2.20191023.7-sc which corresponds to commit microsoft/git@a782a7e and includes the following changes (since 2.20191015.2-sc):
- microsoft/git#208
- microsoft/git#210
Resolves #195. Includes the following updates to `microsoft/git`:
* microsoft/git#208: gvfs-helper: auto-retry after network errors, resource throttling, split GET and POST semantics.
* microsoft/git#215: gvfs-helper: dramatically reduce progress noise.
I'm marking this WIP because I haven't done a cleanup round nor squashed things yet, but I want to give the CI builds a chance to run tonight.
This series attempts to:
[x] auto-retry after network outages
[x] throttle back when requested (or demanded) by the server.
Questions:
[done] What is the right default for the network retry limit?
[done] Should the throttle back have a time limit? (It's one thing to wait 3 or 4 minutes between
packfiles because we hit it too hard, but another if it says we should wait an hour or two.)
[no] Should the network retry look at the amount of data received and try to resume it?
[no] Should the network retry split large packfile requests if we can tell the user's network is flakey?
Basic testing with 5 concurrent fetches shows that it is pretty easy to get throttled and
makes me wonder if we should even bother with multi-threading this. Perhaps we just
limit it to waiting for index-pack in another thread, but only plan to have 1 network thread.
Or maybe that is just when talking to the main server -- we might be able to multithread
when talking to the cache-server.