@zhaohuabing zhaohuabing commented Oct 31, 2025

The Gateway API translator calls the issuer's well-known OIDC configuration endpoint to fetch the OIDC configuration for each route. This can cause significant delays during translation when the issuer's well-known endpoint is slow or unresponsive.

This PR improves this by caching the fetched results and reusing them during translation.

fixes: #7358

The PR has been verified with the following setup.

Test setup:

Create a SecurityPolicy targeting 10 HTTPRoutes.

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: keycloak-oidc-not-exist
spec:
  oidc:
    clientID: oidctest
    clientSecret:
      group: ""
      kind: Secret
      name: oidctest-secret
    cookieNames:
      accessToken: OIDC_AccessToken
      idToken: OIDC_IdToken
    forwardAccessToken: false
    logoutPath: /foo/logout
    provider:
      issuer: https://keycloak-not-exist.default/realms/master
    redirectURL: https://www.example.com/foo/oauth2/callback
    refreshToken: true
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo1
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo2
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo3
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo4
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo5
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo6
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo7
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo8
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo9
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: foo10

Scale out the backend deployment from 1 to 20 replicas.

kubectl scale deployment backend --replicas=20

v1.5.4 test result

It took 279s for v1.5.4 to sync the endpoints to Envoy.

ADDRESS       ENVOY_HEALTH_STATUS    READY    SERVING    TERMINATING    EG_READY    EG_SERVING    EG_TERMINATING    RED_DURATION
------------  ---------------------  -------  ---------  -------------  ----------  ------------  ----------------  --------------
10.244.0.200  UNHEALTHY              -        -          -              -           -             -                 -
10.244.0.201  UNHEALTHY              -        -          -              -           -             -                 -
10.244.0.202  UNHEALTHY              -        -          -              -           -             -                 -
10.244.0.204  UNHEALTHY              -        -          -              -           -             -                 -
10.244.0.207  UNHEALTHY              -        -          -              -           -             -                 -
10.244.0.212  UNHEALTHY              -        -          -              -           -             -                 -
10.244.0.213  UNHEALTHY              -        -          -              -           -             -                 -
10.244.0.215  UNHEALTHY              -        -          -              -           -             -                 -
10.244.0.216  UNHEALTHY              -        -          -              -           -             -                 -
10.244.0.239  -                      true     true       false          true        true          false             279s
10.244.0.240  -                      true     true       false          true        true          false             279s
10.244.0.241  -                      true     true       false          true        true          false             279s
10.244.0.242  -                      true     true       false          true        true          false             279s
10.244.0.243  -                      true     true       false          true        true          false             279s
10.244.0.244  -                      true     true       false          true        true          false             279s
10.244.0.245  -                      true     true       false          true        true          false             279s
10.244.0.246  -                      true     true       false          true        true          false             279s
10.244.0.247  -                      true     true       false          true        true          false             279s
10.244.0.248  -                      true     true       false          true        true          false             279s
10.244.0.249  -                      true     true       false          true        true          false             279s
10.244.0.250  -                      true     true       false          true        true          false             279s
10.244.0.251  -                      true     true       false          true        true          false             279s
10.244.0.252  -                      true     true       false          true        true          false             279s
10.244.0.253  -                      true     true       false          true        true          false             279s
10.244.0.254  -                      true     true       false          true        true          false             279s
10.244.0.6    -                      true     true       false          true        true          false             279s
10.244.0.7    -                      true     true       false          true        true          false             279s
10.244.0.8    HEALTHY                true     true       false          true        true          false             -
10.244.0.9    -                      true     true       false          true        true          false             279s

With the coalesce optimization in #7328

With PR #7328 alone, the sync time was reduced to 58s.

2025-10-31T13:20:35.720Z INFO watchable message/watchutil.go:132 coalesced updates {"runner": "gateway-api", "count": 1, "before": 19}

ADDRESS      ENVOY_HEALTH_STATUS    READY    SERVING    TERMINATING    EG_READY    EG_SERVING    EG_TERMINATING    RED_DURATION
-----------  ---------------------  -------  ---------  -------------  ----------  ------------  ----------------  --------------
10.244.0.32  -                      true     true       false          true        true          false             58s
10.244.0.33  -                      true     true       false          true        true          false             58s
10.244.0.34  -                      true     true       false          true        true          false             58s
10.244.0.35  -                      true     true       false          true        true          false             58s
10.244.0.36  -                      true     true       false          true        true          false             58s
10.244.0.37  -                      true     true       false          true        true          false             58s
10.244.0.38  -                      true     true       false          true        true          false             58s
10.244.0.39  -                      true     true       false          true        true          false             58s
10.244.0.40  -                      true     true       false          true        true          false             58s
10.244.0.41  -                      true     true       false          true        true          false             58s
10.244.0.42  -                      true     true       false          true        true          false             58s
10.244.0.43  -                      true     true       false          true        true          false             58s
10.244.0.44  -                      true     true       false          true        true          false             58s
10.244.0.45  -                      true     true       false          true        true          false             58s
10.244.0.46  -                      true     true       false          true        true          false             58s
10.244.0.47  -                      true     true       false          true        true          false             58s
10.244.0.48  -                      true     true       false          true        true          false             58s
10.244.0.49  -                      true     true       false          true        true          false             58s
10.244.0.50  -                      true     true       false          true        true          false             58s

With both #7328 and this PR

With PR #7328 and this PR, the sync time was reduced to 9s.

2025-10-31T13:15:47.903Z INFO watchable message/watchutil.go:132 coalesced updates {"runner": "gateway-api", "count": 1, "before": 18}

ADDRESS      ENVOY_HEALTH_STATUS    READY    SERVING    TERMINATING    EG_READY    EG_SERVING    EG_TERMINATING    RED_DURATION
-----------  ---------------------  -------  ---------  -------------  ----------  ------------  ----------------  --------------
10.244.0.12  -                      true     true       false          true        true          false             9s
10.244.0.13  HEALTHY                true     true       false          true        true          false             -
10.244.0.14  -                      true     true       false          true        true          false             9s
10.244.0.15  -                      true     true       false          true        true          false             9s
10.244.0.16  -                      true     true       false          true        true          false             9s
10.244.0.17  -                      true     true       false          true        true          false             9s
10.244.0.18  -                      true     true       false          true        true          false             9s
10.244.0.19  -                      true     true       false          true        true          false             9s
10.244.0.20  -                      true     true       false          true        true          false             9s
10.244.0.21  -                      true     true       false          true        true          false             9s
10.244.0.22  -                      true     true       false          true        true          false             9s
10.244.0.23  -                      true     true       false          true        true          false             9s
10.244.0.24  -                      true     true       false          true        true          false             9s
10.244.0.25  -                      true     true       false          true        true          false             9s
10.244.0.26  -                      true     true       false          true        true          false             9s
10.244.0.27  -                      true     true       false          true        true          false             9s
10.244.0.28  -                      true     true       false          true        true          false             9s
10.244.0.29  HEALTHY                true     true       false          true        true          false             -
10.244.0.30  -                      true     true       false          true        true          false             9s
10.244.0.8   HEALTHY                true     true       false          true        true          false             -

@zhaohuabing zhaohuabing requested a review from a team as a code owner October 31, 2025 08:17
@zhaohuabing zhaohuabing force-pushed the improve-oidc-auto-discovery branch 3 times, most recently from 6766744 to 18b2baa Compare October 31, 2025 08:26
@zhaohuabing zhaohuabing marked this pull request as draft October 31, 2025 08:34
@zhaohuabing zhaohuabing force-pushed the improve-oidc-auto-discovery branch 2 times, most recently from 88d6511 to 50699dc Compare October 31, 2025 08:59
@zhaohuabing zhaohuabing marked this pull request as ready for review October 31, 2025 09:04

codecov bot commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@5a95a04). Learn more about missing BASE report.
⚠️ Report is 3 commits behind head on main.

Files with missing lines               Patch %  Lines
-------------------------------------  -------  ---------------------------
internal/gatewayapi/securitypolicy.go  87.09%   2 Missing and 2 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7394   +/-   ##
=======================================
  Coverage        ?   72.35%           
=======================================
  Files           ?      231           
  Lines           ?    34034           
  Branches        ?        0           
=======================================
  Hits            ?    24626           
  Misses          ?     7634           
  Partials        ?     1774           

// Parse the OpenID configuration response
var config OpenIDConfig
if err = backoff.Retry(func() error {

@zhaohuabing zhaohuabing Oct 31, 2025

Instead of blocking the translator here, a better approach would be to fail fast, retry the fetch in a background goroutine, and re-trigger the translation once it succeeds. This would need a global cache and some hack in the message watch.

If this makes sense, I'll send a follow-up PR.

@zhaohuabing zhaohuabing added this to the v1.6.0 Milestone milestone Oct 31, 2025
jukie previously approved these changes Oct 31, 2025

arkodg commented Oct 31, 2025

thanks @zhaohuabing, guessing we'll hit this issue for jwt and wasm too, any other remote configuration we rely on ?

zhaohuabing commented Nov 1, 2025

> thanks @zhaohuabing, guessing we'll hit this issue for jwt and wasm too, any other remote configuration we rely on ?

jwt: we don't pull the JWKS on the control plane.
wasm: we have a global cache for wasm modules, but it currently doesn't cache failed pulls, so a failed pull is retried for every route. So we do have a similar issue there.

@arkodg arkodg left a comment

LGTM thanks

@arkodg arkodg merged commit 2ec695d into envoyproxy:main Nov 4, 2025
30 of 32 checks passed
@zhaohuabing zhaohuabing deleted the improve-oidc-auto-discovery branch November 5, 2025 00:28

Development

Successfully merging this pull request may close these issues.

Delayed endpoint updates for specific HTTPRoute causing 503 errors