Merged
Changes from 4 commits
3 changes: 3 additions & 0 deletions keps/prod-readiness/sig-autoscaling/4951.yaml
@@ -0,0 +1,3 @@
+kep-number: 4951
+alpha:
+  approver: "@deads2k"
52 changes: 34 additions & 18 deletions keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md
@@ -102,9 +102,9 @@ checklist items _must_ be updated for the enhancement to be released.

Items marked with (R) are required *prior to targeting to a milestone / release*.

-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [ ] e2e Tests for all Beta API Operations (endpoints)
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
@@ -283,7 +283,7 @@ when drafting this test plan.
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
-->

-[ ] I/we understand the owners of the involved components may require updates to
+[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

@@ -335,7 +335,7 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
https://storage.googleapis.com/k8s-triage/index.html
-->

-- <test>: <link to test coverage>
+N/A, the feature is tested using unit tests and e2e tests.

##### e2e tests

@@ -538,6 +538,9 @@ You can take a look at one potential example of such test in:
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
-->

+We will add a unit test verifying that HPAs with and without the new fields are
+properly validated, both when the feature gate is enabled and when it is disabled.

### Rollout, Upgrade and Rollback Planning

<!--
@@ -594,6 +597,9 @@ checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->

+The presence of the new `tolerance` HPA field indicates that the feature is
+used.
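
As a rough illustration of that check, the sketch below flags an HPA as using the feature when either scaling direction sets a tolerance. The struct types are simplified, hypothetical stand-ins rather than the real `autoscaling/v2` types, and placing `tolerance` under the `scaleUp` and `scaleDown` behavior rules is an assumption that this section does not state.

```go
package main

import "fmt"

// ScalingRules, Behavior and HPA are hypothetical, simplified stand-ins for
// the HPA API types; only the fields needed for this sketch are modeled.
type ScalingRules struct {
	Tolerance *float64 // nil when the field is unset
}

type Behavior struct {
	ScaleUp   *ScalingRules
	ScaleDown *ScalingRules
}

type HPA struct {
	Name     string
	Behavior *Behavior
}

// usesTolerance reports whether the HPA sets a tolerance in either direction.
func usesTolerance(h HPA) bool {
	if h.Behavior == nil {
		return false
	}
	for _, r := range []*ScalingRules{h.Behavior.ScaleUp, h.Behavior.ScaleDown} {
		if r != nil && r.Tolerance != nil {
			return true
		}
	}
	return false
}

func main() {
	t := 0.05
	hpas := []HPA{
		{Name: "default-tolerance"},
		{Name: "custom-tolerance", Behavior: &Behavior{ScaleUp: &ScalingRules{Tolerance: &t}}},
	}
	for _, h := range hpas {
		fmt.Printf("%s uses tolerance: %v\n", h.Name, usesTolerance(h))
	}
}
```

An operator could run an equivalent check over the HPAs listed from the API server to count how many objects rely on the feature.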

###### How can someone using this feature know that it is working for their instance?

<!--
@@ -605,13 +611,10 @@ and operation of this feature.
Recall that end users cannot usually observe component logs or access metrics.
-->

-- [ ] Events
-  - Event Reason:
-- [ ] API .status
-  - Condition name:
-  - Other field:
-- [ ] Other (treat as last resort)
-  - Details:
+- [X] Events
+  - Event Reason: `SuccessfulRescale`
+
+Users can monitor the scaling behavior of their HPA.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -630,18 +633,21 @@ These goals will help you determine what you need to measure (SLIs) in the next
question.
-->

+N/A.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

<!--
Pick one more of these and delete the rest.
-->

-- [ ] Metrics
-  - Metric name:
-  - [Optional] Aggregation method:
-  - Components exposing the metric:
-- [ ] Other (treat as last resort)
-  - Details:
+This KEP is not expected to have any impact on SLIs/SLOs as it doesn't introduce
+a new HPA behavior, but merely allows users to easily change the value of a
+parameter that's otherwise difficult to update.
+
+Standard HPA metrics (e.g.
+`horizontal_pod_autoscaler_controller_metric_computation_duration_seconds`) can
+be used to verify the HPA controller health.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

@@ -650,6 +656,12 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
implementation difficulties, etc.).
-->

+Users may want to see a signal that autoscaling isn't happening because of the
+tolerance, but this is not directly related to this KEP (this problem already
+exists today with the hard-coded 10% tolerance), and taking this KEP as an
+opportunity to improve the situation is difficult (see
+[this thread](https://github.com/kubernetes/enhancements/pull/4954#discussion_r1857098884)).

### Dependencies

<!--
@@ -775,6 +787,8 @@ Are there any tests that were run/should be run to understand performance charac
and validate the declared limits?
-->

+No.

### Troubleshooting

<!--
@@ -820,6 +834,8 @@ Major milestones might include:
- when the KEP was retired or superseded
-->

+2025-01-21: KEP PR merged.

## Drawbacks

<!--
4 changes: 2 additions & 2 deletions keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml
@@ -4,13 +4,13 @@ authors:
- "@pr00se"
- "@jm-franc"
owning-sig: sig-autoscaling
-status: provisional
+status: implementable
creation-date: 2024-11-05
reviewers:
- "@gjtempleton"
- "@raywainman"
approvers:
-- TBD
+- "@gjtempleton"

see-also:
- "/keps/sig-autoscaling/853-configurable-hpa-scale-velocity"