Skip to content

Conversation

@nojnhuh
Copy link
Contributor

@nojnhuh nojnhuh commented Oct 14, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This PR adds new internal NodeInfo and PodInfo types independent from the ones from k8s.io/kubernetes/pkg/scheduler. The new types will eventually be used to store information about ResourceSlices and ResourceClaims for Dynamic Resource Allocation (DRA).

Which issue(s) this PR fixes:

Part of kubernetes/kubernetes#118612

Special notes for your reviewer:

These changes are based on the first commits of @towca's #7350 up to and including bb87555. I've made some small changes to avoid having to import the k8s.io/api/resource/v1alpha3 package yet which doesn't currently exist for the version of k8s.io in CA's go.mod.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. area/cluster-autoscaler area/provider/alicloud Issues or PRs related to the AliCloud cloud provider implementation area/provider/aws Issues or PRs related to aws provider labels Oct 14, 2024
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider area/provider/cluster-api Issues or PRs related to Cluster API provider area/provider/digitalocean Issues or PRs related to digitalocean provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/provider/equinixmetal Issues or PRs related to the Equinix Metal cloud provider for Cluster Autoscaler area/provider/externalgrpc Issues or PRs related to the External gRPC provider area/provider/gce area/provider/hetzner Issues or PRs related to Hetzner provider area/provider/ionoscloud area/provider/kwok Issues or PRs related to the kwok cloud provider for Cluster Autoscaler area/provider/linode Issues or PRs related to linode provider area/provider/magnum Issues or PRs related to the Magnum cloud provider for Cluster Autoscaler area/provider/oci Issues or PRs related to oci provider size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/provider/rancher labels Oct 14, 2024
@nojnhuh nojnhuh changed the title Internal node info Create and use new internal NodeInfo and PodInfo types to enable tracking DRA resources Oct 14, 2024
@nojnhuh nojnhuh force-pushed the internal-node-info branch 2 times, most recently from 466082f to 98b158d Compare October 14, 2024 21:46
@nojnhuh
Copy link
Contributor Author

nojnhuh commented Oct 14, 2024

/assign @MaciekPytel
/cc @jackfrancis

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 16, 2024
@nojnhuh nojnhuh force-pushed the internal-node-info branch from 98b158d to 9a42f0d Compare October 16, 2024 19:24
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 16, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nojnhuh
Once this PR has been reviewed and has the lgtm label, please ask for approval from maciekpytel. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 23, 2024
towca added 8 commits October 23, 2024 15:30
…eChecker

This allows other components to interact with the Framework, which will
be needed for DRA support later.
Methods to interact with the new internal types are added to
ClusterSnapshot. Cluster Autoscaler code will be migrated to only
use these methods and work on the internal types instead of directly
using the framework types.

The new types are designed so that they can be used exactly like the
framework types, which should make the migration manageable.

This allows easily adding additional data to the Nodes and Pods tracked
in ClusterSnapshot, without having to change the scheduler framework.
This will be needed to support DRA, as we'll need to track
ResourceSlices and ResourceClaims.
The new types should behave like the direct schedulerframework
types for most purposes, so most of the migration is just changing
the imported package.

Constructors look a bit different, so they have to be adapted -
mostly in test code.
…ddNodeInfo

We need AddNodeInfo in order to propagate DRA objects through the
snapshot, which makes AddNodeWithPods redundant.
AddNodes() is redundant - it was indended for batch adding nodes,
with batch-specific optimizations in mind probably. However, it
has always been implemented as just iterating over AddNode(), and
is only used in test code.

Most of the uses in the test code were initialization - they are
replaced with Initialize(), which will later be needed for handling
DRA anyway. The other uses are replaced with inline loops over
AddNode().
The method is already accessible via StorageInfos(), it's
redundant.
AddNodeInfo already provides the same functionality, and has to be used
in production code in order to propagate DRA objects correctly.

Uses in production are replaced with Initialize(), which will later
take DRA objects into account. Uses in the test code are replaced with
AddNodeInfo().
simulator.BuildNodeInfoForNode, core_utils.GetNodeInfoFromTemplate,
and scheduler_utils.DeepCopyTemplateNode all had very similar logic
for sanitizing and copying NodeInfos. They're all consolidated to
one file in simulator, sharing common logic.

MixedTemplateNodeInfoProvider now correctly uses ClusterSnapshot to
correlate Nodes to scheduled pods, instead of using a live Pod lister.
This means that the snapshot now has to be properly initialized in a
bunch of tests.
@nojnhuh nojnhuh force-pushed the internal-node-info branch from 9a42f0d to ce9e3fe Compare October 23, 2024 20:46
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 23, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@nojnhuh
Copy link
Contributor Author

nojnhuh commented Oct 30, 2024

Closing in favor of #7447

/close

@k8s-ci-robot
Copy link
Contributor

@nojnhuh: Closed this PR.

In response to this:

Closing in favor of #7447

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cluster-autoscaler area/provider/alicloud Issues or PRs related to the AliCloud cloud provider implementation area/provider/aws Issues or PRs related to aws provider area/provider/azure Issues or PRs related to azure provider area/provider/cluster-api Issues or PRs related to Cluster API provider area/provider/digitalocean Issues or PRs related to digitalocean provider area/provider/equinixmetal Issues or PRs related to the Equinix Metal cloud provider for Cluster Autoscaler area/provider/externalgrpc Issues or PRs related to the External gRPC provider area/provider/gce area/provider/hetzner Issues or PRs related to Hetzner provider area/provider/ionoscloud area/provider/kwok Issues or PRs related to the kwok cloud provider for Cluster Autoscaler area/provider/linode Issues or PRs related to linode provider area/provider/magnum Issues or PRs related to the Magnum cloud provider for Cluster Autoscaler area/provider/oci Issues or PRs related to oci provider area/provider/rancher cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants