Commit 77421cd

byako, windsonsea, elezar, Tim Bannister, eero-t
committed
Add Resource Driver concept
Co-authored-by: Michael <[email protected]>
Co-authored-by: Evan Lezar <[email protected]>
Co-authored-by: Tim Bannister <[email protected]>
Co-authored-by: Eero Tamminen <[email protected]>
Co-authored-by: Patrick Ohly <[email protected]>
Co-authored-by: Dipesh Rawat <[email protected]>
1 parent a1b2385 commit 77421cd

File tree

12 files changed

+467
-2
lines changed

content/en/docs/concepts/extend-kubernetes/compute-storage-net/_index.md

Lines changed: 6 additions & 0 deletions
@@ -42,3 +42,9 @@ fabric that links Pods together.
4242
Kubernetes {{< skew currentVersion >}} is compatible with {{< glossary_tooltip text="CNI" term_id="cni" >}}
4343
network plugins.
4444

45+
* [Resource drivers](/docs/concepts/extend-kubernetes/compute-storage-net/resource-drivers/)
46+
47+
Resource drivers allow custom allocation logic for non-native cluster resources that are
48+
difficult to represent with scalar values. They relieve the scheduler of the burden of
49+
understanding these resources and of planning how Pods use them through ResourceClaims.
50+
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
1+
---
2+
title: DRA Resource Drivers
3+
description: Resource drivers provide non-trivial allocation logic and management for devices or resources that require vendor-specific or otherwise complex setup, such as GPUs, NICs, and FPGAs.
4+
content_type: concept
5+
weight: 10
6+
---
7+
8+
<!-- overview -->
9+
{{< feature-state for_k8s_version="v1.27" state="alpha" >}}
10+
11+
Kubernetes provides a
12+
[Dynamic Resource Allocation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) (DRA)
13+
mechanism
14+
that can be leveraged to provide more complex hardware resources to workloads with custom resource
15+
accounting.
16+
17+
Similarly to {{< glossary_tooltip term_id="device-plugin" text="device plugins">}}, instead of
18+
customizing the code for Kubernetes itself, vendors can implement a _resource driver_ that you deploy
19+
into the cluster to account for and control the allocation of GPUs, high-performance NICs, FPGAs,
20+
InfiniBand adapters, and other similar computing resources that may require vendor-specific
21+
initialization and setup.
22+
23+
With device plugins, the scheduler is given a simple numerical representation of the resources
24+
available on a node for consideration during scheduling as an extended resource.
25+
26+
With DRA, the scheduler offloads the task of allocating and accounting for non-native resources
27+
to the resource driver, which manages such resources in the cluster.
28+
29+
A resource driver consists of two main components:
30+
31+
- a _controller_ (one per cluster) that manages hardware resource allocation for
32+
  {{< glossary_tooltip term_id="resource-claim" text="ResourceClaims">}}
33+
- a _kubelet plugin_ (one per node that has or can access the associated resource) that:
34+
  - discovers the supported hardware
35+
  - announces the discovered hardware to the resource driver controller
36+
  - prepares the hardware allocated to a ResourceClaim when the kubelet prepares to create the Pod
37+
  - unprepares the hardware allocated for a Pod when the Pod has reached a final state or is being deleted
38+
39+
There are two common ways to implement communication between the controller and a kubelet plugin:
40+
41+
- through custom resource objects that use
42+
{{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinitions">}}
43+
provided by the vendor or project behind the resource driver
44+
- through a ResourceHandle, which is part of the `AllocationResult` that the controller provides after a
45+
  successful allocation
46+
47+
General recommendations:
48+
49+
- Name the resource driver using the pattern `<HW type>.resource.<companyname>.<companydomain>`. For example,
50+
  `gpu.resource.example.com`.
51+
52+
<!-- body -->
53+
54+
## Implementing controller
55+
56+
The scheduler and a resource driver communicate through the
57+
{{< glossary_tooltip term_id="pod-scheduling-context" text="PodSchedulingContext">}} API object. It is
58+
recommended to use the [controller helper library](/docs/reference/helper-libraries/dra-driver-controller/),
59+
which implements all the operations related to PodSchedulingContext that are common to all resource
60+
drivers.
61+
62+
When using the DRA helper code, the resource driver controller has to implement the
63+
[driver interface](https://pkg.go.dev/k8s.io/[email protected]/controller#Driver),
64+
which can then be used with a
65+
[controller helper instance](https://pkg.go.dev/k8s.io/[email protected]/controller#New).
66+
67+
See the [example resource driver controller](https://github.com/kubernetes-sigs/dra-example-driver/blob/151b7c8da2e620c47a3591e1a937f8d0297b0c25/cmd/dra-example-controller/main.go#L208) for details.
68+
69+
A resource driver controller's main responsibility is to allocate and deallocate resources for
70+
{{< glossary_tooltip term_id="resource-claim" text="ResourceClaims">}}.
71+
72+
A ResourceClaim can use one of two allocation modes:
73+
74+
- `WaitForFirstConsumer` (default), which you could think of as meaning _delayed_.
75+
  In this mode, the cluster only requests resources
76+
  for the ResourceClaim when a Pod that needs it is being scheduled.
77+
- `Immediate`: the resource has to be allocated to the ResourceClaim as soon as possible, and
78+
  retained until the ResourceClaim is deleted.
79+
80+
If `allocationMode` is not set explicitly, the mode is `WaitForFirstConsumer`.
81+
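As an illustration only, a ResourceClaim that opts into immediate allocation might look like the sketch below. The claim name `gpu-immediate` is hypothetical, and the class name `gpu.example.com` is borrowed from the example further down this page; omitting `allocationMode` would leave the claim in the default `WaitForFirstConsumer` mode.

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu-immediate              # hypothetical name
spec:
  resourceClassName: gpu.example.com
  # Ask the driver to allocate right away instead of waiting for a Pod.
  allocationMode: Immediate
```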
82+
### Delayed allocation
83+
84+
The controller helper code first calls `UnsuitableNodes` so that the driver can report which of the candidate Nodes
85+
chosen by the scheduler are not suitable for allocating all of the ResourceClaims that this resource driver handles.
86+
If no Node is suitable, the scheduler selects another batch of Node names, and `UnsuitableNodes` is
87+
called again until a suitable Node is found.
88+
89+
When at least one Node is found to be suitable for all ResourceClaims, the scheduler
90+
evaluates the suitable Nodes against the other Pod scheduling constraints (native resource
91+
requests, affinity, selectors, etc.), and picks exactly one Node.
92+
93+
After the Node is selected, the controller helper code invokes the driver's `Allocate` method
94+
to perform the actual resource allocation for the needed ResourceClaims on the selected Node.
95+
96+
If the `Allocate` call returns an error for any of the ResourceClaims, the helper code retries the
97+
same call periodically until it succeeds.
98+
99+
### Immediate allocation
100+
101+
With immediate allocation there is no selected Node; it is up to the resource driver controller
102+
to select the most suitable Node based on the ResourceClaim, the ResourceClass, and their parameters.
103+
Therefore, in this scenario the helper library only calls `Allocate`, without calling `UnsuitableNodes`
104+
first.
105+
106+
### Common calls for both allocation modes
107+
108+
In both modes, allocation is preceded by retrieving the parameters objects for ResourceClaims and
109+
ResourceClasses, to ensure that the resource driver can access and understand them.
110+
111+
## Sharing resources
112+
113+
There are two main ways of sharing resources between Pods:
114+
- by using the same ResourceClaim in multiple Pods
115+
- by using the same underlying resource for different ResourceClaims
116+
117+
### Shared ResourceClaims
118+
119+
If the `Shareable` field is set to `true` in the AllocationResult for a ResourceClaim, the scheduler
120+
allows the same ResourceClaim to be used by up to 32 Pods, automatically updating the
121+
`Claim.Status.ReservedFor` field without consulting the resource driver that allocated the resource
122+
for this ResourceClaim.
123+
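A minimal sketch of what this looks like from the user's side: two Pods that reference the same ResourceClaim by name. The Pod names and the claim alias `shared-gpu` are hypothetical, and the claim `gpu-test` is assumed to exist in the same namespace as the Pods. Whether both Pods can actually run depends on the driver having marked the allocation as shareable.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: consumer-a                 # hypothetical name
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["sleep 9999"]
    resources:
      claims:
      - name: shared-gpu
  resourceClaims:
  - name: shared-gpu               # Pod-local alias for the claim
    source:
      resourceClaimName: gpu-test  # existing ResourceClaim, not a template
---
apiVersion: v1
kind: Pod
metadata:
  name: consumer-b                 # hypothetical name
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["sleep 9999"]
    resources:
      claims:
      - name: shared-gpu
  resourceClaims:
  - name: shared-gpu
    source:
      resourceClaimName: gpu-test  # same claim as consumer-a
```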
124+
### Internal accounting in resource driver
125+
126+
The other way of sharing the same resource is to implement the sharing logic in the resource driver.
127+
This can be based on, for instance, a ResourceClass parameters field that specifies whether the
128+
resource driver should allocate the resource exclusively to the ResourceClaim, or whether the same resource
129+
can be allocated to other ResourceClaims.
130+
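One way this could be expressed, purely as a sketch: a vendor-defined parameters object carries a sharing flag, and a ResourceClass points at it. The `GpuClassParameters` kind, its `shared` field, and the object names are assumptions made for illustration; a real driver defines its own parameter schema. The parameters object is assumed to be cluster-scoped, so no namespace is given in `parametersRef`.

```yaml
# Hypothetical vendor-defined parameters object (not part of core Kubernetes).
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClassParameters
metadata:
  name: shared-gpus
spec:
  shared: true                     # assumed field: the driver may hand the same GPU to several claims
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu-shared.example.com     # hypothetical class name
driverName: gpu.resource.example.com
parametersRef:
  apiGroup: gpu.resource.example.com
  kind: GpuClassParameters
  name: shared-gpus
```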
131+
## Implementing kubelet plugin
132+
133+
It is recommended to use
134+
[kubelet plugin helper library](/docs/reference/helper-libraries/dra-driver-kubelet-plugin/).
135+
136+
The main purpose of a resource driver's kubelet plugin is to ensure that the ResourceClaims a Pod will
137+
use on a Node have all their resources ready for the Pod to use.
138+
139+
### Example {#example-pod}
140+
141+
Suppose a Kubernetes cluster is running a resource driver `gpu.resource.example.com` with the
142+
ResourceClass `gpu.example.com`. Here is an example of a Pod requesting this resource to run a demo
143+
workload:
144+
145+
```yaml
146+
# gpu.resource.example.com GpuClaimParameters is an example extension API for parameters
147+
apiVersion: gpu.resource.example.com/v1alpha1
148+
kind: GpuClaimParameters
149+
metadata:
150+
  name: single-gpu
151+
spec:
152+
  count: 1
153+
---
154+
apiVersion: resource.k8s.io/v1alpha2
155+
kind: ResourceClaim
156+
metadata:
157+
  name: gpu-test
158+
spec:
159+
  resourceClassName: gpu.example.com
160+
  parametersRef:
161+
    apiGroup: gpu.resource.example.com
162+
    kind: GpuClaimParameters
163+
    name: single-gpu
164+
---
165+
apiVersion: v1
166+
kind: Pod
167+
metadata:
168+
  namespace: gpu-test4
169+
  name: pod0
170+
  labels:
171+
    app: pod
172+
spec:
173+
  containers:
174+
  - name: container1
175+
    image: ubuntu:22.04
176+
    command: ["bash", "-c"]
177+
    args: ["export; sleep 9999"]
178+
    resources:
179+
      claims:
180+
      - name: gpus
181+
  resourceClaims:
182+
  - name: gpus
183+
    source:
184+
      resourceClaimName: gpu-test
185+
# This Pod wants to use ResourceClaim gpu-test that needs 1 device of ResourceClass
186+
# gpu.example.com, handled by the gpu.resource.example.com resource driver.
187+
#
188+
# The resource driver allocates the resources required for that ResourceClaim and ensures that these are
189+
# ready to use; only then does the Pod start.
190+
```
191+
192+
## Good practice for resource driver deployment {#resource-driver-deploy-tips}
193+
194+
The recommended way to deploy a resource driver is to use a Deployment for the controller and a DaemonSet
195+
for the kubelet plugin. It is also possible to deploy it as a package for your node's
196+
operating system, or manually.
197+
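The following is a rough sketch of that layout, not a complete manifest. All names, the namespace, the image, and the commands are hypothetical. The two hostPath mounts assume the kubelet's default root directory (`/var/lib/kubelet`), where kubelet plugins place their registration and service sockets; a real driver may also need RBAC rules and access to host devices, which are left out here.

```yaml
# Controller: a single replica serving the whole cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-dra-controller     # hypothetical
  namespace: example-dra           # hypothetical
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-dra-controller
  template:
    metadata:
      labels:
        app: example-dra-controller
    spec:
      containers:
      - name: controller
        image: registry.example.com/example-dra-driver:v0.1.0   # hypothetical image
        command: ["/usr/bin/example-dra-controller"]            # hypothetical binary
---
# Kubelet plugin: one Pod on every node that has the hardware.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-dra-kubelet-plugin # hypothetical
  namespace: example-dra
spec:
  selector:
    matchLabels:
      app: example-dra-kubelet-plugin
  template:
    metadata:
      labels:
        app: example-dra-kubelet-plugin
    spec:
      containers:
      - name: plugin
        image: registry.example.com/example-dra-driver:v0.1.0   # hypothetical image
        command: ["/usr/bin/example-dra-kubelet-plugin"]        # hypothetical binary
        volumeMounts:
        - name: plugins-registry
          mountPath: /var/lib/kubelet/plugins_registry
        - name: plugins
          mountPath: /var/lib/kubelet/plugins
      volumes:
      - name: plugins-registry     # where the kubelet discovers plugin registration sockets
        hostPath:
          path: /var/lib/kubelet/plugins_registry
      - name: plugins              # where the plugin serves its gRPC socket
        hostPath:
          path: /var/lib/kubelet/plugins
```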
198+
The kubelet uses a gRPC interface to interact with a resource driver's kubelet plugin. On the Kubernetes side,
199+
no special permissions are required for resource drivers.
200+
201+
When you deploy a resource driver, you typically also define at least one ResourceClass using that driver.
202+
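For illustration, a minimal ResourceClass for the driver used in the example on this page could look like the sketch below. It assumes the driver needs no class parameters; the names are taken from that example.

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu.example.com
# Name of the resource driver that handles claims of this class.
driverName: gpu.resource.example.com
```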
203+
## API compatibility
204+
205+
Kubernetes Dynamic Resource Allocation support is in alpha. The API may change in incompatible
206+
ways before it stabilizes. As a project, Kubernetes recommends that resource driver developers:
207+
208+
* Watch for changes in future releases.
209+
* Support multiple versions of the resource driver API for backward/forward compatibility.
210+
211+
If you enable the `DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and run associated kubelet plugins on nodes
212+
that need to be upgraded to a Kubernetes release with a newer DRA API version, upgrade your
213+
resource drivers to support both versions before upgrading these nodes. Taking that approach will
214+
ensure the continuous functioning of the device allocations during the upgrade.
215+
216+
## DRA resource driver examples {#examples}
217+
218+
{{% thirdparty-content %}}
219+
220+
Here are some examples of resource driver implementations:
221+
222+
* The [example resource driver](https://github.com/kubernetes-sigs/dra-example-driver)
223+
* The [Intel GPU resource driver](https://github.com/intel/intel-resource-drivers-for-kubernetes)
224+
* The [NVIDIA GPU resource driver](https://github.com/NVIDIA/k8s-dra-driver)
225+
226+
227+
## {{% heading "whatsnext" %}}
228+
229+
* Learn about [creating your own DRA resource driver](https://www.youtube.com/watch?v=_fi9asserLE)
230+
* Discover the [example DRA resource driver](https://github.com/kubernetes-sigs/dra-example-driver)

content/en/docs/reference/_index.md

Lines changed: 8 additions & 2 deletions
@@ -59,7 +59,7 @@ client libraries:
5959
a set of back-ends.
6060
* [kube-scheduler](/docs/reference/command-line-tools-reference/kube-scheduler/) -
6161
Scheduler that manages availability, performance, and capacity.
62-
62+
6363
* [Scheduler Policies](/docs/reference/scheduling/policies)
6464
* [Scheduler Profiles](/docs/reference/scheduling/config#profiles)
6565

@@ -92,7 +92,7 @@ operator to use or manage a cluster.
9292
* [kube-controller-manager configuration (v1alpha1)](/docs/reference/config-api/kube-controller-manager-config.v1alpha1/)
9393
* [kube-proxy configuration (v1alpha1)](/docs/reference/config-api/kube-proxy-config.v1alpha1/)
9494
* [`audit.k8s.io/v1` API](/docs/reference/config-api/apiserver-audit.v1/)
95-
* [Client authentication API (v1beta1)](/docs/reference/config-api/client-authentication.v1beta1/) and
95+
* [Client authentication API (v1beta1)](/docs/reference/config-api/client-authentication.v1beta1/) and
9696
[Client authentication API (v1)](/docs/reference/config-api/client-authentication.v1/)
9797
* [WebhookAdmission configuration (v1)](/docs/reference/config-api/apiserver-webhookadmission.v1/)
9898
* [ImagePolicy API (v1alpha1)](/docs/reference/config-api/imagepolicy.v1alpha1/)
@@ -117,3 +117,9 @@ An archive of the design docs for Kubernetes functionality. Good starting points
117117
[Kubernetes Architecture](https://git.k8s.io/design-proposals-archive/architecture/architecture.md) and
118118
[Kubernetes Design Overview](https://git.k8s.io/design-proposals-archive).
119119

120+
## Helper libraries
121+
122+
### Dynamic resource allocation
123+
124+
* [Resource driver controller](/docs/reference/helper-libraries/dra-driver-controller/)
125+
* [Resource driver kubelet plugin](/docs/reference/helper-libraries/dra-driver-kubelet-plugin/)
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
---
2+
id: pod-scheduling-context
3+
title: Pod Scheduling Context
4+
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
5+
date: 2023-11-28
6+
short_description: >
7+
  A short-lived object that the kube-scheduler creates to coordinate with resource drivers on the
8+
  selection of a Node for a Pod that uses one or more ResourceClaims.
9+
10+
related:
11+
- kube-scheduler
12+
- resource-claim
13+
- resource-driver
14+
---
15+
16+
A [Pod Scheduling Context](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) is used
17+
by kube-scheduler when a Pod needs ResourceClaims in order to be scheduled.
18+
19+
<!--more-->
20+
21+
Resource drivers and kube-scheduler communicate through ResourceClaim and PodSchedulingContext objects
22+
during scheduling. The Pod only gets a Node name assigned when all the ResourceClaims listed in
23+
the PodSchedulingContext have been allocated and are reserved (`ReservedFor`) for the Pod that is being scheduled.
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
---
2+
title: Resource Claim Parameters
3+
id: resource-claim-parameters
4+
date: 2023-10-16
5+
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
6+
short_description: >
7+
  Specification of which resources, and how much of them, the Resource Claim needs.
8+
aka:
9+
tags:
10+
- extension
11+
---
12+
{{< glossary_tooltip term_id="resource-driver" text="Resource Driver">}}-specific object, subject
13+
to vendor implementation. Optional. Typically contains quantity and characteristics of the requested
14+
resources.
15+
16+
<!--more-->
17+
18+
Not part of core Kubernetes. Referenced in `ParametersRef` field of
19+
{{< glossary_tooltip term_id="resource-claim" text="Resource Claim">}}.
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
---
2+
title: Resource Claim
3+
id: resource-claim
4+
date: 2023-10-16
5+
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
6+
short_description: >
7+
Defines what kind of resource is needed and what the parameters for it are.
8+
aka:
9+
tags:
10+
- core-object
11+
- fundamental
12+
---
13+
Additional parameters are provided by a cluster admin in
14+
{{< glossary_tooltip text="Resource Class" term_id="resource-class" >}}.
15+
16+
<!--more-->
17+
18+
Can reference
19+
{{< glossary_tooltip term_id="resource-claim-parameters" text="Resource Claim Parameters">}}
20+
with {{< glossary_tooltip term_id="resource-driver" text="Resource Driver">}}-specific details.
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
---
2+
title: Resource Class Parameters
3+
id: resource-class-parameters
4+
date: 2023-10-16
5+
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
6+
short_description: >
7+
Details for Resource Driver on how to allocate resources.
8+
aka:
9+
tags:
10+
- extension
11+
---
12+
{{< glossary_tooltip term_id="resource-driver" text="Resource Driver">}}-specific object that,
13+
when referenced in {{< glossary_tooltip term_id="resource-class" text="Resource Class">}}, provides
14+
details about how to allocate resources.
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
---
2+
title: Resource Class
3+
id: resource-class
4+
date: 2023-10-16
5+
full_link: /docs/concepts/scheduling-eviction/dynamic-resource-allocation/
6+
short_description: >
7+
Describes the type of resources the Resource Driver can allocate.
8+
aka:
9+
tags:
10+
- core-object
11+
- fundamental
12+
---
13+
Abstract object that links {{< glossary_tooltip term_id="resource-claim" text="Resource Claims">}}
14+
and {{< glossary_tooltip term_id="resource-driver" text="Resource Drivers">}}.
15+
16+
<!--more-->
17+
18+
When a Resource Claim needs resources allocated, its `resourceClassName` field indicates which
19+
Resource Class is used to initiate allocation. The Resource Class names the driver
20+
that performs the allocation in its `driverName` field, and optionally references
21+
{{< glossary_tooltip term_id="resource-class-parameters" text="Resource Class Parameters">}}
22+
that give the Resource Driver further details for customizing the allocation process.
23+
24+
The same Resource Driver can be referenced in many Resource Classes. Typically, in that case, the Resource
25+
Classes have different
26+
{{< glossary_tooltip term_id="resource-class-parameters" text="Resource Class Parameters">}}
27+
that tell the driver to allocate differently for each class. For instance, one class can be
28+
used to allocate shared resources and another to allocate resources exclusively.
29+
30+
Typically managed by the cluster admin.
