---
title: DRA Resource Drivers
description: Resource drivers provide non-trivial allocation logic and management for devices or resources that require vendor-specific or complex setup, such as GPUs, NICs, FPGAs, etc.
content_type: concept
weight: 10
---

<!-- overview -->
{{< feature-state for_k8s_version="v1.27" state="alpha" >}}

Kubernetes provides a
[Dynamic Resource Allocation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) (DRA)
mechanism that can be leveraged to provide more complex hardware resources to workloads, with
custom resource accounting.

Similarly to {{< glossary_tooltip term_id="device-plugin" text="device plugins">}}, instead of
customizing the code of Kubernetes itself, vendors can implement a _resource driver_ that you deploy
into the cluster to account for and control the allocation of GPUs, high-performance NICs, FPGAs,
InfiniBand adapters, and other similar computing resources that may require vendor-specific
initialization and setup.

With device plugins, the scheduler was given only a trivial, numerical representation of the resources
available on a node, exposed as an extended resource, to consider during scheduling.

With DRA, the scheduler offloads the task of allocating and accounting for non-native resources
to the resource driver, which manages such resources in the cluster.

A resource driver consists of two main components:

- a _controller_ (one per cluster) that manages hardware resource allocation for
  {{< glossary_tooltip term_id="resource-claim" text="ResourceClaims">}}
- a _kubelet plugin_ (one per node that has or can access the associated resource) that:
  - discovers the supported hardware
  - announces the discovered hardware to the resource driver controller
  - prepares the hardware allocated to a ResourceClaim when the kubelet prepares to create the Pod
  - unprepares the hardware allocated for a Pod when the Pod has reached a final state or is being deleted

There are two common ways to implement communication between the controller and a kubelet plugin:

- through custom resource objects that use
  {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinitions">}}
  provided by the vendor or project behind the resource driver
- through a ResourceHandle, which is part of an `AllocationResult` provided by the controller on
  successful allocation
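
The second option can be sketched as the allocation result the controller writes into a claim's
status; the field shapes follow the `resource.k8s.io/v1alpha2` API, and the vendor payload string is
a made-up example:

```yaml
# Status of an allocated ResourceClaim, as written by the resource driver controller.
status:
  driverName: gpu.resource.example.com
  allocation:
    resourceHandles:
    - driverName: gpu.resource.example.com
      # Opaque, driver-defined payload that the kubelet plugin on the node
      # receives when preparing the resource; the contents here are invented.
      data: '{"deviceUUID": "gpu-1234"}'
    shareable: false
```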

General recommendations:

- resource driver name pattern: `<HW type>.resource.<companyname>.<companydomain>`. For example,
  `gpu.resource.example.com`

<!-- body -->

## Implementing the controller

Communication between the scheduler and a resource driver is done through the
{{< glossary_tooltip term_id="pod-scheduling-context" text="PodSchedulingContext">}} API object. It is
recommended to use the [controller helper library](/docs/reference/helper-libraries/dra-driver-controller/):
it implements all the needed operations related to PodSchedulingContext, which are common to all
resource drivers.

When using the DRA helper code, the resource driver controller has to implement the
[driver interface](https://pkg.go.dev/k8s.io/[email protected]/controller#Driver),
which can then be used with a
[controller helper instance](https://pkg.go.dev/k8s.io/[email protected]/controller#New).

See the [example resource driver controller](https://github.com/kubernetes-sigs/dra-example-driver/blob/151b7c8da2e620c47a3591e1a937f8d0297b0c25/cmd/dra-example-controller/main.go#L208) for details.
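
To illustrate the shape of that interaction, here is a self-contained toy: the real `Driver`
interface and its types live in `k8s.io/dynamic-resource-allocation/controller`, so the type names,
method signatures, and capacity model below are all simplified stand-ins, not the actual API:

```go
package main

import "fmt"

// ClaimAllocation is a highly simplified stand-in for the per-claim data the
// real controller helper passes to the driver.
type ClaimAllocation struct {
	ClaimName       string
	UnsuitableNodes []string
}

// driver is a toy stand-in for the Driver interface: the helper calls
// UnsuitableNodes with candidate nodes, then Allocate for the chosen node.
type driver struct {
	// capacity per node; zero (or absent) means the node is unsuitable.
	capacity map[string]int
}

// UnsuitableNodes records, for each claim, which candidate nodes cannot
// satisfy it.
func (d *driver) UnsuitableNodes(claims []*ClaimAllocation, potentialNodes []string) {
	for _, claim := range claims {
		for _, node := range potentialNodes {
			if d.capacity[node] == 0 {
				claim.UnsuitableNodes = append(claim.UnsuitableNodes, node)
			}
		}
	}
}

// Allocate performs the actual allocation on the selected node.
func (d *driver) Allocate(claim *ClaimAllocation, selectedNode string) error {
	if d.capacity[selectedNode] == 0 {
		return fmt.Errorf("no capacity left on %s for %s", selectedNode, claim.ClaimName)
	}
	d.capacity[selectedNode]--
	return nil
}

func main() {
	d := &driver{capacity: map[string]int{"node-a": 0, "node-b": 1}}
	claim := &ClaimAllocation{ClaimName: "gpu-test"}
	d.UnsuitableNodes([]*ClaimAllocation{claim}, []string{"node-a", "node-b"})
	fmt.Println("unsuitable:", claim.UnsuitableNodes) // node-a has no capacity
	fmt.Println("allocate on node-b:", d.Allocate(claim, "node-b"))
}
```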

A resource driver controller's main responsibility is to allocate and deallocate resources for
{{< glossary_tooltip term_id="resource-claim" text="ResourceClaims">}}.

There are two modes of allocation that a ResourceClaim can have:

- `WaitForFirstConsumer` (default), which you can think of as meaning _delayed_.
  In this mode the cluster only requests resources for a ResourceClaim when a Pod
  that needs it is being scheduled.
- `Immediate`: the resource has to be allocated to the ResourceClaim as soon as possible, and
  is retained until the ResourceClaim is deleted.

If `allocationMode` is not set explicitly, the mode is `WaitForFirstConsumer`.
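
For instance, a ResourceClaim that requests immediate allocation sets the mode in its spec; the
claim and class names below are placeholders:

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: eager-claim            # placeholder name
spec:
  resourceClassName: gpu.example.com  # placeholder class name
  # Allocate as soon as the claim is created, instead of waiting for the
  # first Pod that consumes it (the WaitForFirstConsumer default).
  allocationMode: Immediate
```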

### Delayed allocation

The controller helper code will first call the driver's _UnsuitableNodes_ method, so that the driver
can report which of the candidate Nodes chosen by the scheduler are not suitable for allocating all
the needed ResourceClaims of this resource driver.
If no nodes were suitable, the scheduler selects another batch of Node names, and _UnsuitableNodes_ is
called again until a suitable node is found.

When at least one Node is found to be suitable for all ResourceClaims, the scheduler
checks the suitable nodes against the other Pod scheduling constraints (native resource
requests, affinity, selectors, and so on), and picks exactly one Node.

After the Node is selected, the controller helper code invokes the driver's _Allocate_ call
to do the actual resource allocation for the needed ResourceClaims on the selected Node.

If the Allocate call returns an error for any of the ResourceClaims, the helper code repeats the
call at an interval until it succeeds.

### Immediate allocation

With immediate allocation there is no selected node, and it is up to the resource driver controller
to select the most suitable node based on the ResourceClaim, the ResourceClass, and their parameters.
Therefore in this scenario only `Allocate` is called by the helper library, without `UnsuitableNodes`
being called first.

### Common calls for both allocation modes

In both modes the allocation is preceded by fetching the parameters objects for the ResourceClaims
and ResourceClasses, to ensure the resource driver is able to get these objects and understands them.

## Sharing resources

There are two main ways of sharing resources between Pods:

- by using the same ResourceClaim in multiple Pods
- by using the same underlying resource for different ResourceClaims

### Shared ResourceClaims

If the `Shareable` field is set to `true` in the AllocationResult for a ResourceClaim, the scheduler
allows the same ResourceClaim to be used by up to 32 Pods, by automatically updating the
`Claim.Status.ReservedFor` field without consulting the resource driver that allocated the resource
for this ResourceClaim.
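
Such sharing is driven from the Pod side: assuming a pre-created ResourceClaim named `shared-gpu`
whose allocation the driver has marked shareable, two Pods can reference it by name (all names and
images here are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: consumer-a
spec:
  containers:
  - name: main
    image: ubuntu:22.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: shared
  resourceClaims:
  - name: shared
    source:
      # Both Pods reference the same pre-created ResourceClaim by name,
      # rather than stamping out per-Pod claims from a template.
      resourceClaimName: shared-gpu
---
apiVersion: v1
kind: Pod
metadata:
  name: consumer-b
spec:
  containers:
  - name: main
    image: ubuntu:22.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: shared
  resourceClaims:
  - name: shared
    source:
      resourceClaimName: shared-gpu
```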

### Internal accounting in the resource driver

The other way of sharing the same resource is by implementing the sharing logic in the resource driver.
This can be based on, for instance, a ResourceClass parameters field that specifies whether the
resource driver should allocate the resource exclusively to one ResourceClaim, or whether the same
resource can be allocated to other ResourceClaims as well.
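
As a sketch, a vendor could expose such a toggle through a class parameters object; the
`GpuClassParameters` kind, the `exclusive` field, and the names below are all invented for
illustration:

```yaml
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClassParameters       # hypothetical vendor CRD
metadata:
  name: shared-gpus
spec:
  # Hypothetical field: when false, the driver may hand the same physical
  # device to several ResourceClaims and do the accounting internally.
  exclusive: false
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu.example.com
driverName: gpu.resource.example.com
parametersRef:
  apiGroup: gpu.resource.example.com
  kind: GpuClassParameters
  name: shared-gpus
```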

## Implementing the kubelet plugin

It is recommended to use the
[kubelet plugin helper library](/docs/reference/helper-libraries/dra-driver-kubelet-plugin/).

The main purpose of a resource driver's kubelet plugin is to ensure that the ResourceClaims the Pod
will be using on the Node have all their resources ready for the Pod to use.
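
The prepare/unprepare bookkeeping such a plugin performs can be sketched as follows; real plugins
implement the kubelet's DRA gRPC service instead, so the type and method names here are illustrative
only:

```go
package main

import "fmt"

// nodePlugin is a toy stand-in for a kubelet plugin: it "prepares" a device
// for a claim (in reality: loading drivers, writing device specs, and so on)
// and unprepares it when the Pod terminates.
type nodePlugin struct {
	prepared map[string]string // claim UID -> device
	free     []string          // devices not yet handed out
}

// Prepare hands a free device to the claim; repeated calls for the same
// claim must be idempotent.
func (p *nodePlugin) Prepare(claimUID string) (string, error) {
	if dev, ok := p.prepared[claimUID]; ok {
		return dev, nil // already prepared
	}
	if len(p.free) == 0 {
		return "", fmt.Errorf("no device available for claim %s", claimUID)
	}
	dev := p.free[0]
	p.free = p.free[1:]
	p.prepared[claimUID] = dev
	return dev, nil
}

// Unprepare returns the claim's device to the free pool.
func (p *nodePlugin) Unprepare(claimUID string) {
	if dev, ok := p.prepared[claimUID]; ok {
		delete(p.prepared, claimUID)
		p.free = append(p.free, dev)
	}
}

func main() {
	p := &nodePlugin{prepared: map[string]string{}, free: []string{"gpu-0"}}
	dev, _ := p.Prepare("claim-123")
	fmt.Println("prepared:", dev)
	p.Unprepare("claim-123")
	fmt.Println("free again:", p.free)
}
```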

### Example {#example-pod}

Suppose a Kubernetes cluster is running a resource driver `gpu.resource.example.com` with a
ResourceClass `gpu.example.com`. Here is an example of a pod requesting this resource to run a demo
workload:

```yaml
# gpu.resource.example.com GpuClaimParameters is an example extension API for parameters
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClaimParameters
metadata:
  name: single-gpu
spec:
  count: 1
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu-test
spec:
  resourceClassName: gpu.example.com
  parametersRef:
    apiGroup: gpu.resource.example.com
    kind: GpuClaimParameters
    name: single-gpu
---
apiVersion: v1
kind: Pod
metadata:
  name: pod0
  labels:
    app: pod
spec:
  containers:
  - name: container1
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; sleep 9999"]
    resources:
      claims:
      - name: gpus
  resourceClaims:
  - name: gpus
    source:
      resourceClaimName: gpu-test
# This Pod wants to use the ResourceClaim gpu-test, which needs 1 device of ResourceClass
# gpu.example.com, handled by the gpu.resource.example.com resource driver.
#
# The resource driver allocates the resources required for that ResourceClaim and ensures that
# they are ready to use; only then will the Pod start.
```

## Good practice for resource driver deployment {#resource-driver-deploy-tips}

The recommended way to deploy a resource driver is a Deployment for the controller part and a DaemonSet
for the kubelet plugin part. It is also possible to deploy it as a package for your node's
operating system, or manually.
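
A minimal sketch of the kubelet plugin DaemonSet follows; it assumes the plugin registers itself
over the kubelet's plugin registration socket, and the image name, namespace, and paths are
placeholders:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-resource-driver-kubelet-plugin
  namespace: gpu-resource-driver           # placeholder namespace
spec:
  selector:
    matchLabels:
      app: gpu-resource-driver-kubelet-plugin
  template:
    metadata:
      labels:
        app: gpu-resource-driver-kubelet-plugin
    spec:
      containers:
      - name: kubelet-plugin
        image: registry.example.com/gpu-resource-driver:v0.1.0  # placeholder image
        volumeMounts:
        - name: plugins-registry
          mountPath: /var/lib/kubelet/plugins_registry
        - name: plugins
          mountPath: /var/lib/kubelet/plugins
      volumes:
      # The plugin places its gRPC sockets on these host paths so the
      # kubelet can discover and call it.
      - name: plugins-registry
        hostPath:
          path: /var/lib/kubelet/plugins_registry
      - name: plugins
        hostPath:
          path: /var/lib/kubelet/plugins
```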

The kubelet uses a gRPC interface to interact with a resource driver's kubelet plugin. On the Kubernetes side,
no special permissions are required for resource drivers.

When you deploy a resource driver, you typically also define at least one ResourceClass using that driver.

## API compatibility

Kubernetes Dynamic Resource Allocation support is in alpha. The API may change in incompatible
ways before stabilization. As a project, Kubernetes recommends that resource driver developers:

* Watch for changes in future releases.
* Support multiple versions of the resource driver API for backward/forward compatibility.

If you enable the `DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and run associated kubelet plugins on nodes
that need to be upgraded to a Kubernetes release with a newer DRA API version, upgrade your
resource drivers to support both versions before upgrading those nodes. Taking that approach
ensures the continuous functioning of device allocations during the upgrade.

## DRA resource driver examples {#examples}

{{% thirdparty-content %}}

Here are some examples of resource driver implementations:

* The [example resource driver](https://github.com/kubernetes-sigs/dra-example-driver)
* The [Intel GPU resource driver](https://github.com/intel/intel-resource-drivers-for-kubernetes)
* The [NVIDIA GPU resource driver](https://github.com/NVIDIA/k8s-dra-driver)

## {{% heading "whatsnext" %}}

* Learn about [creating your own DRA resource driver](https://www.youtube.com/watch?v=_fi9asserLE)
* Discover the [example DRA resource driver](https://github.com/kubernetes-sigs/dra-example-driver)