From 4e242cb227c95eb4e30431e06b7702517a4667ed Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Mon, 2 Jun 2025 12:51:23 +0000
Subject: [PATCH 1/3] Add device preference use cases to explainer

Adds a new subsection "Device Preference Use Cases" to the
device-selection-explainer.md document. This subsection details several
use cases for device selection preferences, mapping them to the
preferences discussed in the W3C WebML WG minutes of 2025-05-08
(https://www.w3.org/2025/05/08-webmachinelearning-minutes.html#2ec0).

The use cases cover:
- Preferring CPU
- Preferring NPU
- Preferring GPU
- Maximizing performance
- Maximizing power efficiency
- Minimizing overall system power

Future-proof device names ("where JS and Wasm execute", "where WebGL and
WebGPU programs execute", "other") are used in the descriptions.
---
 device-selection-explainer.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/device-selection-explainer.md b/device-selection-explainer.md
index c0fc8dd8..368b56f9 100644
--- a/device-selection-explainer.md
+++ b/device-selection-explainer.md
@@ -51,6 +51,29 @@ Later the need for explicit device selection support was challenged in [[MLConte
 
 ## Key use cases and requirements
 
+### Device Preference Use Cases
+
+A WebNN application may have specific device preferences for model execution. These preferences can be mapped to the following use cases:
+
+* **Prefer execution on the main CPU**:
+  * *Preference*: `"prefer CPU"`
+  * *Description*: The application developer hints that the model should ideally run on the device component primarily responsible for general computation, typically "where JS and Wasm execute". This could be due to the model's characteristics (e.g., heavy control flow, operations best suited for CPU) or to reserve other accelerators for different tasks.
+* **Prefer execution on a Neural Processing Unit (NPU)**:
+  * *Preference*: `"prefer NPU"`
+  * *Description*: The application developer hints that the model is well-suited for an NPU (a specialized accelerator, referred to as "other" in a future-proof context, distinct from CPU and GPU). This is often the case for models optimized for low power and sustained performance.
+* **Prefer execution on a Graphics Processing Unit (GPU)**:
+  * *Preference*: `"prefer GPU"`
+  * *Description*: The application developer hints that the model should run on the GPU (the device "where WebGL and WebGPU programs execute"). This is common for models with highly parallelizable operations.
+* **Maximize Performance**:
+  * *Preference*: `"maximum performance"`
+  * *Description*: The application developer desires the highest possible throughput or lowest latency for the model execution, regardless of power consumption. The underlying system will choose the device or combination of devices (e.g., "where WebGL and WebGPU programs execute", or "other" specialized hardware) that can achieve this.
+* **Maximize Power Efficiency**:
+  * *Preference*: `"maximum efficiency"`
+  * *Description*: The application developer prioritizes executing the model in the most power-efficient manner, which might involve using an NPU ("other") or a low-power mode of the CPU ("where JS and Wasm execute"). This is crucial for battery-constrained devices or long-running tasks.
+* **Minimize Overall System Power**:
+  * *Preference*: `"minimum overall power"`
+  * *Description*: The application developer hints that the model execution should contribute as little as possible to the overall system power draw. This is a broader consideration than just the model's own efficiency, potentially influencing scheduling and resource allocation across the system. The implementation may choose any device ("where JS and Wasm execute", "where WebGL and WebGPU programs execute", or "other") that best achieves this goal.
+
 Design decisions may take the following into account:
 
 1. Allow the underlying platform to ultimately choose the compute device.

From 86ca102075de119be015a62f969fee460305d766 Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Mon, 2 Jun 2025 13:01:43 +0000
Subject: [PATCH 2/3] Clarify "other" device category in NPU use case

Updates the description for the "Prefer execution on a Neural Processing
Unit (NPU)" use case in device-selection-explainer.md. The term "other"
as a future-proof device category is now explicitly defined as
encompassing specialized ML accelerators that are distinct from CPUs and
GPUs.

This addresses feedback to improve clarity regarding the NPU use case and
its relation to the "other" device category.
---
 device-selection-explainer.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/device-selection-explainer.md b/device-selection-explainer.md
index 368b56f9..dfe59bc3 100644
--- a/device-selection-explainer.md
+++ b/device-selection-explainer.md
@@ -60,7 +60,7 @@ A WebNN application may have specific device preferences for model execution. Th
   * *Description*: The application developer hints that the model should ideally run on the device component primarily responsible for general computation, typically "where JS and Wasm execute". This could be due to the model's characteristics (e.g., heavy control flow, operations best suited for CPU) or to reserve other accelerators for different tasks.
 * **Prefer execution on a Neural Processing Unit (NPU)**:
   * *Preference*: `"prefer NPU"`
-  * *Description*: The application developer hints that the model is well-suited for an NPU (a specialized accelerator, referred to as "other" in a future-proof context, distinct from CPU and GPU). This is often the case for models optimized for low power and sustained performance.
+  * *Description*: The application developer hints that the model is well-suited for an NPU. NPUs are specialized hardware accelerators, distinct from CPUs (typically "where JS and Wasm execute") and GPUs (typically "where WebGL and WebGPU programs execute"). In a future-proof context, NPUs fall under the category of "other" compute devices, encompassing various current and future specialized ML accelerators. This preference is often chosen for models optimized for low power and sustained performance.
 * **Prefer execution on a Graphics Processing Unit (GPU)**:
   * *Preference*: `"prefer GPU"`
   * *Description*: The application developer hints that the model should run on the GPU (the device "where WebGL and WebGPU programs execute"). This is common for models with highly parallelizable operations.

From 0e8f18cc252e9f2e6d1c62c8ce8a0df0fa3cb4fe Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Mon, 2 Jun 2025 18:55:49 +0000
Subject: [PATCH 3/3] Address review feedback on device preference use cases
This commit incorporates feedback from PR #855:

- Adds a citation to ONNX Runtime's OrtExecutionProviderDevicePolicy as an
  existing API that informed the device preference list.
- Refines the usage of "other" in use case descriptions for clarity:
  - Removes quotes from "other" in "Maximize Performance".
  - Removes the redundant ("other") clarification for NPU in "Maximize
    Power Efficiency".
---
 device-selection-explainer.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/device-selection-explainer.md b/device-selection-explainer.md
index dfe59bc3..6bb3a75c 100644
--- a/device-selection-explainer.md
+++ b/device-selection-explainer.md
@@ -53,7 +53,7 @@ Later the need for explicit device selection support was challenged in [[MLConte
 
 ### Device Preference Use Cases
 
-A WebNN application may have specific device preferences for model execution. These preferences can be mapped to the following use cases:
+A WebNN application may have specific device preferences for model execution. The following use cases map to such preferences, informed by existing APIs such as ONNX Runtime's `OrtExecutionProviderDevicePolicy` [1](https://onnxruntime.ai/docs/api/c/group___global.html#gaf26ca954c79d297a31a66187dd1b4e24):
 
 * **Prefer execution on the main CPU**:
   * *Preference*: `"prefer CPU"`
@@ -66,10 +66,10 @@ A WebNN application may have specific device preferences for model execution. Th
   * *Description*: The application developer hints that the model should run on the GPU (the device "where WebGL and WebGPU programs execute"). This is common for models with highly parallelizable operations.
 * **Maximize Performance**:
   * *Preference*: `"maximum performance"`
-  * *Description*: The application developer desires the highest possible throughput or lowest latency for the model execution, regardless of power consumption. The underlying system will choose the device or combination of devices (e.g., "where WebGL and WebGPU programs execute", or "other" specialized hardware) that can achieve this.
+  * *Description*: The application developer desires the highest possible throughput or lowest latency for the model execution, regardless of power consumption. The underlying system will choose the device or combination of devices (e.g., "where WebGL and WebGPU programs execute", or other specialized hardware) that can achieve this.
 * **Maximize Power Efficiency**:
   * *Preference*: `"maximum efficiency"`
-  * *Description*: The application developer prioritizes executing the model in the most power-efficient manner, which might involve using an NPU ("other") or a low-power mode of the CPU ("where JS and Wasm execute"). This is crucial for battery-constrained devices or long-running tasks.
+  * *Description*: The application developer prioritizes executing the model in the most power-efficient manner, which might involve using an NPU or a low-power mode of the CPU ("where JS and Wasm execute"). This is crucial for battery-constrained devices or long-running tasks.
 * **Minimize Overall System Power**:
   * *Preference*: `"minimum overall power"`
   * *Description*: The application developer hints that the model execution should contribute as little as possible to the overall system power draw. This is a broader consideration than just the model's own efficiency, potentially influencing scheduling and resource allocation across the system. The implementation may choose any device ("where JS and Wasm execute", "where WebGL and WebGPU programs execute", or "other") that best achieves this goal.
@@ -214,3 +214,6 @@ Other use cases were raised as well, in [this comment](https://github.com/webmac
 
 > 1. If the user selects to use functionality like background blur, we want to offer the best quality the device can offer. So the product has a small set of candidate models and technologies (WebNN, WebGPU, WASM) that it has to choose between. Accelerated technologies come with allowance for beefier models.
 > 2. The model/tech choser algorithm needs to be fast, and we need to avoid spending seconds or even hundreds of milliseconds to figure out if a given model should be able to run accelerated. So for example downloading the entirety (could be large things..), compiling & try-running a model seems infeasible.
+
+## References
+[1] ONNX Runtime - OrtExecutionProviderDevicePolicy. (https://onnxruntime.ai/docs/api/c/group___global.html#gaf26ca954c79d297a31a66187dd1b4e24)
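
For illustration, a minimal sketch of how the preferences added by this patch series might surface at context creation time. The `devicePreference` option name and its string values below are hypothetical: they mirror the use-case names above and ONNX Runtime's `OrtExecutionProviderDevicePolicy` values, and are not part of the WebNN API. Only the related `powerPreference` hint exists in WebNN's `MLContextOptions` today.

```js
// Hypothetical sketch: `devicePreference` is not a WebNN option; the value
// strings mirror the use cases above. A preference is only a hint, so the
// platform remains free to select any available device (CPU, GPU, NPU, or
// other specialized hardware).
const context = await navigator.ml.createContext({
  devicePreference: "maximum-efficiency", // e.g. long-running, battery-constrained inference
});

// For the power-oriented use cases, the existing MLContextOptions hint
// powerPreference ("default" | "high-performance" | "low-power") applies:
const lowPowerContext = await navigator.ml.createContext({
  powerPreference: "low-power",
});
```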