WebNN should support NPU and QDQ operations #623

@wchao1115

Description

Related to issues #128 and #302, we've been talking about supporting the NPU for the last few years. Now that more commercial NPU platforms have become available (e.g. with the recent arrival of the Intel Core Ultra NPU), it is time to formally define NPU support in the WebNN spec. There are two key elements to this specification:

  1. The ability to specify a device type for the NPU. Unlike more general-purpose devices such as the GPU and CPU, an NPU supports a limited, finite set of operations and is not programmable. To keep model execution stable and more predictable, the notion of a fallback device is needed to support NPU acceleration during model inference.
  2. A minimum set of operators required to support quantized models. Because most NPUs use much simpler and less power-hungry low-bit integer arithmetic units, models targeting the NPU almost always need to be quantized first. The bare minimum here is just two operators: quantizeLinear and dequantizeLinear. These two are enough to handle quantized models by pairing them up at the right places in the model graph, the so-called tensor-oriented QDQ format used in ONNX. Additionally, two more prominent quantized operators, one for convolution (conv2dInt) and one for matmul (matmulInt), would allow more quantized models not already expressed in the QDQ format to function. A sketch of how the QDQ pairing could look follows this list.
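
To make the QDQ idea concrete, here is a minimal sketch of what building such a graph could look like. This is hypothetical: the `'npu'` device type and the `quantizeLinear`/`dequantizeLinear` builder methods are the operators being proposed in this issue and are not in the current spec, and the shapes, scale, and zero-point values are purely illustrative.

```js
// Request an NPU context; a real design would also let callers name a fallback
// device (e.g. CPU) for operators the NPU cannot execute.
// NOTE: deviceType 'npu' is the proposal in this issue, not existing API.
const context = await navigator.ml.createContext({ deviceType: 'npu' });
const builder = new MLGraphBuilder(context);

// Float activation input and pre-quantized int8 weights (placeholder values).
const input = builder.input('input',
  { dataType: 'float32', dimensions: [1, 3, 224, 224] });
const int8WeightData = new Int8Array(64 * 3 * 7 * 7); // placeholder weights
const weights = builder.constant(
  { dataType: 'int8', dimensions: [64, 3, 7, 7] }, int8WeightData);

// Per-tensor scale and zero point (illustrative values).
const scale = builder.constant(
  { dataType: 'float32', dimensions: [1] }, new Float32Array([0.05]));
const zeroPoint = builder.constant(
  { dataType: 'int8', dimensions: [1] }, new Int8Array([0]));

// Tensor-oriented QDQ pattern: quantize the activation, then dequantize both
// operands around an ordinary float conv2d. A backend can pattern-match the
// dequantizeLinear -> conv2d pair and execute it with integer arithmetic.
// quantizeLinear/dequantizeLinear here are the proposed operators.
const qInput = builder.quantizeLinear(input, scale, zeroPoint);
const dqInput = builder.dequantizeLinear(qInput, scale, zeroPoint);
const dqWeights = builder.dequantizeLinear(weights, scale, zeroPoint);
const output = builder.conv2d(dqInput, dqWeights);

const graph = await builder.build({ output });
```

The point of the pairing is that the graph stays expressed in float semantics, while an NPU-capable backend is free to fuse each dequantize/compute region into its native low-bit integer kernels; a backend without that capability can simply execute the float operators on the fallback device.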
