It makes use of Qualcomm AI Hub models converted to ONNX. The models expect I420 input; a WebRTC pass-through is used to convert incoming video to that format. The image converter is written in AssemblyScript with SIMD intrinsics and compiled to inlined WASM; it will be published separately.
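A minimal sketch of one way to obtain raw I420 frames from a camera track, using the WebCodecs `MediaStreamTrackProcessor` API. This is illustrative only and not necessarily how the repo's pass-through is wired up:

```ts
// Sketch: reading raw I420 video frames from a camera track.
// Assumes a browser with WebCodecs support (e.g. Chromium-based).
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const track = stream.getVideoTracks()[0];

// MediaStreamTrackProcessor exposes decoded frames as a ReadableStream.
const processor = new MediaStreamTrackProcessor({ track });
const reader = processor.readable.getReader();

for (;;) {
  const { value: frame, done } = await reader.read();
  if (done) break;
  if (frame.format === 'I420') {
    // Copy the planar YUV data into a buffer for the WASM converter.
    const buffer = new Uint8Array(frame.allocationSize());
    await frame.copyTo(buffer);
    // ... hand `buffer` to the image converter here ...
  }
  frame.close(); // release the frame promptly to avoid stalling capture
}
```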
It wasn't possible to use transformers.js for this purpose, so the project is built directly on onnxruntime. The inference engine is configured to use the WebGPU execution provider: the WebGL backend doesn't work at all for these models, and the CPU backend is too slow.
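A minimal sketch of selecting the WebGPU execution provider in onnxruntime-web; the model path is a placeholder:

```ts
import * as ort from 'onnxruntime-web';

// Create a session that runs on WebGPU. The model URL is a placeholder,
// not a file shipped by this repo.
const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: ['webgpu'],
});
```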
The compiled ai-image-converter is used to convert I420 video frames into input tensors.
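Since ai-image-converter is published separately, its exact API isn't shown here; the sketch below stands in a hypothetical `convertI420ToTensorData` function for it and assumes a common NCHW float32 input layout, which the actual models may not use:

```ts
import * as ort from 'onnxruntime-web';

// Hypothetical stand-in for the ai-image-converter API,
// which is published separately.
declare function convertI420ToTensorData(
  i420: Uint8Array, width: number, height: number,
): Float32Array;

function toInputTensor(i420: Uint8Array, width: number, height: number): ort.Tensor {
  const data = convertI420ToTensorData(i420, width, height);
  // NCHW float32 is a common convention for vision models;
  // check the actual model's expected shape.
  return new ort.Tensor('float32', data, [1, 3, height, width]);
}
```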
Qualcomm's models are distributed under their own licence. The rest of this repo (everything except the models) is MIT.