Thanks for your interest in this project.

Please refer to `ggmlqnn_compute_mul_mat` in this project: https://github.com/zhouwg/ggml-hexagon/blob/self-build/ggml/src/ggml-hexagon/ggml-hexagon.cpp#L4437-L4639. Once you fully understand that function, you will thoroughly understand how to use the C API provided by the QNN SDK. One more thing: you will also understand the core principle of the QNN SDK, especially the key ideas behind Qualcomm's highly engineered and very complicated AI-Hub tech stack.

The following source code is a good and simple example to help you understand how to directly invoke the computing power of the Hexagon NPU:
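The sketch below condenses that flow: create a graph in an existing QNN context, register the input and output tensors, add a single `MatMul` node, finalize the graph, and execute it. This is a simplified sketch rather than the exact code from `ggml-hexagon.cpp`; the struct fields, macros, and interface entry points follow the QNN (Qualcomm AI Engine Direct) headers, but please verify them against the headers shipped with your SDK version.

```c
// Simplified sketch: build and run a QNN graph containing a single MatMul node,
// computing C = A (M x K) * B (K x N) in fp32. It assumes the interface function
// table `qnn_if` and the context `ctx` were already obtained through the usual
// QnnInterface_getProviders / backendCreate / deviceCreate / contextCreate sequence
// (see ggmlqnn_compute_mul_mat for the complete flow and error handling).
#include "QnnTypes.h"
#include "QnnInterface.h"
#include "QnnOpDef.h"

static Qnn_Tensor_t make_fp32_tensor(const char * name, Qnn_TensorType_t type,
                                     uint32_t * dims, float * data, uint32_t n_elems) {
    Qnn_Tensor_t t = QNN_TENSOR_INIT;
    t.version               = QNN_TENSOR_VERSION_1;
    t.v1.name               = name;
    t.v1.type               = type;            // APP_WRITE: filled by the app, APP_READ: read back
    t.v1.dataType           = QNN_DATATYPE_FLOAT_32;
    t.v1.rank               = 2;
    t.v1.dimensions         = dims;
    t.v1.memType            = QNN_TENSORMEMTYPE_RAW;
    t.v1.clientBuf.data     = data;
    t.v1.clientBuf.dataSize = n_elems * sizeof(float);
    return t;
}

static int run_single_matmul(QNN_INTERFACE_VER_TYPE qnn_if, Qnn_ContextHandle_t ctx,
                             float * a, float * b, float * c,
                             uint32_t M, uint32_t K, uint32_t N) {
    Qnn_GraphHandle_t graph = NULL;
    if (QNN_SUCCESS != qnn_if.graphCreate(ctx, "single_matmul", NULL, &graph)) return -1;

    uint32_t da[2] = {M, K}, db[2] = {K, N}, dc[2] = {M, N};
    Qnn_Tensor_t ta = make_fp32_tensor("in0", QNN_TENSOR_TYPE_APP_WRITE, da, a, M * K);
    Qnn_Tensor_t tb = make_fp32_tensor("in1", QNN_TENSOR_TYPE_APP_WRITE, db, b, K * N);
    Qnn_Tensor_t tc = make_fp32_tensor("out", QNN_TENSOR_TYPE_APP_READ,  dc, c, M * N);
    qnn_if.tensorCreateGraphTensor(graph, &ta);
    qnn_if.tensorCreateGraphTensor(graph, &tb);
    qnn_if.tensorCreateGraphTensor(graph, &tc);

    Qnn_Tensor_t inputs[2]  = {ta, tb};
    Qnn_Tensor_t outputs[1] = {tc};

    // One node of type "MatMul" from the built-in qti.aisw op package.
    Qnn_OpConfig_t op   = QNN_OPCONFIG_INIT;
    op.version          = QNN_OPCONFIG_VERSION_1;
    op.v1.name          = "matmul_0";
    op.v1.packageName   = QNN_OP_PACKAGE_NAME_QTI_AISW;
    op.v1.typeName      = QNN_OP_MAT_MUL;
    op.v1.numOfInputs   = 2;
    op.v1.inputTensors  = inputs;
    op.v1.numOfOutputs  = 1;
    op.v1.outputTensors = outputs;

    if (QNN_SUCCESS != qnn_if.graphAddNode(graph, op))          return -2;
    if (QNN_SUCCESS != qnn_if.graphFinalize(graph, NULL, NULL)) return -3;
    // After execute returns, the result is in the buffer backing "out" (i.e. c).
    if (QNN_SUCCESS != qnn_if.graphExecute(graph, inputs, 2, outputs, 1, NULL, NULL)) return -4;
    return 0;
}
```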


Hi, I'm currently exploring general-purpose computation on the NPU, such as matrix multiplication. I tried generating a minimal ONNX model containing only a `MatMul` operation, then converting it into a QNN model to run on a Snapdragon 8 Elite. However, the computed results are too imprecise to be acceptable. Additionally, using this model-based approach for such a basic operation feels counterintuitive and inefficient.

I'm looking for a way to perform matrix multiplication directly using the QNN APIs, or just the Hexagon SDK, bypassing the ONNX model path. Unfortunately, the QNN documentation only describes how to run complete models and doesn't seem to support standalone, function-level invocations. Many search results on this topic pointed me to your repository, where I saw that you've done extensive work on the Snapdragon NPU; very impressive and inspiring.
So far, I've found the official QNN documentation for the `MatMul` op here: QNN MatMul op definition. But from what I understand, using this op still requires building a model, and it doesn't allow direct usage in a standalone C/C++ program.

Also, as mentioned in another discussion, it seems Qualcomm doesn't publicly provide HMX, but HVX can potentially be used to implement matrix multiplication indirectly. I noticed that your repo includes such an implementation: `ggml-hexagon/mulmat.c`. This appears to be built on the Hexagon SDK rather than the QNN SDK (as far as I understand), and seems to implement matrix multiplication at the DSP layer.
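To check my understanding of the DSP side, here is roughly how I imagine such an HVX kernel would look. This is only my own sketch, not code from your repo; it assumes a v68 or newer target with the qf32 intrinsics, 128-byte aligned buffers, and N being a multiple of 32. Please correct me if this picture is wrong:

```c
// Rough sketch of an HVX kernel for one row of C = A (M x K) * B (K x N), fp32.
// Assumptions: Hexagon v68+ (qf32 intrinsics), b and c_row 128-byte aligned,
// N a multiple of 32, compiled with hexagon-clang and -mhvx -mhvx-length=128B.
#include <stdint.h>
#include <string.h>
#include <hexagon_types.h>
#include <hexagon_protos.h>

static void matmul_row_hvx(const float * a_row,  // K floats: row i of A
                           const float * b,      // K x N, row-major
                           float * c_row,        // N floats: row i of C
                           uint32_t K, uint32_t N) {
    for (uint32_t j = 0; j < N; j += 32) {          // 32 fp32 lanes per 128-byte HVX vector
        HVX_Vector acc = Q6_V_vzero();              // qf32 accumulator, starts at 0
        for (uint32_t k = 0; k < K; k++) {
            uint32_t a_bits;
            memcpy(&a_bits, &a_row[k], sizeof(a_bits));
            HVX_Vector va = Q6_V_vsplat_R(a_bits);  // broadcast A[i][k] to all lanes
            HVX_Vector vb = *(const HVX_Vector *)(b + (size_t)k * N + j);
            // acc += A[i][k] * B[k][j .. j+31], accumulated in qf32
            acc = Q6_Vqf32_vadd_Vqf32Vqf32(acc, Q6_Vqf32_vmpy_VsfVsf(va, vb));
        }
        // Convert the qf32 accumulator back to IEEE fp32 and store 32 outputs.
        *(HVX_Vector *)(c_row + j) = Q6_Vsf_equals_Vqf32(acc);
    }
}
```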
However, I didn't see any `.idl` files in your codebase. Those are used in the Hexagon SDK example `calculator_c++_apk`, though that example itself is a bit unclear to me: it doesn't seem to involve HVX or explain how to use the HVX extensions. The official documentation also lacks clarity on this topic.
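From what I can tell, the CPU-side calling pattern in the calculator example looks roughly like the sketch below. The `matmul_*` names and `matmul_URI` are hypothetical stand-ins for whatever stubs `qaic` would generate from a `matmul.idl` whose interface derives from `remote_handle64`; they are not functions from your repository:

```c
// Hypothetical CPU-side FastRPC call into a DSP-side matmul, following the pattern
// of the Hexagon SDK calculator example. matmul_URI, matmul_open, matmul_multiply
// and matmul_close stand in for qaic-generated stubs; they are NOT from ggml-hexagon.
#include <stdio.h>
#include "remote.h"   // FastRPC runtime types (remote_handle64) from the Hexagon SDK
#include "matmul.h"   // hypothetical qaic-generated header

int run_matmul_on_cdsp(const float * a, const float * b, float * c,
                       int m, int k, int n) {
    remote_handle64 h = (remote_handle64)-1;

    // "&_dom=cdsp" selects the compute DSP domain, as in the SDK examples.
    char uri[256];
    snprintf(uri, sizeof(uri), "%s%s", matmul_URI, "&_dom=cdsp");

    int rc = matmul_open(uri, &h);  // loads the DSP-side skeleton .so and opens a session
    if (rc != 0) return rc;

    // The actual math runs on the DSP, where the implementation of matmul_multiply
    // would ideally be vectorized with HVX.
    rc = matmul_multiply(h, a, m * k, b, k * n, c, m * n, m, k, n);

    matmul_close(h);
    return rc;
}
```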
Lastly, regarding your code in `ggml/src/ggml-hexagon`: when running it, is it still necessary to link against libraries like `libQnnHtp.so`, `libQnnHtpPrepare.so`, or `libQnnHtpV79Stub.so`? In other words, if compiled independently, is it possible to expose NPU-based matrix compute as a standalone C++ library, just like a typical external computation library?

I'd greatly appreciate any advice you could share on how to directly invoke the computing power of the NPU.
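To show where that last question comes from: my current reading of `ggml-hexagon.cpp` is that the QNN backend library is not linked at build time but opened at runtime with `dlopen`, roughly as sketched below. The names follow `QnnInterface.h` as far as I can tell, so the details may be off:

```c
// My understanding of how the QNN HTP backend is resolved at runtime: dlopen the
// backend library and look up QnnInterface_getProviders, instead of linking
// libQnnHtp.so at build time. Please correct me if this differs from your setup.
#include <dlfcn.h>
#include <stdio.h>
#include "QnnInterface.h"

typedef Qnn_ErrorHandle_t (*pfn_get_providers)(const QnnInterface_t *** providers,
                                               uint32_t * num_providers);

static int load_qnn_htp(QNN_INTERFACE_VER_TYPE * out_if) {
    // libQnnHtp.so (plus helpers such as libQnnHtpV79Stub.so and the DSP-side
    // skeleton) only need to be on the library search path at run time.
    void * lib = dlopen("libQnnHtp.so", RTLD_NOW | RTLD_LOCAL);
    if (!lib) { fprintf(stderr, "dlopen failed: %s\n", dlerror()); return -1; }

    pfn_get_providers get_providers =
        (pfn_get_providers) dlsym(lib, "QnnInterface_getProviders");
    if (!get_providers) return -2;

    const QnnInterface_t ** providers = NULL;
    uint32_t n = 0;
    if (QNN_SUCCESS != get_providers(&providers, &n) || n == 0 || !providers) return -3;

    // Take the first provider here; real code should match coreApiVersion first.
    *out_if = providers[0]->QNN_INTERFACE_VER_NAME;
    return 0;
}
```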