Description
Is your feature request related to a problem? Please describe.
I am trying to run LocalAI more efficiently on an Orange Pi 5B, which has an NPU and an Arm Mali-G610 GPU.
More specifically:
| Component | Spec |
| --- | --- |
| CPU | 8-core 64-bit processor; big.LITTLE architecture: 4× Cortex-A76 (big cluster, 2.4 GHz) + 4× Cortex-A55 (little cluster, 1.8 GHz) |
| GPU | Arm Mali-G610 MP4; compatible with OpenGL ES 1.1/2.0/3.2, OpenCL 2.2, and Vulkan 1.2; 3D and 2D graphics engines |
| NPU | Built-in AI accelerator NPU with up to 6 TOPS; supports INT4/INT8/INT16 mixed operation |
(reference )
There are several frameworks that support using the NPU / GPU on this board to accelerate model inference, and I was wondering whether it would be possible to integrate that into LocalAI as well (see the sketch after the list below):
- https://llm.mlc.ai/docs/install/gpu.html#orange-pi-5-rk3588-based-sbc
- https://onnxruntime.ai/docs/build/eps.html#rknpu
- https://medium.com/@benoit.clouet/running-llama3-on-the-gpu-of-a-rk1-turing-pi-6dddb9e14521
- https://blog.mlc.ai/2024/04/20/GPU-Accelerated-LLM-on-Orange-Pi
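For reference, a minimal sketch of what the ONNX Runtime route (second link above) could look like. This assumes an onnxruntime build compiled with the RKNPU execution provider enabled (`--use_rknpu`); the provider name and model path here are illustrative, so please verify them against your build:

```python
import onnxruntime as ort

# Hypothetical example: run an ONNX model through the RKNPU execution
# provider, falling back to CPU for any ops the NPU cannot handle.
# Assumes onnxruntime was built with --use_rknpu on the RK3588 board.
session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=["RknpuExecutionProvider", "CPUExecutionProvider"],
)

# Shows which providers were actually loaded for this session.
print("Active providers:", session.get_providers())
```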
There is also an interesting discussion on this topic in the llama.cpp repo: ggml-org/llama.cpp#722
Rockchip boards are pretty cheap and can be great for edge AI use cases. It would be great to have support for this!