Releases: mudler/LocalAI
v3.2.3
What's Changed
Other Changes
- chore: ⬆️ Update ggml-org/llama.cpp to c7f3169cd523140a288095f2d79befb20a0b73f4 by @localai-bot in #5913
Full Changelog: v3.2.2...v3.2.3
v3.2.2
What's Changed
Bug fixes 🐛
- fix(backends gallery): trim string when reading cap from file by @mudler in #5909
- fix(vulkan): use correct image suffix by @mudler in #5911
- fix(ci): add nvidia-l4t capability to l4t images by @mudler in #5914
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5912
Full Changelog: v3.2.1...v3.2.2
v3.2.1
What's Changed
Bug fixes 🐛
- fix(install.sh): update to use the new binary naming by @mudler in #5903
- fix(backends gallery): pass-by backend galleries to the model service by @mudler in #5906
Other Changes
- chore: ⬆️ Update ggml-org/llama.cpp to 3f4fc97f1d745f1d5d3c853949503136d419e6de by @localai-bot in #5900
- chore: ⬆️ Update leejet/stable-diffusion.cpp to eed97a5e1d054f9c1e7ac01982ae480411d4157e by @localai-bot in #5901
- chore: ⬆️ Update ggml-org/whisper.cpp to 7de8dd783f7b2eab56bff6bbc5d3369e34f0e77f by @localai-bot in #5902
Full Changelog: v3.2.0...v3.2.1
v3.2.0
🚀 LocalAI 3.2.0
Welcome to LocalAI 3.2.0! This release refactors our architecture to be more flexible and lightweight.
The core is now separated from all the backends, making LocalAI faster to download, easier to manage, portable, and much smaller.
TL;DR – What’s New in LocalAI 3.2.0 🎉
- 🧩 Modular Backends: All backends now live outside the main binary in our new Backend Gallery. This means you can update, add, or manage backends independently of LocalAI releases.
- 📉 Leaner Than Ever: The LocalAI binary and container images are drastically smaller, making for faster downloads and a reduced footprint.
- 🤖 Smart Backend Installation: It just works! When you install a model, LocalAI automatically detects your hardware (CPU, NVIDIA, AMD, Intel) and downloads the necessary backend. No more manual configuration!
- 🛠️ Simplified Build Process: The new modular architecture significantly simplifies the build process for contributors and power users.
- ⚡️ Intel GPU Support for Whisper: Transcription with Whisper can now be accelerated on Intel GPUs using SYCL, bringing more hardware options to our users.
- 🗣️ Enhanced Realtime Audio: We've added speech started and stopped events for more interactive applications and OpenAI-compatible support for the input_audio field in the chat API.
- 🧠 Massive Model Expansion: The gallery has been updated with over 50 new models, including the latest from Qwen3, Gemma, Mistral, Nemotron, and more!
Note: CI is still building all the backends for this release; they will be available soon. If you hit any issues, please try again in a little while, thanks for understanding!
Note: Some parts of the documentation and the installation scripts (which download the release binaries) have yet to be adapted to the latest changes and/or might not reflect the current state.
A New Modular Architecture 🧩
The biggest change in v3.2.0 is the complete separation of inference backends from the core LocalAI binary. Backends like llama.cpp, whisper.cpp, piper, and stablediffusion-ggml are no longer bundled in.
This fundamental shift makes LocalAI:
- Lighter: Significantly smaller binary and container image sizes.
- More Flexible: Update backends anytime from the gallery without waiting for a new LocalAI release.
- Easier to Maintain: A cleaner, more streamlined codebase for faster development.
- Easier to Customize: You can build your own backends and install them in your LocalAI instances.
Smart, Automatic Backend Installation 🤖
To make the new modular system seamless, LocalAI now features automatic backend installation.
When you install a model from the gallery (or a YAML file), LocalAI intelligently detects the required backend and your system's capabilities, then downloads the correct version for you. Whether you're running on a standard CPU, an NVIDIA GPU, an AMD GPU, or an Intel GPU, LocalAI handles it automatically.
For advanced use cases or to override auto-detection, you can use the LOCALAI_FORCE_META_BACKEND_CAPABILITY environment variable. Here are the available options:
- default: Forces the CPU-only backend. This is the fallback if no specific hardware is detected.
- nvidia: Forces backends compiled with CUDA support for NVIDIA GPUs.
- amd: Forces backends compiled with ROCm support for AMD GPUs.
- intel: Forces backends compiled with SYCL/oneAPI support for Intel GPUs.
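For example, to force the CUDA backends at container startup, set the variable when launching LocalAI (a sketch, combining the variable above with the image tags shown later in these notes):
# Override auto-detection and force NVIDIA (CUDA) backend variants
docker run -ti --name local-ai -p 8080:8080 --gpus all \
  -e LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia \
  localai/localai:latest-gpu-nvidia-cuda-12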
The Backend Gallery & CLI Control 🖼️
You are in full control. You can browse, install, and manage all available backends directly from the WebUI or using the new CLI commands:
# List all available backends in the gallery
local-ai backends list
# Install a specific backend (e.g., llama-cpp)
local-ai backends install llama-cpp
# Uninstall a backend
local-ai backends uninstall llama-cpp
For development, offline or air-gapped environments, you can now also install backends directly from a local OCI tar file:
local-ai backends install "ocifile://<PATH_TO_TAR_FILE>"
Other Key Improvements
- 🗣️ Enhanced Realtime and Audio APIs: Building voice-activated applications is now easier.
- The new speech started and stopped events give you precise control over realtime audio streams.
- We now support the input_audio field in the /v1/chat/completions endpoint for multimodal audio inputs, improving OpenAI compatibility (see the sketch after this list).
- ⚡️ Intel GPU Acceleration for Whisper: Our Whisper backend now supports SYCL, enabling hardware-accelerated transcriptions on Intel GPUs.
- ✅ UI and Bug Fixes: We've squashed several bugs for a smoother experience, including a fix that correctly shows the download status for backend images in the gallery, so you always know what's happening.
- 🧠 Massive Model Gallery Expansion: Our model gallery has never been bigger! We've added over 50 new and updated models, with a focus on powerful new releases like qwen3, devstral-small, and nemotron.
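As a sketch of the new input_audio support: the payload follows the OpenAI content-part format, and the model name and audio file below are placeholders, not specific recommendations:
# Send a local wav file as a base64-encoded input_audio content part
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-omni",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is said in this recording?"},
        {"type": "input_audio", "input_audio": {"data": "'"$(base64 -w0 audio.wav)"'", "format": "wav"}}
      ]
    }]
  }'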
🚨 Important Note for Upgrading
Due to the new modular architecture, if you have existing models installed with a version prior to 3.2.0, they might not have a specific backend assigned.
After upgrading, you may need to install the required backend manually for these models to work. You can do this easily from the WebUI or via the CLI: local-ai backends install <backend_name>.
The Complete Local Stack for Privacy-First AI
- LocalAI: The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
- LocalAGI: A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.
- LocalRecall: A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI. Link: https://github.com/mudler/LocalRecall
Thank you! ❤️
A massive THANK YOU to our incredible community and our sponsors! LocalAI has over 34,100 stars, and LocalAGI has already rocketed past 900+ stars!
As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time and our sponsors to provide us the hardware! If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!
👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI
Full changelog 👇
What's Changed
Breaking Changes 🛠
- feat: do not bundle llama-cpp anymore by @mudler in #5790
- feat: refactor build process, drop embedded backends by @mudler in #5875
Bug fixes 🐛
- fix(gallery): automatically install model from name by @mudler in #5757
- fix: Diffusers and XPU fixes by @richiejp in #5737
- fix(gallery): correctly show status for downloading OCI images by @mudler in #5774
- fix: explorer page should not have login by @mudler in #5855
- fix: dockerfile typo by @LeonSijiaLu in #5823
- fix(docs): Resolve logo overlap on tablet view by @dedyf5 in #5853
- fix: do not pass by environ to ffmpeg by @mudler in #5871
- fix(p2p): adapt to backend changes, general improvements by @mudler in #5889
v3.1.1
What's Changed
Bug fixes 🐛
- fix(backends gallery): correctly identify gpu vendor by @mudler in #5739
- fix(backends gallery): meta packages do not have URIs by @mudler in #5740
👒 Dependencies
- chore: ⬆️ Update ggml-org/whisper.cpp to c88ffbf9baeaae8c2cc0a4f496618314bb2ee9e0 by @localai-bot in #5742
- chore: ⬆️ Update ggml-org/llama.cpp to 72babea5dea56c8a8e8420ccf731b12a5cf37854 by @localai-bot in #5743
Other Changes
- fix(ci): better handling of latest images for backends by @mudler in #5735
- fix(ci): enable tag-latest to auto by @mudler in #5738
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5741
Full Changelog: v3.1.0...v3.1.1
v3.1.0
🚀 LocalAI 3.1
🚀 Highlights
Support for Gemma 3n!
Gemma 3n has been released and is now available in LocalAI (currently text generation only). Install it with:
local-ai run gemma-3n-e2b-it
local-ai run gemma-3n-e4b-it
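Once installed, the model is served through the usual OpenAI-compatible API; for example:
# Ask Gemma 3n a question via the chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3n-e2b-it", "messages": [{"role": "user", "content": "Hello!"}]}'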
⚠️ Breaking Changes
This release includes several important changes that reduce image size, simplify the ecosystem, and pave the way for a leaner LocalAI core:
🧰 Container Image Changes
- Sources are no longer bundled in the container images. This significantly reduces image sizes.
- Need to rebuild locally? Just follow the docs to build from scratch. We're working towards migrating all backends to the gallery, slimming down the default image further.
📁 Directory Structure Updated
New default model and backend paths for container images:
- Models: /models/ (was /build/models)
- Backends: /backends/ (was /build/backends)
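If you bind-mount models from the host, point the mount at the new path. A minimal sketch (the host directory here is an example):
# Mount a host models directory at the new default path
docker run -ti --name local-ai -p 8080:8080 \
  -v $PWD/models:/models \
  localai/localai:latest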
🏷 Unified Image Tag Naming for master (development) builds
We've cleaned up and standardized container image tags for clarity and consistency:
- gpu-nvidia-cuda11 and gpu-nvidia-cuda12 (previously cublas-cuda11, cublas-cuda12)
- gpu-intel-f16 and gpu-intel-f32 (previously sycl-f16, sycl-f32)
Meta packages in backend galleries
We’ve introduced meta-packages to the backend gallery!
These packages automatically install the most suitable backend depending on the GPU detected in your system — saving time, reducing errors, and ensuring you get the right setup out of the box. These will be added as soon as the 3.1.0 images are published, so stay tuned!
For instance, you will be able to install vLLM just by installing the vllm backend in the gallery (no need to pick the correct GPU version manually anymore), as sketched below.
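As an illustration, with the backend CLI shown in the v3.2.0 notes above, installing the meta-package becomes a single command, and the gallery resolves the right GPU variant for you:
# The vllm meta-package picks the correct build for the detected GPU
local-ai backends install vllm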
The Complete Local Stack for Privacy-First AI
With LocalAGI rejoining LocalAI alongside LocalRecall, our ecosystem provides a complete, open-source stack for private, secure, and intelligent AI operations:
- LocalAI: The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
- LocalAGI: A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.
- LocalRecall: A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI. Link: https://github.com/mudler/LocalRecall
Join the Movement! ❤️
A massive THANK YOU to our incredible community and our sponsors! LocalAI has over 33,500 stars, and LocalAGI has already rocketed past 800+ stars!
As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time and our sponsors to provide us the hardware! If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!
👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI
Full changelog 👇
What's Changed
Breaking Changes 🛠
- chore(ci): ⚠️ fix latest tag by using docker meta action by @mudler in #5722
- feat: ⚠️ reduce images size and stop bundling sources by @mudler in #5721
🧠 Models
- chore(model gallery): add qwen3-the-josiefied-omega-directive-22b-uncensored-abliterated-i1 by @mudler in #5704
- chore(model gallery): add menlo_jan-nano by @mudler in #5705
- chore(model gallery): add qwen3-the-xiaolong-omega-directive-22b-uncensored-abliterated-i1 by @mudler in #5706
- chore(model gallery): add allura-org_q3-8b-kintsugi by @mudler in #5707
- chore(model gallery): add ds-r1-qwen3-8b-arliai-rpr-v4-small-iq-imatrix by @mudler in #5708
- chore(model gallery): add mistralai_mistral-small-3.2-24b-instruct-2506 by @mudler in #5714
- chore(model gallery): add skywork_skywork-swe-32b by @mudler in #5715
- chore(model gallery): add astrosage-70b by @mudler in #5716
- chore(model gallery): add delta-vector_austral-24b-winton by @mudler in #5717
- chore(model gallery): add menlo_jan-nano-128k by @mudler in #5723
- chore(model gallery): add gemma-3n-e2b-it by @mudler in #5730
- chore(model gallery): add gemma-3n-e4b-it by @mudler in #5731
👒 Dependencies
- chore: ⬆️ Update ggml-org/whisper.cpp to 3e65f518ddf840b13b74794158aa95a2c8aa30cc by @localai-bot in #5691
- chore: ⬆️ Update ggml-org/llama.cpp to 8f71d0f3e86ccbba059350058af8758cafed73e6 by @localai-bot in #5692
- chore: ⬆️ Update ggml-org/llama.cpp to 06cbedfca1587473df9b537f1dd4d6bfa2e3de13 by @localai-bot in #5697
- chore: ⬆️ Update ggml-org/whisper.cpp to e6c10cf3d5d60dc647eb6cd5e73d3c347149f746 by @localai-bot in #5702
- chore: ⬆️ Update ggml-org/llama.cpp to aa0ef5c578eef4c2adc7be1282f21bab5f3e8d26 by @localai-bot in #5703
- chore: ⬆️ Update ggml-org/llama.cpp to 238005c2dc67426cf678baa2d54c881701693288 by @localai-bot in #5710
- chore: ⬆️ Update ggml-org/whisper.cpp to a422176937c5bb20eb58d969995765f90d3c1a9b by @localai-bot in #5713
- chore: ⬆️ Update ggml-org/llama.cpp to ce82bd0117bd3598300b3a089d13d401b90279c7 by @localai-bot in #5712
- chore: ⬆️ Update ggml-org/llama.cpp to 73e53dc834c0a2336cd104473af6897197b96277 by @localai-bot in #5719
- chore: ⬆️ Update ggml-org/whisper.cpp to 0083335ba0e9d6becbe0958903b0a27fc2ebaeed by @localai-bot in #5718
- chore: ⬆️ Update leejet/stable-diffusion.cpp to 10c6501bd05a697e014f1bee3a84e5664290c489 by @localai-bot in #4925
- chore: ⬆️ Update ggml-org/llama.cpp to 2bf9d539dd158345e3a3b096e16474af535265b4 by @localai-bot in #5724
- chore: ⬆️ Update ggml-org/whisper.cpp to 4daf7050ca2bf17f5166f45ac6da651c4e33f293 by @localai-bot in #5725
- Revert "chore: ⬆️ Update leejet/stable-diffusion.cpp to 10c6501bd05a697e014f1bee3a84e5664290c489" by @mudler in #5727
- chore: ⬆️ Update ggml-org/llama.cpp to 8846aace4934ad29651ea61b8c7e3f6b0556e3d2 by @localai-bot in #5734
- chore: ⬆️ Update ggml-org/whisper.cpp to 32cf4e2aba799aff069011f37ca025401433cf9f by @localai-bot in #5733
Other Changes
**Full...
v3.0.0
🚀 LocalAI 3.0 – A New Era Begins
Say hello to LocalAI 3.0 — our most ambitious release yet!
We’ve taken huge strides toward making LocalAI not just local, but limitless. Whether you're building LLM-powered agents, experimenting with audio pipelines, or deploying multimodal backends at scale — this release is for you.
Let’s walk you through what’s new. (And yes, there’s a lot to love.)
TL;DR – What’s New in LocalAI 3.0.0 🎉
- 🧩 Backend Gallery: Install/remove backends on the fly, powered by OCI images — fully customizable and API-driven.
- 🎙️ Audio Support: Upload audio, PDFs, or text in the UI — plus new audio understanding models like Qwen Omni.
- 🌐 Realtime API: WebSocket support compatible with OpenAI clients, great for chat apps and agents.
- 🧠 Reasoning UI Boosts: Thinking indicators now show in chat for smart models.
- 📊 Dynamic VRAM Handling: Smarter GPU usage with automatic offloading.
- 🦙 Llama.cpp Upgrades: Now with reranking + multimodal via libmtmd.
- 📦 50+ New Models: Huge model gallery update with fresh LLMs across categories.
- 🐞 Bug Fixes: Streamed runes, template stability, better backend gallery UX.
- ❌ Deprecated: Extras images — replaced by the new backend system.
👉 Dive into the full changelog and docs below to explore more!
🧩 Introducing the Backend Gallery — Plug, Play, Power Up
No more hunting for dependencies or custom hacks.
With the new Backend Gallery, you can now:
- Install & remove backends at runtime or startup via API or directly from the WebUI
- Use custom galleries, just like you do for models
- Enjoy zero-config access to the default LocalAI gallery
Backends are standard OCI images — portable, composable, and totally DIY-friendly. Goodbye to "extras images" — hello to full backend modularity, even with Python-based dependencies.
📖 Explore the Backend Gallery Docs
⚠️ Important: Breaking Changes
From this release we will stop pushing -extras images containing Python backends. You can now use the standard images and only have to pick the one suited to your GPU. Additional backends can be installed via the backend gallery.
Below are some examples. Note that the CI is still publishing the images, so they won't be available until the jobs are processed; the installation scripts will be updated right after the images are publicly available.
CPU only image:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
NVIDIA GPU Images:
# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CUDA 11
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11
# NVIDIA Jetson (L4T) ARM64
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
AMD GPU Images (ROCm):
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
Intel GPU Images (oneAPI):
# Intel GPU with FP16 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16
# Intel GPU with FP32 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32
Vulkan GPU Images:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
AIO Images (pre-downloaded models):
# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11
# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel-f16
# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
For more information about the AIO images and pre-downloaded models, see Container Documentation.
🧠 Smarter Reasoning, Smoother Chat
- Realtime WebSocket API: OpenAI-style streaming support via WebSocket is here. Ideal for agents and chat apps.
- "Thinking" Tags: Reasoning models now show a visual "thinking" box during inference in the UI. Intuitive and satisfying.
🧠 Model Power-Up: VRAM Savvy + Multimodal Brains
Dynamic VRAM Estimation: LocalAI now adapts and offloads layers depending on your GPU’s capabilities. Optimal performance, no guesswork.
Llama.cpp upgrades also include:
- reranking (see the sketch after this list)
- Enhanced multimodal support via libmtmd
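As a hedged sketch of the reranking support: rerankers are typically exposed through a Jina-style /v1/rerank endpoint. The sketch below assumes that route; the model name is a placeholder and the exact schema may differ:
# Score candidate documents against a query with a reranker model
curl http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jina-reranker-v1-base-en",
    "query": "Organic skincare products for sensitive skin",
    "documents": ["skincare ad copy", "electronics manual", "organic face cream review"]
  }'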
🧪 New Models!
More than 50 new models joined the gallery, including:
- 🧠 skywork-or1-32b, rivermind-lux-12b, qwen3-embedding-*, llama3-24b-mullein, ultravox-v0_5, and more
- 🧬 Multimodal, reasoning, and domain-specific LLMs for every need
- 📦 Browse the latest additions in the Model Gallery
🐞 Bugfixes & Polish
- Rune streaming is now buttery smooth
- Countless fixes across templates, inputs, CI, and realtime session updates
- Backend gallery UI is more stable and informative
The Complete Local Stack for Privacy-First AI
With LocalAGI rejoining LocalAI alongside LocalRecall, our ecosystem provides a complete, open-source stack for private, secure, and intelligent AI operations:
- LocalAI: The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
- LocalAGI: A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.
- LocalRecall: A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI. Link: https://github.com/mudler/LocalRecall
Join the Movement! ❤️
A massive THANK YOU to our incredible community and our sponsors! LocalAI has over 33,300 stars, and LocalAGI has already rocketed past 750+ stars!
As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time and our sponsors to provide us the hardware! If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!
👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI
LocalAI 3.0.0 is here. What will you build next?
Full changelog 👇
What's Changed
Breaking Changes 🛠
- feat: Add backend gallery by @mudler in #5607
- chore(backends): move bark-cpp to the backend gallery by @mudler in #5682
Bug fixes 🐛
- fix(ci): tag latest against cpu-only image by @mudler in #5362
- fix(flux): Set CFG=1 so that prompts are followed by @richiejp in #5378
- fix(template): we do not always have .Name by @mudler in #5508
- fix(input): handle correctly case where we pass by string list as inputs by @mudler in #5521
- fix(streaming): stream complete runes by @mudler in #5539
- fix(install.sh): vulkan docker tag by @halkeye in #5589
- fix(realtime): Use updated model on session update b...
v2.29.0
I am thrilled to announce the release of LocalAI v2.29.0! This update focuses heavily on refining our container image strategy, making default images leaner and providing clearer options for users needing specific features or hardware acceleration. We've also added support for new models like Qwen3, enhanced existing backends, and introduced experimental endpoints, like video generation!
⚠️ Important: Breaking Changes
This release includes significant changes to container image tagging and contents. Please review carefully:
- Python Dependencies Moved: Images containing extra Python dependencies (like those for diffusers) now require the -extras suffix (e.g., latest-gpu-nvidia-cuda-12-extras). Default images are now slimmer and do not include these dependencies.
- FFmpeg is Now Standard: All core images now include FFmpeg. The separate -ffmpeg tags have been removed. If you previously used an -ffmpeg tagged image, simply switch to the corresponding base image tag (e.g., latest-gpu-hipblas-ffmpeg becomes latest-gpu-hipblas).
Below are some examples. Note that the CI is still publishing the images, so they won't be available until the jobs are processed; the installation scripts will be updated right after the images are publicly available.
CPU only image:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
NVIDIA GPU Images:
# CUDA 12.0 with core features
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CUDA 12.0 with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12-extras
# CUDA 11.7 with core features
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11
# CUDA 11.7 with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11-extras
# NVIDIA Jetson (L4T) ARM64
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
AMD GPU Images (ROCm):
# ROCm with core features
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
# ROCm with extra Python dependencies
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas-extras
Intel GPU Images (oneAPI):
# Intel GPU with FP16 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16
# Intel GPU with FP16 support and extra dependencies
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16-extras
# Intel GPU with FP32 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32
# Intel GPU with FP32 support and extra dependencies
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32-extras
Vulkan GPU Images:
# Vulkan with core features
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
AIO Images (pre-downloaded models):
# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11
# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel-f16
# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
For more information about the AIO images and pre-downloaded models, see Container Documentation.
Key Changes in v2.29.0
📦 Container Image Overhaul
- -extras Suffix: Images with additional Python dependencies are now identified by the -extras suffix.
- Default Images: Standard tags (like latest, latest-gpu-nvidia-cuda-12) now provide core LocalAI functionality without the extra Python libraries.
- FFmpeg Inclusion: FFmpeg is bundled in all images, simplifying setup for multimedia tasks.
- New latest-* Tags: Added specific latest tags for various GPU architectures:
  - latest-gpu-hipblas (AMD ROCm)
  - latest-gpu-intel-f16 (Intel oneAPI FP16)
  - latest-gpu-intel-f32 (Intel oneAPI FP32)
  - latest-gpu-nvidia-cuda-12 (NVIDIA CUDA 12)
  - latest-gpu-vulkan (Vulkan)
🚀 New Features & Enhancements
- Qwen3 Model Support: Officially integrated support for the Qwen3 model family.
- Experimental Auto GPU Offload: LocalAI can now attempt to automatically detect GPUs and configure optimal layer offloading for llama.cpp and CLIP.
- Whisper.cpp GPU Acceleration: Updated whisper.cpp and enabled GPU support via cuBLAS (NVIDIA) and Vulkan. SYCL and Hipblas support are in progress.
- Experimental Video Generation: Introduced a /video/generations endpoint. Stay tuned for compatible model backends!
- Installer Uninstall Option: The install.sh script now includes a --uninstall flag for easy removal (see the example after this list).
- Expanded Hipblas Targets: Added support for a wider range of AMD GPU architectures: gfx803, gfx900, gfx906, gfx908, gfx90a, gfx942, gfx1010, gfx1030, gfx1032, gfx1100, gfx1101, gfx1102.
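The uninstall flag is passed straight through to the installer script; a sketch, assuming the standard installer URL from the LocalAI docs:
# Remove a LocalAI installation that was set up via the install script
curl -sfL https://localai.io/install.sh | sh -s -- --uninstall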
🧹 Backend Updates
- AutoGPTQ Backend Removed: This backend has been dropped due to being discontinued upstream.
- Experimental support in llama.cpp for automatically detecting GPU layer offloading.
The Complete Local Stack for Privacy-First AI
With LocalAGI rejoining LocalAI alongside LocalRecall, our ecosystem provides a complete, open-source stack for private, secure, and intelligent AI operations:
- LocalAI: The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
- LocalAGI: A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.
- LocalRecall: A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI. Link: https://github.com/mudler/LocalRecall
Join the Movement! ❤️
A massive THANK YOU to our incredible community! LocalAI has over 32,500 stars, and LocalAGI has already rocketed past 650+ stars!
As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time. If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!
👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI
Let's continue building the future of AI, together! 🙌
Full changelog 👇
What's Changed
Breaking Changes 🛠
- chore(autogptq): drop archived backend by @mudler in #5214
- chore(ci): build only images with ffmpeg included, simplify tags by @mudler in #5251
- chore(ci): strip 'core' in the image suffix, identify python-based images with 'extras' by @mudler in #5353
Bug fixes 🐛
- fix: bark-cpp: assign FLAG_TTS to bark-cpp backend by @M0Rf30 in #5186
- fix(talk): Talk interface sends content-type headers to chatgpt by @baflo in #5200
- fix: installation script compatibility with fedora 41 and later, fedora headless unclear errors by @Bloodis94 in #5239
- fix(stablediffusion-ggml): Build with DSD CUDA, HIP and Metal ...
v2.28.0
🎉 LocalAI v2.28.0: New Look & The Rebirth of LocalAGI! 🎉
Our fresh new look!
Big news, everyone! Not only does LocalAI have a brand new logo, but we're also celebrating the full rebirth of LocalAGI, our powerful agent framework, now completely rewritten and ready to revolutionize your local AI workflows!
Rewinding the Clock: The Journey of LocalAI & LocalAGI
Two years ago, LocalAI emerged as a pioneer in the local AI inferencing space, offering an OpenAI-compatible API layer long before it became common. Around the same time, LocalAGI was born as an experiment in AI agent frameworks – you can even find the original announcement here! Originally built in Python, it inspired many with its local-first approach.
See LocalAGI (Original Python Version) in Action!
Searching the internet (interactive mode):
search.mp4
Planning a road trip (batch mode):
planner.mp4
That early experiment has now evolved significantly!
Introducing LocalAGI v2: The Agent Framework Reborn in Go!
We're thrilled to announce that LocalAGI has been rebuilt from the ground up in Golang! It's now a modern, robust AI Agent Orchestration Platform designed to work seamlessly with LocalAI. Huge thanks to the community, especially @richiejp, for jumping in and helping create a fantastic new WebUI!
LocalAGI leverages all the features that make LocalAI great for agentic tasks. During the refactor, we even spun out the memory layer into its own component: LocalRecall, a standalone REST API for persistent agent memory.
🚀 What Makes LocalAGI v2 Shine?
- 🎯 OpenAI Responses API Compatible: Integrates perfectly with LocalAI, acting as a drop-in replacement for cloud APIs, keeping your interactions local and secure.
- 🤖 Next-Gen AI Agent Orchestration: Easily configure, deploy, and manage teams of intelligent AI agents through an intuitive no-code web interface.
- 🛡️ Privacy-First by Design: Everything runs locally. Your data never leaves your hardware.
- 📡 Instant Integrations: Comes with built-in connectors for Slack, Telegram, Discord, GitHub Issues, IRC, and more.
- ⚡ Extensible and Multimodal: Supports multiple models (text, vision) and custom actions, perfectly complementing your LocalAI setup.
✨ Check out the new LocalAGI WebUI:
What's New Specifically in LocalAI v2.28.0?
Beyond the rebranding and the major LocalAGI news, this LocalAI release also brings its own set of improvements:
- 🖼️ SYCL Support: Added SYCL support for stablediffusion.cpp.
- ✨ WebUI Enhancements: Continued improvements to the user interface.
- 🧠 Diffusers Updated: The core diffusers library has been updated.
- 💡 Lumina Model Support: Now supports the Lumina model family for generating stunning images!
- 🐛 Bug Fixes: Resolved issues related to setting LOCALAI_SINGLE_ACTIVE_BACKEND to true (see the note after this list).
The Complete Local Stack for Privacy-First AI
With LocalAGI rejoining LocalAI alongside LocalRecall, our ecosystem provides a complete, open-source stack for private, secure, and intelligent AI operations:
- LocalAI: The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.
- LocalAGI: A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.
- LocalRecall: A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI. Link: https://github.com/mudler/LocalRecall
Join the Movement! ❤️
A massive THANK YOU to our incredible community! LocalAI has over 31,800 stars, and LocalAGI has already rocketed past 450+ stars!
As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time. If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!
👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI
Let's continue building the future of AI, together! 🙌
Full changelog 👇
What's Changed
🧠 Models
- chore(model gallery): add all-hands_openhands-lm-32b-v0.1 by @mudler in #5111
- chore(model gallery): add burtenshaw_gemmacoder3-12b by @mudler in #5112
- chore(model gallery): add all-hands_openhands-lm-7b-v0.1 by @mudler in #5113
- chore(model gallery): add all-hands_openhands-lm-1.5b-v0.1 by @mudler in #5114
- chore(model gallery): add gemma-3-12b-it-qat by @mudler in #5117
- chore(model gallery): add gemma-3-4b-it-qat by @mudler in #5118
- chore(model gallery): add tesslate_synthia-s1-27b by @mudler in #5119
- chore(model gallery): add katanemo_arch-function-chat-7b by @mudler in #5120
- chore(model gallery): add katanemo_arch-function-chat-1.5b by @mudler in #5121
- chore(model gallery): add katanemo_arch-function-chat-3b by @mudler in #5122
- chore(model gallery): add gemma-3-27b-it-qat by @mudler in #5124
- chore(model gallery): add open-thoughts_openthinker2-32b by @mudler in #5128
- chore(model gallery): add open-thoughts_openthinker2-7b by @mudler in #5129
- chore(model gallery): add arliai_qwq-32b-arliai-rpr-v by @mudler in #5137
- chore(model gallery): add watt-ai_watt-tool-70b by @mudler in #5138
- chore(model gallery): add eurydice-24b-v2-i1 by @mudler in #5139
- chore(model gallery): add mensa-beta-14b-instruct-i1 by @mudler in #5140
- chore(model gallery): add meta-llama_llama-4-scout-17b-16e-instruct by @mudler in #5141
- fix(gemma): improve prompt for tool calls by @mudler in #5142
- chore(model gallery): add cogito-v1-preview-qwen-14b by @mudler in #5145
- chore(model gallery): add deepcogito_cogito-v1-preview-llama-8b by @mudler in #5147
- chore(model gallery): add...
v2.27.0
🚀 LocalAI v2.27.0
Welcome to another exciting release of LocalAI v2.27.0! We've been working hard to bring you a fresh WebUI experience and a host of improvements under the hood. Get ready to explore new updates!
🔥 AIO Images Updates
Check out the updated models we're now shipping with our All-in-One images:
CPU All-in-One:
- Text-to-Text: llama3.1
- Embeddings: granite-embeddings
- Vision: minicpm
GPU All-in-One:
- Text-to-Text: localai-functioncall-qwen2.5-7b-v0.5 (our tiniest flagship model!)
- Embeddings: granite-embeddings
- Vision: minicpm
💻 WebUI Overhaul!
We've given the WebUI a brand-new look and feel. Have a look at the stunning new interface:
Screenshots: Talk Interface, Generate Audio, Models Overview, Generate Images, Chat Interface, API Overview, Login, Swarm.
How to Use
To get started with LocalAI, you can use our container images. Here’s how to run them with Docker:
# CPU only image:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-cpu
# Nvidia GPU:
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CPU and GPU image (bigger size):
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# AIO images (pre-downloads a set of models ready for use, see https://localai.io/basics/container/)
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
Check out our Documentation for more information.
Key Highlights:
- Complete WebUI Redesign: A fresh, modern interface with enhanced navigation and visuals.
- Model Gallery Improvements: Easier exploration with improved pagination and filtering.
- AIO Image Updates: Smoother deployments with updated models.
- Stability Fixes: Critical bug fixes in model initialization, embeddings handling, and GPU offloading.
What’s New 🎉
- Chat Interface Enhancements: Cleaner layout, model-specific UI tweaks, and custom reply prefixes.
- Smart Model Detection: Automatically links to relevant model documentation based on use.
- Performance Tweaks: GGUF models now auto-detect context size, and Llama.cpp handles batch embeddings and SIGTERM gracefully.
- VLLM Config Boost: Added options to disable logging, set dtype, and enforce per-prompt media limits.
- New model architectures supported: Gemma 3, Mistral, Deepseek
Bug Fixes 🐛
- Resolved model icon display inconsistencies.
- Ensured proper handling of generated artifacts without API key restrictions.
- Optimized CLIP offloading and Llama.cpp process termination.
Stay Tuned!
We have some incredibly exciting features and updates lined up for you. While we can't reveal everything just yet, keep an eye out for our upcoming announcements – you won't want to miss them!
Do you like the new WebUI? Let us know in the GitHub discussions!
Enjoy 🚀
Full changelog 👇
What's Changed
Bug fixes 🐛
- fix: change initialization order of llama-cpp-avx512 to go before avx2 variant by @bhulsken in #4837
- fix(coqui): pin transformers by @mudler in #4875
- fix(ui): not all models have an Icon by @mudler in #4913
- fix(models): unify usecases identifications by @mudler in #4914
- fix(llama.cpp): correctly handle embeddings in batches by @mudler in #4957
- fix(routes): do not gate generated artifacts via key by @mudler in #4971
- fix(clip): do not imply GPU offload by default by @mudler in #5010
- fix(llama.cpp): properly handle sigterm by @mudler in #5099
Exciting New Features 🎉
- feat(ui): detect model usage and display link by @mudler in #4864
- feat(vllm): Additional vLLM config options (Disable logging, dtype, and Per-Prompt media limits) by @TheDropZone in #4855
- feat(ui): show only text models in the chat interface by @mudler in #4869
- feat(ui): do also filter tts and image models by @mudler in #4871
- feat(ui): paginate model gallery by @mudler in #4886
- feat(ui): small improvements to chat interface by @mudler in #4907
- feat(ui): improve chat interface by @mudler in #4910
- feat(ui): improvements to index and models page by @mudler in #4918
- feat: allow to specify a reply prefix by @mudler in #4931
- feat(ui): complete design overhaul by @mudler in #4942
- feat(ui): remove api key handling and small ui adjustments by @mudler in #4948
- feat(aio): update AIO image defaults by @mudler in #5002
- feat(gguf): guess default context size from file by @mudler in #5089
🧠 Models
- chore(model gallery): add ozone-ai_0x-lite by @mudler in #4835
- chore: update Image generation docs and examples by @mudler in #4841
- chore(model gallery): add kubeguru-llama3.2-3b-v0.1 by @mudler in #4858
- chore(model gallery): add allenai_llama-3.1-tulu-3.1-8b by @mudler in #4859
- chore(model gallery): add nbeerbower_dumpling-qwen2.5-14b by @mudler in #4860
- chore(model gallery): add nbeerbower_dumpling-qwen2.5-32b-v2 by @mudler in #4861
- chore(model gallery): add nbeerbower_dumpling-qwen2.5-72b by @mudler in #4862
- chore(model gallery): add pygmalionai_pygmalion-3-12b by @mudler in #4866
- chore(model gallery): add open-r1_openr1-qwen-7b by @mudler in #4867
- chore(model gallery): add sentientagi_dobby-unhinged-llama-3.3-70b by @mudler in #4868
- chore(model gallery): add internlm_oreal-32b by @mudler in #4872
- chore(model gallery): add internlm_oreal-deepseek-r1-distill-qwen-7b by @mudler in #4873
- chore(model gallery): add internlm_oreal-7b by @mudler in #4874
- chore(model gallery): add smirki_uigen-t1.1-qwen-14b by @mudler in #4877
- chore(model gallery): add smirki_uigen-t1.1-qwen-7b by @mudler in #4878
- chore(model gallery): add l3.1-8b-rp-ink by @mudler in #4879
- chore(model gallery): add pocketdoc_dans-personalityengine-v1.2.0-24b by @mudler in #4880
- chore(model gallery): add rombo-org_rombo-llm-v3.0-qwen-72b by @mudler in #4882
- chore(model gallery): add ozone-ai_reverb-7b by @mudler in #4883
- chore(model gallery): add arcee-ai_arcee-maestro-7b-preview by @mudler in #4884
- chore(model gallery): add steelskull_l3.3-mokume-gane-r1-70b by @mudler in #4885
- chore(model gallery): add steelskull_l3.3-cu-mai-r1-70b by @mudler in #4892
- chore(model gallery): add steelskull_l3.3-san-mai-r1-70b by @mudler in https://git...