feat(llama.cpp): upgrade and use libmtmd #5379
Merged
Conversation
Signed-off-by: Ettore Di Giacinto <[email protected]>
Force-pushed from 4488367 to dd381f8
mudler added a commit to mudler/llama.cpp that referenced this pull request on Jun 3, 2025:
This is in order to improve maintainability and re-usability by downstream projects such as LocalAI (see mudler/LocalAI#5379 for context). The context server is a struct that can be re-used quite heavily by other communication protocols. For instance, LocalAI uses the context server on top of gRPC rather than having a REST API. This change improves overall re-usability by isolating the REST API to its own file so the context server can be imported easily. Signed-off-by: mudler <[email protected]>
Description
The latest llama.cpp release brings various API changes along with super exciting features (thanks 🫶 @ngxson and @ggerganov!), which called for a complete rewrite of our gRPC server to avoid drifting from upstream.
The new implementation is now (almost) on par with what's on llama.cpp master, and keeping things in sync is much easier.
Notes for Reviewers
In a next round, it would be cool to upstream some architectural changes (like splitting the main server from the HTTP server), reducing maintenance on LocalAI's side even further.
#5368
Supersedes #5365
Signed commits