Releases · ggml-org/llama.cpp

26 Sep 21:43

72b24d9

b6602 Latest

model : make minicpm embedding_scale, residual_scale and logit_scale …

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip

sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6

373 MB 2025-09-26T21:43:45Z

llama-b6602-bin-macos-arm64.zip

sha256:ca3a2dd68d8f6a10e38acb547dd47bf62d6da3b6f9fe8814197f9142fec115b4

10.3 MB 2025-09-26T21:44:00Z

llama-b6602-bin-macos-x64.zip

sha256:894556bcdb97200c34ea99037fb5f58e60668b5834a4127d1b18d322f2dd0f23

27.7 MB 2025-09-26T21:44:01Z

llama-b6602-bin-ubuntu-vulkan-x64.zip

sha256:2fc4e7adf493dc42eb1dee847a6775041e3e030bba0027cfec899c780f6a82f9

25.5 MB 2025-09-26T21:44:03Z

llama-b6602-bin-ubuntu-x64.zip

sha256:404a81c641a41f51a5a9ebdd2bbb64f8d7d045f10bf5256c7ee394f558d9e890

12.3 MB 2025-09-26T21:44:04Z

llama-b6602-bin-win-cpu-arm64.zip

sha256:ce4af66681918d8a630a839bd2e6bdb49f491f7e29e0294d63b41b8a91078c00

10.4 MB 2025-09-26T21:44:06Z

llama-b6602-bin-win-cpu-x64.zip

sha256:9ad00c944891db7bab4ac658a8367a22632b7b962a0f222aa8be0296ab5a7b5b

13.5 MB 2025-09-26T21:44:07Z

llama-b6602-bin-win-cuda-12.4-x64.zip

sha256:65d625462d7904973b58ce218ed1880e50a595a2dbb357cc5d7314bd25de4832

146 MB 2025-09-26T21:44:08Z

llama-b6602-bin-win-hip-radeon-x64.zip

sha256:d8588930b6038a1b9119cb2b174c0980e1069a330440af525aece2a633d6c53c

319 MB 2025-09-26T21:44:15Z

llama-b6602-bin-win-opencl-adreno-arm64.zip

sha256:c9130c22c8da23184ac2fc4f68bf0ae9f55b13be1f41ccc76954a745a8f53c33

10.8 MB 2025-09-26T21:44:27Z

Source code (zip)

2025-09-26T21:28:29Z

Source code (tar.gz)

2025-09-26T21:28:29Z
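Each binary asset above is listed with a SHA-256 digest. The following is a minimal sketch of checking a downloaded archive against the digest as printed on this page, using only Python's standard library. The helper names `sha256_of_file` and `verify_asset` are hypothetical, chosen for illustration.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_asset(path: str, listed: str) -> bool:
    """Compare a file against a digest written as on the release page ("sha256:<hex>")."""
    expected = listed.removeprefix("sha256:").strip().lower()
    # Constant-time comparison; not strictly needed for checksums, but harmless.
    return hmac.compare_digest(sha256_of_file(path), expected)
```

For example, assuming the Ubuntu archive was saved under its original name in the working directory, `verify_asset("llama-b6602-bin-ubuntu-x64.zip", "sha256:404a81c641a41f51a5a9ebdd2bbb64f8d7d045f10bf5256c7ee394f558d9e890")` should return `True` for an intact download. Reading in chunks keeps memory flat even for the larger CUDA and HIP archives.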

26 Sep 18:31

github-actions

b6601

624207e

devops: add s390x & ppc64le CI (#15925)

* devops: move s390x and ppc64le ci build

we have access to ubuntu-24.04-s390x and ppc64le images now

Signed-off-by: Aaron Teo <[email protected]>

* devops: disable ppc64le for now since they have compiler errors

Signed-off-by: Aaron Teo <[email protected]>

* devops: stop warnings as errors

Signed-off-by: Aaron Teo <[email protected]>

* devops: switch to non-macro flag

Signed-off-by: Aaron Teo <[email protected]>

* devops: going the llama macro route

Signed-off-by: Aaron Teo <[email protected]>

* devops: add big-endian gguf test models

Signed-off-by: Aaron Teo <[email protected]>

* devops: disable ppc64le to test s390x, check test build

Signed-off-by: Aaron Teo <[email protected]>

* devops: dup .gguf.inp files for big-endian tests

Signed-off-by: Aaron Teo <[email protected]>

* devops: dup .gguf.out files for big-endian too

Signed-off-by: Aaron Teo <[email protected]>

* devops: add python setup and endian byteswap

Signed-off-by: Aaron Teo <[email protected]>

* devops: poor thing does not have s390x python3

Signed-off-by: Aaron Teo <[email protected]>

* devops: add missing rust compiler for s390x

Signed-off-by: Aaron Teo <[email protected]>

* devops: try rust actions runner

Signed-off-by: Aaron Teo <[email protected]>

* Revert "devops: try rust actions runner"

This reverts commit 3f8db04356033d6c1d7eccc75ca396bc5298250c.

Signed-off-by: Aaron Teo <[email protected]>

* devops: try a different path for rust

Signed-off-by: Aaron Teo <[email protected]>

* devops: dump home directory and user info

Signed-off-by: Aaron Teo <[email protected]>

* devops: install gguf-py only

Signed-off-by: Aaron Teo <[email protected]>

* devops: missed relative path

Signed-off-by: Aaron Teo <[email protected]>

* devops: remove big-endian files since local swapping is working

Signed-off-by: Aaron Teo <[email protected]>

* devops: revert test-tokenizer-0 cmakelists

Signed-off-by: Aaron Teo <[email protected]>

* Fix unicode flags conversion from and to uint16_t

Bitfields are allocated in different order on s390x

Signed-off-by: Aaron Teo <[email protected]>

* Simplify byteswap command

Signed-off-by: Aaron Teo <[email protected]>

* Add byteswapping and git-lfs for test-tokenizers-ggml-vocabs

Signed-off-by: Aaron Teo <[email protected]>

* Fix endianness detection in vocab loader

Signed-off-by: Aaron Teo <[email protected]>

* Disable test-thread-safety on s390x

In this test a model is downloaded,
then immediately loaded to check if more downloads are needed,
and then used for test.

There is no clean way to separate all those steps
to add byteswapping between them, so just skip this test.

Signed-off-by: Aaron Teo <[email protected]>

* Fix q8_0 test in test-quantize-fns

vec_signed uses unexpected rounding mode.
Explicitly use different rounding function.

Signed-off-by: Aaron Teo <[email protected]>

* devops: add big-endian stories260K

Signed-off-by: Aaron Teo <[email protected]>

* devops: add s390x test-eval-callback

Signed-off-by: Aaron Teo <[email protected]>

* devops: fix test does not exist

Signed-off-by: Aaron Teo <[email protected]>

* devops: fix model not found llama-eval-callback

Signed-off-by: Aaron Teo <[email protected]>

* Fix q3_K dot product error in test-quantize-fns on s390x

Array q8bytes had only 4 elements allocated, but 8 elements accessed.
This led to an out-of-bounds write, a later out-of-bounds read of the overwritten values,
and an incorrect result.

Signed-off-by: Aaron Teo <[email protected]>

* devops: re-enable ppc64le for testing

Signed-off-by: Aaron Teo <[email protected]>

* devops: activate test-thread-safety for s390x

Signed-off-by: Aaron Teo <[email protected]>

* devops: disable ppc64le tests

for some reason it keeps failing test-thread-safety tests and I do not
have a machine that is able to replicate the tests.

Signed-off-by: Aaron Teo <[email protected]>

* devops: LLAMA_FATAL_WARNINGS=ON

Signed-off-by: Aaron Teo <[email protected]>

* Correct repository URL for s390x for test-thread-safety model

Signed-off-by: Aaron Teo <[email protected]>

* Fix fs_get_cache_directory

Ensure it works even if both XDG_CACHE_HOME and HOME are unset.
This might happen in containers.

Signed-off-by: Aaron Teo <[email protected]>

* Re-enable CI for ppc64le

Signed-off-by: Aaron Teo <[email protected]>

* Fortify ggml_rope_impl

Only memcpy data from sections argument if it's non-NULL.

Signed-off-by: Aaron Teo <[email protected]>

* Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way

* Update URL for big-endian model

* Update .github/workflows/build.yml

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update remaining mentions of BE models to ggml-org/models repo

---------

Signed-off-by: Aaron Teo <[email protected]>
Co-authored-by: Aleksei Nikiforov <[email protected]>
Co-authored-by: Aleksei Nikiforov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
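The unicode-flags fix and the TODO above concern converting a struct of bitfields to and from `uint16_t`: the order in which a C/C++ compiler allocates bitfields is implementation-defined, so it differed on s390x. One endian-independent approach is to assign every flag an explicit bit position and pack with shifts. The sketch below shows that idea in Python; the class name `CptFlags`, the flag names, and their bit positions are hypothetical and do not reproduce llama.cpp's `unicode_cpt_flags`.

```python
from dataclasses import dataclass, fields


@dataclass
class CptFlags:
    # Hypothetical subset of flags; bit i is the i-th field, by declaration order.
    is_number: bool = False
    is_letter: bool = False
    is_separator: bool = False
    is_punctuation: bool = False
    is_whitespace: bool = False

    def to_u16(self) -> int:
        """Pack with explicit shifts, so the layout never depends on the compiler or CPU."""
        value = 0
        for bit, f in enumerate(fields(self)):
            if getattr(self, f.name):
                value |= 1 << bit
        return value

    @classmethod
    def from_u16(cls, value: int) -> "CptFlags":
        """Inverse of to_u16: bit i sets the i-th field."""
        return cls(**{f.name: bool((value >> bit) & 1) for bit, f in enumerate(fields(cls))})
```

If the packed value has to be stored on disk, choosing the byte order explicitly (for example `value.to_bytes(2, "little")`) keeps the serialized form identical on little- and big-endian hosts.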

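Several items above deal with GGUF files written in big-endian byte order (the byteswapped test models and the vocab loader's endianness detection). The following is a sketch of one heuristic for telling the two apart, not the loader's actual code. It assumes the four magic bytes `GGUF` are identical in both variants and that the `uint32` version that follows is a small number. Read with the wrong byte order, such a number leaves the low 16 bits zero.

```python
import struct

GGUF_MAGIC = b"GGUF"


def detect_gguf_byte_order(header: bytes) -> str:
    """Return '<' (little-endian) or '>' (big-endian) for the first 8 bytes of a GGUF file."""
    if header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    (version_le,) = struct.unpack_from("<I", header, 4)
    # A small version such as 3 read with the wrong byte order becomes 0x03000000.
    return ">" if (version_le & 0xFFFF) == 0 else "<"


def read_gguf_version(header: bytes) -> int:
    """Decode the version field using the detected byte order."""
    order = detect_gguf_byte_order(header)
    (version,) = struct.unpack_from(order + "I", header, 4)
    return version
```

The heuristic fails only if a real version number ever became a multiple of 65536, which is far beyond any plausible format revision.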
Assets 15
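The q8_0 fix above notes that `vec_signed` used an unexpected rounding mode. The sketch below shows why the rounding mode matters for a q8_0-style block, which uses scale d = max|x| / 127 and stores q_i = round(x_i / d). It is a Python illustration, not ggml's code. Truncation toward zero stands in for the "unexpected" mode purely as an assumption, because the commit does not say which mode `vec_signed` applied.

```python
import math

QK8_0 = 32  # q8_0 block size in ggml; the function below accepts any length


def round_half_away(x: float) -> int:
    """C roundf semantics: nearest integer, ties away from zero (unlike Python's round)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0_block(xs, to_int):
    """Quantize one block with scale d = max|x| / 127 and q_i = to_int(x_i / d)."""
    amax = max(abs(v) for v in xs)
    d = amax / 127.0
    inv_d = 1.0 / d if d else 0.0
    return d, [to_int(v * inv_d) for v in xs]
```

With `xs = [127.0, 2.7, -2.7, 0.4]` the scale is exactly 1.0. Rounding gives `[127, 3, -3, 0]`, while truncation (`int`) gives `[127, 2, -2, 0]`. The worst-case error therefore grows from d/2 to nearly d, which is enough for a reference comparison test to fail.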

26 Sep 16:05

github-actions

b6598

e0539eb

webui: switch to hash-based routing (alternative of #16079) (#16157)

* Switched web UI to hash-based routing

* Added hash to missed goto function call

* Removed outdated SPA handling code

* Fixed broken sidebar home link
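With hash-based routing, the client-side route lives in the URL fragment (after `#`), and browsers do not send the fragment to the server. Every deep link therefore reaches the HTTP server as a request for `/`. That is consistent with the removal of the SPA fallback handling listed above. The sketch below uses Python's `urllib.parse.urlsplit` to show what the server sees versus what the web UI routes on. The `/chat/abc123` route is a hypothetical example, not necessarily a real web UI path.

```python
from urllib.parse import urlsplit


def server_visible_path(url: str) -> str:
    """Path the HTTP server receives; the #fragment is never part of the request."""
    return urlsplit(url).path or "/"


def client_route(url: str) -> str:
    """Route a hash-routed single-page app reads from the fragment."""
    return urlsplit(url).fragment or "/"
```

For `http://localhost:8080/#/chat/abc123`, the server sees `/` and the UI routes to `/chat/abc123`. With path-based routing, `http://localhost:8080/chat/abc123` would reach the server as `/chat/abc123`, and the server would need a fallback that serves the app for unknown paths.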

Assets 15

26 Sep 14:01

github-actions

b6595

cc1cfa2

mtmd : fix uninitialized variable in bicubic_resize (#16275)

Signed-off-by: Aaron Teo <[email protected]>
Co-authored-by: Aaron Teo <[email protected]>

Assets 15

26 Sep 11:51

github-actions

b6594

54dbc37

metal : report OOM errors (#16274)

Assets 15

26 Sep 11:45

github-actions

b6593

b995a10

common : use cpp-httplib as a cURL alternative for downloads (#16185)

* vendor : update httplib

Signed-off-by: Adrien Gallouët <[email protected]>

* common : use cpp-httplib as a cURL alternative for downloads

The existing cURL implementation is intentionally left untouched to
prevent any regressions and to allow for safe, side-by-side testing by
toggling the `LLAMA_CURL` CMake option.

Signed-off-by: Adrien Gallouët <[email protected]>

* ggml : Bump to Windows 10

Signed-off-by: Adrien Gallouët <[email protected]>

---------

Signed-off-by: Adrien Gallouët <[email protected]>

Assets 15

26 Sep 11:03

github-actions

b6591

9b26511

ggml-cpu: implement MXFP4 SIMD for s390x (#16193)

* ggml-cpu: impl mxfp4 s390x

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: missing s = sumf

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix incorrect kval_mxfp4 type

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: rework mxfp4

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: missing delta calc

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix typo

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: fix typo for vec_splats

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: expand to 2 blocks per loop

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: add unroll to boost perf

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: back to 1 block per loop to test perf

Signed-off-by: Aaron Teo <[email protected]>

* Revert "ggml-cpu: back to 1 block per loop to test perf"

This reverts commit 1fe55724e2dc295701101bf838bdd4a512237492.

Signed-off-by: Aaron Teo <[email protected]>

* ggml-cpu: rm unroll from single block

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>
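This release vectorizes the MXFP4 dot product for s390x. As a scalar reference for what the SIMD code computes, the following sketch dequantizes one MXFP4 block as described by the OCP Microscaling (MX) specification. A block holds 32 FP4 (E2M1) elements that share one E8M0 scale, 2^(e - 127). ggml's in-memory struct and its integer lookup table (the `kval_mxfp4` mentioned above) are not reproduced here. The nibble order (low nibbles for elements 0 to 15, high nibbles for 16 to 31) is an assumption.

```python
import math

# FP4 E2M1 magnitudes for codes 0..7; bit 3 of the nibble is the sign.
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
MXFP4_BLOCK = 32


def e2m1_to_float(nibble: int) -> float:
    magnitude = E2M1_VALUES[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def e8m0_to_float(e: int) -> float:
    """Shared block scale 2**(e - 127); the code 0xFF is reserved for NaN."""
    if e == 0xFF:
        return math.nan
    return math.ldexp(1.0, e - 127)


def dequantize_mxfp4_block(e: int, packed: bytes) -> list:
    """Dequantize 32 elements from a shared exponent byte and 16 bytes of packed nibbles."""
    if len(packed) != MXFP4_BLOCK // 2:
        raise ValueError("expected 16 bytes of packed nibbles")
    scale = e8m0_to_float(e)
    lo = [e2m1_to_float(b & 0x0F) * scale for b in packed]  # assumed elements 0..15
    hi = [e2m1_to_float(b >> 4) * scale for b in packed]    # assumed elements 16..31
    return lo + hi
```

A dot product against a q8-quantized activation block is then the sum of the element-wise products. The SIMD version performs the same arithmetic on several lanes at once.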

Assets 15

26 Sep 01:21

github-actions

b6587

0f7c696

musa: fix build warnings (#15611)

Signed-off-by: Xiaodong Ye <[email protected]>

Assets 15

25 Sep 18:46

github-actions

b6586

835b2b9

model : add GroveMoE support (#15510)

* add GroveMoE support

* remove constexpr that fails on certain compilers

* revert crude scalar div implementation, use cast

* build_attn_inp_kv_unified -> build_attn_inp_kv

* fix build_attn

* re-apply ffn_exps regex changes

Assets 15

25 Sep 16:30

github-actions

b6585

b05a9d6

vendors: update miniaudio version (#16212)

* vendor: update miniaudio.h

Signed-off-by: Aaron Teo <[email protected]>

* vendor: update miniaudio.h

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>

Assets 15
