
Releases: ggml-org/llama.cpp

b5299

07 May 09:16
6c7fd67
llama : support tie embedding for chatglm models (#13328)
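For context, "tied" embeddings reuse the input embedding matrix as the output projection, so the checkpoint stores one vocab-by-dim matrix instead of two. A minimal, hypothetical Python sketch of the idea (not chatglm- or llama.cpp-specific):

```python
# Hypothetical sketch of tied embeddings: the output head reuses the
# input embedding matrix instead of carrying a separate weight.
def embed(token_ids, embedding):
    # embedding: list of vocab rows, each a dim-length vector
    return [embedding[t] for t in token_ids]

def output_logits(hidden, embedding):
    # logits[v] = dot(hidden, embedding[v]) -- the same matrix, used "tied"
    return [sum(h * w for h, w in zip(hidden, row)) for row in embedding]
```

With tying, loading a model only needs the embedding tensor; the output head is derived from it rather than read from the file.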

b5298

06 May 23:10
141a908
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135)
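For context, a "real" CUDA architecture (sm_XX) embeds GPU-specific SASS, while a "virtual" one (compute_XX) embeds PTX that newer GPUs can JIT-compile. Using CMake's standard `-real`/`-virtual` suffix syntax, a mixed build might look like the following; the architecture list here is purely illustrative, not llama.cpp's default:

```shell
# Illustrative only: SASS for sm_80/sm_86, plus PTX for compute_89 so
# GPUs newer than the listed archs can still JIT the kernels.
cmake -B build -DGGML_NATIVE=OFF \
      -DCMAKE_CUDA_ARCHITECTURES="80-real;86-real;89-virtual"
cmake --build build
```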

b5297

06 May 22:26
32916a4
clip : refactor graph builder (#13321)

* mtmd : refactor graph builder

* fix qwen2vl

* clean up siglip cgraph

* pixtral migrated

* move minicpmv to a dedicated build function

* move max_feature_layer to build_llava

* use build_attn for minicpm resampler

* fix windows build

* add comment for batch_size

* also support tinygemma3 test model

* qwen2vl does not use RMS norm

* fix qwen2vl norm (2)

b5296

06 May 22:16
ffc7272
sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345)

b5295

06 May 19:17
91a86a6
sampling : don't consider -infinity values in top_n_sigma (#13344)
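Together with the b5296 change above, the intended top_n_sigma behavior can be sketched as follows. This is a hypothetical standalone Python illustration of the filter (keep tokens whose logit lies within n standard deviations of the maximum), not llama.cpp's actual C++ implementation:

```python
import math

NEG_INF = float("-inf")

def top_n_sigma(logits, n):
    """Keep tokens whose logit is within n std deviations of the max;
    mask the rest to -inf. No-op for n <= 0 or a single candidate,
    and -inf entries are excluded from the statistics."""
    finite = [x for x in logits if x != NEG_INF]
    # No-op when disabled or when at most one real candidate remains.
    if n <= 0 or len(finite) <= 1:
        return list(logits)
    mean = sum(finite) / len(finite)
    std = math.sqrt(sum((x - mean) ** 2 for x in finite) / len(finite))
    cutoff = max(finite) - n * std
    return [x if x >= cutoff else NEG_INF for x in logits]
```

Excluding -inf values matters because a single -inf would otherwise poison the mean and standard deviation, making the cutoff meaningless.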

b5293

06 May 16:31
1e333d5
SYCL: Disable reorder optimize by default and stop setting tensor ext…

b5292

06 May 14:29
2f54e34
llama : fix build_ffn without gate (#13336)

* llama : fix build_ffn without gate

* fix build on windows

* Revert "fix build on windows"

This reverts commit fc420d3c7eef3481d3d2f313fef2757cb33a7c56.

b5289

06 May 07:28
15a28ec
CUDA: fix --split-mode row for MMQ (#13323)

b5287

05 May 21:28
9070365
CUDA: fix logic for clearing padding with -ngl 0 (#13320)

b5286

05 May 20:56
233461f
sampling : Integrate Top-nσ into main sampling chain (and add it to t…