
Releases: ngxson/llama.cpp

b5200

27 Apr 22:32
c0a97b7

llama-bench : Add `--override-tensors` arg (#12922)

* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensors parsing to match --tensor-split and appear in the test matrix.

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (no -ot calls, leading and trailing empty -ot spans, etc.)
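The bullets above describe parsing `--override-tensor` spans, including the corner cases of empty leading/trailing spans. A minimal Python sketch of that kind of parsing (an illustration only, not llama.cpp's actual C++ implementation; the function names and the semicolon-separated `pattern=buffer` span format are assumptions based on the commit messages):

```python
import re

def parse_override_tensor(arg):
    """Parse a semicolon-separated list of `pattern=buffer` spans.

    Empty spans (e.g. from a leading or trailing ';') are skipped,
    mirroring the corner cases the release notes mention.
    """
    overrides = []
    for span in arg.split(";"):
        span = span.strip()
        if not span:  # tolerate leading/trailing empty spans
            continue
        pattern, _, buffer_type = span.partition("=")
        if not pattern or not buffer_type:
            raise ValueError("malformed override span: %r" % span)
        overrides.append((pattern, buffer_type))
    return overrides

def match_buffer(tensor_name, overrides):
    """Return the buffer type of the first pattern matching the tensor name."""
    for pattern, buffer_type in overrides:
        if re.search(pattern, tensor_name):
            return buffer_type
    return None
```

For example, a span list such as `";exps=CPU;"` would parse to a single `("exps", "CPU")` override despite the empty leading and trailing spans.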

b5199

27 Apr 20:41
ced44be

llama-chat : fix wrong template in GLM4-0414 (#13140)

* fix wrong template in GLM4-0414

* fix spaces

* no bos token since it is already in the template

* moved the chatglm4 check to higher priority

* restored template for old GLM models

* moved the GLM4 template check to the correct place with the correct check
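The bullets above describe reordering template detection so the more specific GLM4 check runs at higher priority than a generic GLM check. A hypothetical Python sketch of why ordering matters for substring-based detection (the marker strings and names below are illustrative assumptions, not llama.cpp's actual detection code):

```python
def detect_template(chat_template):
    """Marker-based template detection: more specific checks must run
    before more general ones, otherwise a GLM4 template containing a
    generic GLM marker would be misclassified."""
    # Assumed markers: a GLM4 template is taken to contain both strings,
    # so a generic "[gMASK]" check alone would also match it.
    if "[gMASK]<sop>" in chat_template and "<|system|>" in chat_template:
        return "glm4"      # specific check first (higher priority)
    if "[gMASK]" in chat_template:
        return "chatglm"   # generic fallback for older GLM models
    return "unknown"
```

Swapping the two `if` branches would send every GLM4 template down the generic path, which is the class of bug the fix addresses.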

b5198

27 Apr 12:07
e291450

musa: fix build warning (#13129)

Signed-off-by: Xiaodong Ye <[email protected]>

b5197

27 Apr 11:37
59e991c

Fixes Qwen2.5VL segfault during inference with https://github.com/ggm…

b5196

27 Apr 08:53
ca2bb89

clip : Add Qwen2.5VL support (#12402)

* implement vision model architecture, GGUF converter

* handle window attention inputs

* add debug utils

* fix a few incorrect tensor memory layouts

* move position ID remap out of ggml to avoid int32 CUDA operations

* cleaning up

* ignore transformers Qwen2_5_xxx type check

* remove rarely used `qwen2vl-cli` debug functions

* remove commented-out code blocks

* fix attn weight scaling after rebase

* add `PROJECTOR_TYPE_QWEN2_5_VL`

* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`

* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`

* remove `attn_window_size` from gguf

* fix model conversion

* clean up

* fix merging problem

* add test

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

b5195

26 Apr 21:44
2d451c8

common : add common_remote_get_content (#13123)

* common : add common_remote_get_content

* support max size and timeout

* add tests
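`common_remote_get_content` is a C++ helper in llama.cpp's common library; the "support max size and timeout" bullet can be sketched in Python as an analogue of the max-size guard (this is not the actual API — the function below reads from an already-open stream so it stays testable without a network, and a real remote fetch would additionally pass the timeout to the HTTP client):

```python
import io

def get_content_capped(stream, max_size, chunk_size=8192):
    """Read from `stream`, failing once more than `max_size` bytes arrive.

    Sketches the max-size guard the release notes mention: the read is
    aborted as soon as the cap is exceeded, rather than after buffering
    an arbitrarily large response.
    """
    data = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data.extend(chunk)
        if len(data) > max_size:
            raise ValueError("response exceeds max size of %d bytes" % max_size)
    return bytes(data)
```

Checking the cap per chunk, rather than on the final buffer, is what lets the guard stop an oversized download early.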

b5194

26 Apr 21:34
4753791

clip : improve projector naming (#13118)

* clip : improve projector naming

* no more kv has_llava_projector

* rm unused kv

* rm more unused

b5193

26 Apr 10:43

common : add common_remote_get_content

b5192

26 Apr 09:02
d5fe4e8

grammar : handle maxItems == 0 in JSON schema (#13117)

Co-authored-by: Richard Lyons <[email protected]>
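The `maxItems == 0` fix concerns converting a JSON schema that permits only the empty array into a grammar. A small Python illustration of the constraint itself (not llama.cpp's grammar converter; `validate_max_items` is an invented helper for this sketch):

```python
def validate_max_items(value, schema):
    """Check an array value against a schema's `maxItems` keyword.

    `maxItems: 0` is a valid constraint meaning "only the empty array",
    the corner case the grammar converter previously mishandled (a
    falsy-zero check can wrongly treat it as "no limit").
    """
    if not isinstance(value, list):
        return False
    max_items = schema.get("maxItems")
    # Explicitly compare against None: `if not max_items` would
    # conflate maxItems == 0 with maxItems being absent.
    return max_items is None or len(value) <= max_items
```

The `is None` comparison is the crux: treating `0` as falsy is exactly the kind of bug that makes `maxItems: 0` behave like an unbounded array.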

b5191

25 Apr 18:34
295354e

llama : fix K-shift with quantized K and BLAS backend (#13113)