Releases · ngxson/llama.cpp
b5200
llama-bench : Add `--override-tensors` arg (#12922)

* Add --override-tensors option to llama-bench
* Correct llama-bench --override-tensors to --override-tensor
* llama-bench: Update --override-tensors parsing to match --tensor-split and appear in the test matrix
* Make new llama-bench util functions static to fix Ubuntu CI
* llama-bench: Correct -ot corner cases (no -ot calls, leading and trailing empty -ot spans, etc.)
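For context, `--override-tensor` (`-ot`) is the llama-bench counterpart of the tensor buffer type overrides in the C API. A minimal sketch of that underlying mechanism follows; the `llama_model_tensor_buft_override` struct and `tensor_buft_overrides` field names are assumptions based on `llama.h` around this release:

```cpp
// Sketch only: pin FFN weights to host memory via tensor buffer type
// overrides. Struct/field names are assumptions based on llama.h at the
// time of this release.
#include "llama.h"
#include "ggml-backend.h"

int main() {
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // offload everything by default ...

    // ... but match FFN tensors by regex and keep them in a CPU buffer
    const llama_model_tensor_buft_override overrides[] = {
        { "ffn_(up|down|gate)", ggml_backend_cpu_buffer_type() },
        { nullptr, nullptr }, // list is terminated by a null pattern
    };
    mparams.tensor_buft_overrides = overrides;

    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (model == nullptr) {
        return 1;
    }
    llama_model_free(model);
    return 0;
}
```

On the llama-bench command line, the same effect would be written as a `pattern=buffer` pair, e.g. `-ot "ffn_(up|down|gate)=CPU"` (pattern shown for illustration).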
b5199
llama-chat : fix wrong template in GLM4-0414 (#13140)

* fix wrong template in GLM4-0414
* fix spaces
* no BOS token since it is already in the template
* moved the chatglm4 check to higher priority
* restored template for old GLM models
* moved the GLM4 template check to the correct place with the correct check
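Templates like this one are rendered through `llama_chat_apply_template`; a minimal sketch of that call is below (the built-in template name `chatglm4`, the messages, and the buffer size are illustrative, not taken from this PR):

```cpp
// Minimal sketch: applying a built-in chat template by name. The name
// "chatglm4" and the buffer sizing are assumptions for illustration.
#include "llama.h"

#include <cstdio>
#include <vector>

int main() {
    const llama_chat_message msgs[] = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!" },
    };

    std::vector<char> buf(4096);
    int32_t n = llama_chat_apply_template(
        "chatglm4", msgs, 2, /*add_ass=*/true, buf.data(), (int32_t) buf.size());
    if (n > (int32_t) buf.size()) {
        // the return value is the required size; grow and retry
        buf.resize(n);
        n = llama_chat_apply_template(
            "chatglm4", msgs, 2, /*add_ass=*/true, buf.data(), (int32_t) buf.size());
    }
    if (n < 0) {
        return 1; // template unknown or rendering failed
    }
    printf("%.*s\n", n, buf.data());
    return 0;
}
```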
b5198
musa: fix build warning (#13129)

Signed-off-by: Xiaodong Ye <[email protected]>
b5197
Fixes Qwen2.5VL segfault during inference with https://github.com/ggm…
b5196
clip : Add Qwen2.5VL support (#12402)

* implement vision model architecture, gguf converter
* handle window attention inputs
* add debug utils
* fix a few incorrect tensor memory layouts
* move position id remap out of ggml to avoid int32 cuda operations
* cleaning up
* ignore transformers Qwen2_5_xxx type check
* remove rarely used `qwen2vl-cli` debug functions
* remove commented-out code blocks
* fix attn weight scaling after rebase
* add `PROJECTOR_TYPE_QWEN2_5_VL`
* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`
* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`
* remove `attn_window_size` from gguf
* fix model conversion
* clean up
* fix merging problem
* add test

Co-authored-by: Xuan Son Nguyen <[email protected]>
b5195
common : add common_remote_get_content (#13123)

* common : add common_remote_get_content
* support max size and timeout
* add tests
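A sketch of how the new helper might be called; the include path and the `common_remote_params` fields (`timeout`, `max_size`) are assumptions based on this entry's description:

```cpp
// Sketch: fetch a remote file with the new common helper. Include path
// and struct fields are assumptions based on the PR description.
#include "arg.h" // common_remote_get_content, common_remote_params

#include <cstdio>
#include <string>
#include <utility>
#include <vector>

int main() {
    common_remote_params params;
    params.timeout  = 30;               // give up after 30 seconds
    params.max_size = 10 * 1024 * 1024; // refuse responses over 10 MiB

    // returns {HTTP status code, raw response body}
    const auto [status, body] = common_remote_get_content(
        "https://example.com/manifest.json", params);

    if (status != 200) {
        fprintf(stderr, "request failed: HTTP %ld\n", status);
        return 1;
    }
    printf("fetched %zu bytes\n", body.size());
    return 0;
}
```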
b5194
clip : improve projector naming (#13118)

* clip : improve projector naming
* no more kv has_llava_projector
* rm unused kv
* rm more unused
b5193
common : add common_remote_get_content
b5192
grammar : handle maxItems == 0 in JSON schema (#13117)

Co-authored-by: Richard Lyons <[email protected]>
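The corner case is a schema that forces an empty array. A sketch using the `json_schema_to_grammar` helper from `common/` (include paths assumed), where the resulting grammar should accept only `[]`:

```cpp
// Sketch: a schema with maxItems == 0 should compile to a grammar that
// accepts only an empty array. Include paths are assumptions; llama.cpp
// vendors nlohmann::json as "json.hpp" in common/.
#include "json-schema-to-grammar.h"
#include "json.hpp"

#include <cstdio>
#include <string>

int main() {
    const auto schema = nlohmann::ordered_json::parse(R"({
        "type": "array",
        "items": { "type": "string" },
        "maxItems": 0
    })");

    const std::string grammar = json_schema_to_grammar(schema);
    printf("%s\n", grammar.c_str()); // root rule should reduce to "[" "]"
    return 0;
}
```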
b5191
llama : fix K-shift with quantized K and BLAS backend (#13113)
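For context, a K-shift re-applies RoPE to cached keys after their positions change, as in the usual context-shift pattern sketched below; with a quantized K cache this rewrites quantized data, which is the combination this fix addresses. Function names (`llama_kv_self_seq_*`) are assumptions based on `llama.h` around this release:

```cpp
// Sketch: the context-shift pattern that schedules a K-shift. After
// dropping old tokens and shifting the remaining positions, the cached
// keys must be re-roped. Function names are assumptions based on
// llama.h around this release.
#include "llama.h"

void context_shift(llama_context * ctx, int n_past, int n_keep) {
    const int n_discard = (n_past - n_keep) / 2;

    // remove the oldest tokens after the protected prefix ...
    llama_kv_self_seq_rm (ctx, 0, n_keep, n_keep + n_discard);
    // ... and slide the rest back; the position delta is what later
    // triggers the K-shift on the cached keys
    llama_kv_self_seq_add(ctx, 0, n_keep + n_discard, n_past, -n_discard);
}
```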