Releases · ngxson/llama.cpp
b5200
llama-bench : Add `--override-tensors` arg (#12922)

* Add --override-tensors option to llama-bench
* Correct llama-bench --override-tensors to --override-tensor
* llama-bench: Update --override-tensors parsing to match --tensor-split and appear in the test matrix
* Make new llama-bench util functions static to fix Ubuntu CI
* llama-bench: Correct -ot corner cases (no -ot calls, leading and trailing empty -ot spans, etc.)
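For context, `--override-tensor` (`-ot`) is the llama-bench counterpart of the tensor buffer type overrides in the C API. A minimal sketch of that underlying mechanism follows; the `llama_model_tensor_buft_override` struct and `tensor_buft_overrides` field names are assumptions based on `llama.h` around this release:

```cpp
// Sketch only: pin FFN weights to host memory via tensor buffer type
// overrides. Struct/field names are assumptions based on llama.h at the
// time of this release.
#include "llama.h"
#include "ggml-backend.h"

int main() {
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // offload everything by default ...

    // ... but match FFN tensors by regex and keep them in a CPU buffer
    const llama_model_tensor_buft_override overrides[] = {
        { "ffn_(up|down|gate)", ggml_backend_cpu_buffer_type() },
        { nullptr, nullptr }, // list is terminated by a null pattern
    };
    mparams.tensor_buft_overrides = overrides;

    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (model == nullptr) {
        return 1;
    }
    llama_model_free(model);
    return 0;
}
```

On the llama-bench command line, the same effect would be written as a `pattern=buffer` pair, e.g. `-ot "ffn_(up|down|gate)=CPU"` (pattern shown for illustration).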
b5199
llama-chat : fix wrong template in GLM4-0414 (#13140)

* fix wrong template in GLM4-0414
* fix spaces
* no BOS token since it is already in the template
* moved the chatglm4 check to higher priority
* restored template for old GLM models
* moved the GLM4 template check to the correct place with the correct check
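Templates like this one are rendered through `llama_chat_apply_template`; a minimal sketch of that call is below (the built-in template name `chatglm4`, the messages, and the buffer size are illustrative, not taken from this PR):

```cpp
// Minimal sketch: applying a built-in chat template by name. The name
// "chatglm4" and the buffer sizing are assumptions for illustration.
#include "llama.h"

#include <cstdio>
#include <vector>

int main() {
    const llama_chat_message msgs[] = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!" },
    };

    std::vector<char> buf(4096);
    int32_t n = llama_chat_apply_template(
        "chatglm4", msgs, 2, /*add_ass=*/true, buf.data(), (int32_t) buf.size());
    if (n > (int32_t) buf.size()) {
        // the return value is the required size; grow and retry
        buf.resize(n);
        n = llama_chat_apply_template(
            "chatglm4", msgs, 2, /*add_ass=*/true, buf.data(), (int32_t) buf.size());
    }
    if (n < 0) {
        return 1; // template unknown or rendering failed
    }
    printf("%.*s\n", n, buf.data());
    return 0;
}
```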
b5198
musa: fix build warning (#13129)

Signed-off-by: Xiaodong Ye <[email protected]>
b5197
Fixes Qwen2.5VL segfault during inference with https://github.com/ggm…
b5196
clip : Add Qwen2.5VL support (#12402)

* implement vision model architecture, gguf converter
* handle window attention inputs
* add debug utils
* fix a few incorrect tensor memory layouts
* move position id remap out of ggml to avoid int32 cuda operations
* cleaning up
* ignore transformers Qwen2_5_xxx type check
* remove rarely used `qwen2vl-cli` debug functions
* remove commented-out code blocks
* fix attn weight scaling after rebase
* add `PROJECTOR_TYPE_QWEN2_5_VL`
* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`
* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`
* remove `attn_window_size` from gguf
* fix model conversion
* clean up
* fix merging problem
* add test

Co-authored-by: Xuan Son Nguyen <[email protected]>
b5195
common : add common_remote_get_content (#13123)

* common : add common_remote_get_content
* support max size and timeout
* add tests
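A sketch of how the new helper might be called; the include path and the `common_remote_params` fields (`timeout`, `max_size`) are assumptions based on this entry's description:

```cpp
// Sketch: fetch a remote file with the new common helper. Include path
// and struct fields are assumptions based on the PR description.
#include "arg.h" // common_remote_get_content, common_remote_params

#include <cstdio>
#include <string>
#include <utility>
#include <vector>

int main() {
    common_remote_params params;
    params.timeout  = 30;               // give up after 30 seconds
    params.max_size = 10 * 1024 * 1024; // refuse responses over 10 MiB

    // returns {HTTP status code, raw response body}
    const auto [status, body] = common_remote_get_content(
        "https://example.com/manifest.json", params);

    if (status != 200) {
        fprintf(stderr, "request failed: HTTP %ld\n", status);
        return 1;
    }
    printf("fetched %zu bytes\n", body.size());
    return 0;
}
```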
b5194
clip : improve projector naming (#13118)

* clip : improve projector naming
* no more kv has_llava_projector
* rm unused kv
* rm more unused
b5193
common : add common_remote_get_content
b5192
grammar : handle maxItems == 0 in JSON schema (#13117)

Co-authored-by: Richard Lyons <[email protected]>
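The corner case is a schema that forces an empty array. A sketch using the `json_schema_to_grammar` helper from `common/` (include paths assumed), where the resulting grammar should accept only `[]`:

```cpp
// Sketch: a schema with maxItems == 0 should compile to a grammar that
// accepts only an empty array. Include paths are assumptions; llama.cpp
// vendors nlohmann::json as "json.hpp" in common/.
#include "json-schema-to-grammar.h"
#include "json.hpp"

#include <cstdio>
#include <string>

int main() {
    const auto schema = nlohmann::ordered_json::parse(R"({
        "type": "array",
        "items": { "type": "string" },
        "maxItems": 0
    })");

    const std::string grammar = json_schema_to_grammar(schema);
    printf("%s\n", grammar.c_str()); // root rule should reduce to "[" "]"
    return 0;
}
```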
b5191
llama : fix K-shift with quantized K and BLAS backend (#13113)
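For context, a K-shift re-applies RoPE to cached keys after their positions change, as in the usual context-shift pattern sketched below; with a quantized K cache this rewrites quantized data, which is the combination this fix addresses. Function names (`llama_kv_self_seq_*`) are assumptions based on `llama.h` around this release:

```cpp
// Sketch: the context-shift pattern that schedules a K-shift. After
// dropping old tokens and shifting the remaining positions, the cached
// keys must be re-roped. Function names are assumptions based on
// llama.h around this release.
#include "llama.h"

void context_shift(llama_context * ctx, int n_past, int n_keep) {
    const int n_discard = (n_past - n_keep) / 2;

    // remove the oldest tokens after the protected prefix ...
    llama_kv_self_seq_rm (ctx, 0, n_keep, n_keep + n_discard);
    // ... and slide the rest back; the position delta is what later
    // triggers the K-shift on the cached keys
    llama_kv_self_seq_add(ctx, 0, n_keep + n_discard, n_past, -n_discard);
}
```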