Releases: ngxson/llama.cpp
b5212
llama-bench: add `-d` depth arg (#13096)

* add depth param
* update llama-bench README and add depth param
* llama-bench: default params for depth arg for faster execution
* Update examples/llama-bench/README.md
* fix buffer print ub
* use user provided args
* remove extra whitespace

Co-authored-by: Johannes Gäßler <[email protected]>
b5211
mtmd : fix glm-edge redundant token count (#13139)

* mtmd : fix glm-edge redundant token count
* fix chat template
* temporarily disable the GLMEdge chat template test
b5210
context : do not clear output buffer on reserve (#13152)

Co-authored-by: pockers21 <[email protected]>
b5209
llama : (mrope) allow using normal 1D position for text token (#13138)

* llama : (mrope) use normal position for text token
* rm n_pos_per_embd from llm_graph_input_attn_temp
b5208
clip : refactor set input for cgraph + fix qwen2.5vl input (#13136)

* clip : refactor set input for cgraph
* more strict assert
* minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere
* split qwen2 and qwen2.5 code blocks
* minor style fix
b5205
common : fix noreturn compile warning (#13151) ggml-ci
b5204
llama-chat : fix typo GML --> GLM (#13143)
b5203
musa: fix typo in cc control (#13144)

Signed-off-by: Xiaodong Ye <[email protected]>
b5202
CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137)
b5201
arg : fix unused variable (#13142)