Releases: ngxson/llama.cpp

b5212

28 Apr 15:36
1831f53

llama-bench: add `-d` depth arg (#13096)

* add depth param

* update llama-bench README and add depth param

* llama-bench: default params for depth arg for faster execution

* Update examples/llama-bench/README.md

Co-authored-by: Johannes Gäßler <[email protected]>

* fix buffer print ub

* use user provided args

* remove extra whitespaces

---------

Co-authored-by: Johannes Gäßler <[email protected]>

b5211

28 Apr 15:00
4e87962

mtmd : fix glm-edge redundant token count (#13139)

* mtmd : fix glm-edge redundant token count

* fix chat template

* temporarily disable GLMEdge test chat tmpl

b5210

28 Apr 14:35
fb0471d

context : do not clear output buffer on reserve (#13152)

Co-authored-by: pockers21 <[email protected]>

b5209

28 Apr 13:07
d2b2031

llama : (mrope) allow using normal 1D position for text token (#13138)

* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp

b5208

28 Apr 11:20
5fa9e63

clip : refactor set input for cgraph + fix qwen2.5vl input (#13136)

* clip : refactor set input for cgraph

* more strict assert

* minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere

* split qwen2 and qwen2.5 code blocks

* minor style fix

b5205

28 Apr 09:55
43f2b07

common : fix noreturn compile warning (#13151)

ggml-ci

b5204

28 Apr 09:12
e5d6c25

llama-chat : fix typo GML --> GLM (#13143)

b5203

28 Apr 08:47
f0dd6a1

musa: fix typo in cc control (#13144)

Signed-off-by: Xiaodong Ye <[email protected]>

b5202

28 Apr 08:34
69699be

CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137)

b5201

28 Apr 06:11
85f36e5

arg : fix unused variable (#13142)