Releases: ngxson/llama.cpp

b5190

25 Apr 13:33
558a764

Force FP32 compute in GLM4 FFN Down (#13101)

* Force FP32 compute in cuBLAS GEMM

* Revert "Force FP32 compute in cuBLAS GEMM"

This reverts commit 6efd872732159ab88ee7b3c1d77ba5ebc83079bd.

* Force F32 compute in GLM4 ffn down

* Edit comment to clarify issue

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
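The note does not say why reduced precision was a problem for the GLM4 FFN down projection. One common reason to force FP32 compute for a matmul is that a half-precision accumulator can overflow, because FP16's largest finite value is 65504. The minimal sketch below only illustrates that general effect. It emulates FP16 and FP32 rounding with Python's `struct` `'e'` and `'f'` formats, the helper names and input values are hypothetical, and it is not the ggml or cuBLAS code path.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (overflow -> +/-inf)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return float("inf") if x > 0 else float("-inf")

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot(a, b, round_acc):
    """Dot product of FP16-stored inputs; the running sum is rounded by round_acc after each step."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(x) * to_fp16(y)  # inputs are stored as FP16 (hypothetical setup)
        acc = round_acc(acc + prod)     # accumulator precision is the only difference
    return acc

a = [200.0] * 4  # arbitrary example values: each product is 20000
b = [100.0] * 4
print(dot(a, b, to_fp16))  # FP16 accumulator: inf (80000 exceeds 65504)
print(dot(a, b, to_fp32))  # FP32 accumulator: 80000.0
```

Both runs see identical FP16 inputs; only the accumulator type changes, which is the knob a "force FP32 compute" setting turns.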

b5189

25 Apr 13:24
edb18b6

clip : fix pixtral on some GPU backends (#13097)

* clip : fix pixtral on some GPU backends

* refactor inp_raw set

* rm outdated comment

* fix dynamic size

* add TODO

b5188

25 Apr 10:27
514c456

move the tensor reorder from init to the execute OP (#13003)

b5187

25 Apr 07:59
553a5c3

rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943)

RPC_CMD_SET_TENSOR always returns an empty response, and we send this
command 4 times per token. We can improve TG (text generation) speed if
we don't wait for this empty response.

The performance impact of this change depends on the network latency.
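A rough back-of-the-envelope model of why the gain depends on latency: if each of the 4 RPC_CMD_SET_TENSOR calls per token blocks the client for about one network round trip (RTT) while it waits for the empty reply, then not waiting removes roughly 4 × RTT per token. This is a simplified sketch: the 20 ms per-token compute time and the function names are hypothetical, and the model ignores payload transfer time and all other RPC traffic.

```python
SET_TENSOR_CALLS_PER_TOKEN = 4  # from the release note

def blocking_ms_per_token(rtt_ms: float, wait_for_response: bool,
                          calls: int = SET_TENSOR_CALLS_PER_TOKEN) -> float:
    """Time the client spends blocked on SET_TENSOR replies per token (simplified model)."""
    # Waiting for a reply costs about one round trip per call; fire-and-forget
    # sends cost no round trip (payload transfer time is ignored here).
    return calls * rtt_ms if wait_for_response else 0.0

def tokens_per_second(compute_ms: float, rtt_ms: float, wait_for_response: bool) -> float:
    """Text-generation throughput given a hypothetical per-token compute time."""
    return 1000.0 / (compute_ms + blocking_ms_per_token(rtt_ms, wait_for_response))

for rtt in (0.1, 1.0, 10.0):  # e.g. local host, LAN, WAN round-trip times
    before = tokens_per_second(20.0, rtt, wait_for_response=True)
    after = tokens_per_second(20.0, rtt, wait_for_response=False)
    print(f"RTT {rtt:5.1f} ms: {before:6.2f} -> {after:6.2f} tok/s")
```

On a low-latency link the difference is small; as RTT grows, the 4 × RTT term dominates, which matches the note that the impact depends on network latency.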

b5186

24 Apr 21:01
13be08d

clip : remove boi/eoi embeddings for GLM-edge model (#13081)

b5185

24 Apr 20:17
226251e

embeddings : fix batch sizes (#13076)

ggml-ci

b5184

24 Apr 15:27

ggml : fix trailing whitespaces (#0)

b5181

24 Apr 14:54
b10d8bf

CUDA: use switch statements in constexpr functions (#13095)

b5180

24 Apr 13:45
13b4548

cmake : do not include ./src as public for libllama (#13062)

* cmake : do not include ./src as public for libllama

ggml-ci

* cmake : rework tests

ggml-ci

* llguidance : remove unicode include

ggml-ci

* cmake : make c++17 private

ggml-ci

b5178

24 Apr 12:46
7c727fb

arg : add --no-mmproj-offload (#13093)

* arg : add --no-mmproj-offload

* Update common/arg.cpp