Releases · ngxson/llama.cpp
b5190
Force FP32 compute in GLM4 FFN Down (#13101)
* Force FP32 compute in cuBLAS GEMM
* Revert "Force FP32 compute in cuBLAS GEMM"
  This reverts commit 6efd872732159ab88ee7b3c1d77ba5ebc83079bd.
* Force F32 compute in GLM4 ffn down
* Edit comment to clarify issue
Co-authored-by: Johannes Gäßler <[email protected]>
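For context, ggml provides a per-node precision override, ggml_mul_mat_set_prec(), which is the mechanism behind this fix. A minimal sketch of the pattern, assuming hypothetical builder-function and tensor names (only the ggml calls are real API):

```cpp
// Illustrative sketch, not the exact patch: instead of forcing F32 for
// every cuBLAS GEMM (the reverted first attempt), only the GLM4 FFN down
// projection opts in to F32 accumulation.
#include "ggml.h"

static struct ggml_tensor * build_glm4_ffn_down(struct ggml_context * ctx,
                                                struct ggml_tensor * ffn_down_w,
                                                struct ggml_tensor * cur) {
    cur = ggml_mul_mat(ctx, ffn_down_w, cur);
    // accumulate this GEMM in FP32; lower-precision accumulation
    // is what caused the issue on this layer
    ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
    return cur;
}
```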
b5189
clip : fix pixtral on some GPU backends (#13097)
* clip : fix pixtral on some GPU backends
* refactor inp_raw set
* rm outdated comment
* fix dynamic size
* add TODO
b5188
change the reorder tensor from init to execute OP (#13003)
b5187
rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943)
RPC_CMD_SET_TENSOR always returns an empty response, and it is sent 4 times per token. Skipping the wait for this empty response improves token generation (TG) speed; the size of the gain depends on network latency.
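A minimal sketch of the fire-and-forget idea; the command framing, enum value, and function name below are assumptions for illustration, not the actual ggml-rpc wire format:

```cpp
// Sketch of the optimization: since the server's response to SET_TENSOR
// is always empty, the client can return right after writing the request
// instead of blocking on a recv() round trip.
#include <cstdint>
#include <cstddef>
#include <sys/types.h>
#include <sys/socket.h>

enum rpc_cmd : uint8_t { RPC_CMD_SET_TENSOR = 6 }; // illustrative value

static bool send_set_tensor(int sockfd, const void * payload, uint64_t size) {
    const uint8_t cmd = RPC_CMD_SET_TENSOR;
    if (send(sockfd, &cmd,  sizeof(cmd),  0) != (ssize_t) sizeof(cmd))  return false;
    if (send(sockfd, &size, sizeof(size), 0) != (ssize_t) sizeof(size)) return false;
    if (send(sockfd, payload, size, 0)        != (ssize_t) size)        return false;
    // no recv() here: the empty response is never awaited, saving one
    // network round trip on each of the 4 SET_TENSOR calls per token
    return true;
}
```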
b5186
clip : remove boi/eoi embeddings for GLM-edge model (#13081)
b5185
embeddings : fix batch sizes (#13076)
b5184
ggml : fix trailing whitespaces (#0)
b5181
CUDA: use switch statements in constexpr functions (#13095)
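For illustration, the pattern in question is a constexpr function written as a switch over an enum, so the result can fold at compile time and the compiler can warn about unhandled enumerators; the enum and values below are made up, not the actual CUDA code:

```cpp
// Generic illustration (hypothetical enum and values): a constexpr
// lookup expressed as a switch rather than a chain of if/else.
enum class ggml_type_ex { F32, F16, Q8_0 };

constexpr int block_size(ggml_type_ex t) {
    switch (t) {
        case ggml_type_ex::F32:  return 1;
        case ggml_type_ex::F16:  return 1;
        case ggml_type_ex::Q8_0: return 32;
    }
    return -1; // unreachable for valid enumerators
}

static_assert(block_size(ggml_type_ex::Q8_0) == 32, "folded at compile time");
```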
b5180
cmake : do not include ./src as public for libllama (#13062)
* cmake : do not include ./src as public for libllama
* cmake : rework tests
* llguidance : remove unicode include
* cmake : make c++17 private
b5178
arg : add --no-mmproj-offload (#13093)
* arg : add --no-mmproj-offload
* Update common/arg.cpp