Releases · ngxson/llama.cpp

14 Oct 11:35

1ee9d0b

b6759

CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557)

* CUDA: use fastdiv + ggml_cuda_mad for mmvf

* use bf16 directly + fix formatting

* Add exception for HIP code
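
For readers unfamiliar with the term, "fastdiv" generally refers to replacing division by a divisor that stays fixed across many operations with a precomputed multiply, add, and shift. The block below is a minimal host-side C++ sketch of that general technique, not the code from this commit. The names `FastDiv`, `make_fastdiv`, `fastdiv`, and `fastmod` are hypothetical, and the 64-bit intermediate sum is an assumption made here to keep the sketch valid for every 32-bit input.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed constants for dividing by a fixed divisor d (hypothetical names).
struct FastDiv {
    uint32_t magic;  // floor(2^32 * (2^shift - d) / d) + 1
    uint32_t shift;  // ceil(log2(d))
};

// One-time setup; d must be non-zero.
inline FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t shift = 0;
    while (shift < 32 && (uint64_t{1} << shift) < d) {
        ++shift;
    }
    const uint64_t magic =
        (uint64_t{1} << 32) * ((uint64_t{1} << shift) - d) / d + 1;
    return FastDiv{static_cast<uint32_t>(magic), shift};
}

// Computes n / d with one multiply, one add, and one shift.
inline uint32_t fastdiv(uint32_t n, FastDiv fd) {
    // High 32 bits of the 64-bit product n * magic.
    const uint64_t hi = (uint64_t{n} * fd.magic) >> 32;
    // The sum is kept in 64 bits so it cannot overflow.
    return static_cast<uint32_t>((hi + n) >> fd.shift);
}

// Remainder derived from the fast quotient.
inline uint32_t fastmod(uint32_t n, uint32_t d, FastDiv fd) {
    return n - fastdiv(n, fd) * d;
}
```

The setup cost is paid once per divisor, so the trick helps when the same divisor (for example a tensor dimension) is reused for many index computations.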


14 Oct 11:01

github-actions

b6757

5b6913c


cuda : remove legacy copy-op pointer indirection code (#16485)

* remove legacy copy-op pointer indirection code

* further removal of copy-op indirection code

* renamed check_node_graph_compatibility_and_refresh_copy_ops function


14 Oct 06:07

github-actions

b6756

bc07349


server : dynamic token limit for prompt cache (#16560)

* server : dynamic token limit for prompt cache

* cont : print estimated token limit


13 Oct 20:01

github-actions

b6754

e38b7c6


graph : support cacheless embeddings with FA and iSWA (#16528)

* graph : support cacheless embeddings with FA and iSWA

* cont : deduplicate mask creation

* cont : fix name


13 Oct 19:21

github-actions

b6753

5016b72


opencl: fix build targeting CL 2 (#16554)


13 Oct 14:58

github-actions

b6752

7049736


CUDA: fix numerical issues in tile FA kernel (#16540)


13 Oct 13:16

github-actions

b6751

01d2bdc


ggml : fix build broken with -march=armv9-a on MacOS (#16520)

* ggml : fix build broken with -march=armv9-a on MacOS

Signed-off-by: Jie Fu

* Add #pragma message

Signed-off-by: Jie Fu

* Address review comment.

Signed-off-by: Jie Fu

* Update ggml/src/ggml-cpu/ggml-cpu.c

---------

Signed-off-by: Jie Fu
Co-authored-by: Diego Devesa


13 Oct 09:32

github-actions

b6750

56fc38b


CANN: fix CPU memory leak in CANN backend (#16549)

This commit fixes a CPU-side memory leak in the CANN backend that
occurred when intermediate aclTensorList objects were not released
after operator execution. The leak showed up during repeated
invocations of CANN ops (e.g., FlashAttention) and caused host memory
usage to grow over time.

Resource cleanup (aclDestroyTensorList and related release logic)
has been added so that all temporary tensors are freed.
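
The leak pattern described above (a temporary handle created per operator call and never destroyed) can be illustrated with a small self-contained C++ sketch. Everything here is a hypothetical stand-in: `MockTensorList`, `mock_create_tensor_list`, and `mock_destroy_tensor_list` are mocks and not the CANN API, and wrapping the handle in `std::unique_ptr` with a custom deleter is one possible way to guarantee release, not necessarily the approach taken in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Mock of an opaque handle such as aclTensorList (hypothetical stand-in).
struct MockTensorList {
    std::vector<int> tensors;
};

// Counts live handles so a leak becomes observable.
inline int & live_lists() {
    static int count = 0;
    return count;
}

// Mock create/destroy pair (hypothetical, not the real CANN calls).
inline MockTensorList * mock_create_tensor_list(std::size_t n) {
    ++live_lists();
    return new MockTensorList{std::vector<int>(n)};
}

inline void mock_destroy_tensor_list(MockTensorList * list) {
    --live_lists();
    delete list;
}

// Deleter that routes unique_ptr cleanup through the destroy function.
struct TensorListDeleter {
    void operator()(MockTensorList * list) const {
        mock_destroy_tensor_list(list);
    }
};
using TensorListPtr = std::unique_ptr<MockTensorList, TensorListDeleter>;

// An "operator" that needs a temporary list for the duration of one call.
inline std::size_t run_op(std::size_t n) {
    TensorListPtr tmp(mock_create_tensor_list(n));
    return tmp->tensors.size();
    // tmp goes out of scope here and the list is destroyed on every path.
}
```

With this shape, calling `run_op` in a loop leaves `live_lists()` at zero, whereas a version that skipped the destroy call would grow the counter by one per invocation.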


13 Oct 09:03

github-actions

b6748

3f750f8


metal: add support for opt_step_sgd (#16539)

* metal: add support for opt_step_sgd

* add newline to pass EditorConfig check


13 Oct 09:00

github-actions

b6747

c515fc5


ggml : fix scalar path for computing norm (#16558)

