Releases: ngxson/llama.cpp

b6759

14 Oct 11:35
1ee9d0b

CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557)

* CUDA: use fastdiv + ggml_cuda_mad for mmvf

* use bf16 directly + fix formatting

* Add exception for HIP code

b6757

14 Oct 11:01
5b6913c

cuda : remove legacy copy-op pointer indirection code (#16485)

* remove legacy copy-op pointer indirection code

* further removal of copy-op indirection code

* renamed check_node_graph_compatibility_and_refresh_copy_ops function

b6756

14 Oct 06:07
bc07349

server : dynamic token limit for prompt cache (#16560)

* server : dynamic token limit for prompt cache

* cont : print estimated token limit

b6754

13 Oct 20:01
e38b7c6

graph : support cacheless embeddings with FA and iSWA (#16528)

* graph : support cacheless embeddings with FA and iSWA

* cont : deduplicate mask creation

* cont : fix name

b6753

13 Oct 19:21
5016b72

opencl: fix build targeting CL 2 (#16554)

b6752

13 Oct 14:58
7049736

CUDA: fix numerical issues in tile FA kernel (#16540)

b6751

13 Oct 13:16
01d2bdc

ggml : fix build broken with -march=armv9-a on MacOS (#16520)

* ggml : fix build broken with -march=armv9-a on MacOS

Signed-off-by: Jie Fu <[email protected]>

* Add #pragma message

Signed-off-by: Jie Fu <[email protected]>

* Address review comment.

Signed-off-by: Jie Fu <[email protected]>

* Update ggml/src/ggml-cpu/ggml-cpu.c

---------

Signed-off-by: Jie Fu <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>

b6750

13 Oct 09:32
56fc38b

CANN: fix CPU memory leak in CANN backend (#16549)

This commit fixes a CPU-side memory leak issue in the CANN backend,
which occurred when intermediate aclTensorList objects were not properly
released after operator execution. The leak happened during repeated
invocations of CANN ops (e.g., FlashAttention), leading to increasing
host memory usage over time.

Proper resource cleanup (aclDestroyTensorList and related release logic)
has been added to ensure that all temporary tensors are correctly freed.
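
The kind of leak described above (a temporary handle created per operator call and never destroyed) can be illustrated with a small, self-contained sketch. It does not use the real CANN API: `fake_tensor_list`, `fake_create_tensor_list`, and `fake_destroy_tensor_list` are hypothetical stand-ins for a C-style create/destroy pair, and wrapping the handle in `std::unique_ptr` is just one possible cleanup strategy, not necessarily what the commit does.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins for a backend's C-style handle API (NOT the real CANN API).
struct fake_tensor_list { std::size_t n_tensors; };

static int g_live_lists = 0; // handles created but not yet destroyed

fake_tensor_list * fake_create_tensor_list(std::size_t n) {
    ++g_live_lists;
    return new fake_tensor_list{n};
}

void fake_destroy_tensor_list(fake_tensor_list * list) {
    assert(list != nullptr);
    --g_live_lists;
    delete list;
}

// Custom deleter so std::unique_ptr releases the handle on every exit path.
struct tensor_list_deleter {
    void operator()(fake_tensor_list * list) const { fake_destroy_tensor_list(list); }
};
using tensor_list_ptr = std::unique_ptr<fake_tensor_list, tensor_list_deleter>;

// Simulates one operator invocation that needs a temporary tensor list.
std::size_t run_op_once(std::size_t n_inputs) {
    tensor_list_ptr tmp(fake_create_tensor_list(n_inputs));
    return tmp->n_tensors; // tmp is destroyed when it goes out of scope
}

// Repeated invocations should leave no live handles behind.
int live_lists_after(int n_calls) {
    for (int i = 0; i < n_calls; ++i) {
        run_op_once(3);
    }
    return g_live_lists;
}
```

Without the owning wrapper (or an explicit destroy call), each invocation of `run_op_once` would leave one handle alive, so the live count would grow with the number of calls, which matches the steadily increasing host memory usage the commit message reports.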

b6748

13 Oct 09:03
3f750f8

metal: add support for opt_step_sgd (#16539)

* metal: add support for opt_step_sgd

* add newline to pass EditorConfig check

b6747

13 Oct 09:00
c515fc5

ggml : fix scalar path for computing norm (#16558)