Releases: ngxson/llama.cpp
b6759
CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557) * CUDA: use fastdiv + ggml_cuda_mad for mmvf * use bf16 directly + fix formatting * Add exception for HIP code
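The "fastdiv" named in this entry refers to replacing integer division by a fixed divisor with a multiply and a shift. The following is a host-side sketch of that general technique (the Granlund-Montgomery construction), not the code from the commit. The names FastDiv, make_fastdiv, fastdiv and fastmod are illustrative assumptions, and 64-bit intermediates are used here so that the result holds for every 32-bit input. On a CUDA device the high half of the product would typically come from the __umulhi intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed constants for dividing by a fixed 32-bit divisor d.
// (Illustrative type, not taken from the llama.cpp sources.)
struct FastDiv {
    uint32_t magic;  // floor(2^32 * (2^shift - d) / d) + 1
    uint32_t shift;  // ceil(log2(d))
};

// Setup step: run once per divisor, outside the hot loop.
inline FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t shift = 0;
    while (shift < 32 && (uint64_t{1} << shift) < d) {
        ++shift;  // smallest shift with 2^shift >= d
    }
    const uint64_t magic =
        ((uint64_t{1} << 32) * ((uint64_t{1} << shift) - d)) / d + 1;
    return {static_cast<uint32_t>(magic), shift};
}

// n / d using one multiply, one add and one shift.
inline uint32_t fastdiv(uint32_t n, FastDiv fd) {
    const uint64_t hi = (uint64_t{n} * fd.magic) >> 32;  // high 32 bits of n * magic
    return static_cast<uint32_t>((hi + n) >> fd.shift);
}

// n % d derived from the quotient.
inline uint32_t fastmod(uint32_t n, uint32_t d, FastDiv fd) {
    return n - fastdiv(n, fd) * d;
}
```

The setup cost is paid once per divisor, so the approach is only worthwhile when the same divisor is reused many times, for example a tensor dimension used for index arithmetic inside a kernel.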
b6757
cuda : remove legacy copy-op pointer indirection code (#16485) * remove legacy copy-op pointer indirection code * further removal of copy-op indirection code * renamed check_node_graph_compatibility_and_refresh_copy_ops function
b6756
server : dynamic token limit for prompt cache (#16560) * server : dynamic token limit for prompt cache * cont : print estimated token limit
b6754
graph : support cacheless embeddings with FA and iSWA (#16528) * graph : support cacheless embeddings with FA and iSWA * cont : deduplicate mask creation * cont : fix name
b6753
opencl: fix build targeting CL 2 (#16554)
b6752
CUDA: fix numerical issues in tile FA kernel (#16540)
b6751
ggml : fix build broken with -march=armv9-a on MacOS (#16520) * ggml : fix build broken with -march=armv9-a on MacOS * Add #pragma message * Address review comment. * Update ggml/src/ggml-cpu/ggml-cpu.c --------- Signed-off-by: Jie Fu Co-authored-by: Diego Devesa
b6750
CANN: fix CPU memory leak in CANN backend (#16549) This commit fixes a CPU-side memory leak in the CANN backend, caused by intermediate aclTensorList objects not being released after operator execution. The leak accumulated over repeated invocations of CANN ops (e.g., FlashAttention), so host memory usage grew over time. The fix adds the missing cleanup (aclDestroyTensorList and related release logic) so that all temporary tensors are freed.
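The leak pattern described in this entry (a handle created per operator call and never destroyed) can be illustrated with a small self-contained sketch. It uses hypothetical mock functions, not the real CANN/ACL API, and it shows an alternative technique (RAII via std::unique_ptr with a custom deleter) rather than the explicit cleanup calls that the commit message describes. All names below are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for a C-style handle API.
// These are NOT the real CANN/ACL declarations.
struct MockTensorList { int n; };

static int g_live_lists = 0;  // number of lists created but not yet destroyed

MockTensorList * mock_create_tensor_list(int n) {
    ++g_live_lists;
    return new MockTensorList{n};
}

void mock_destroy_tensor_list(MockTensorList * list) {
    assert(g_live_lists > 0);
    delete list;
    --g_live_lists;
}

// Deleter that forwards to the destroy function of the mock API.
struct TensorListDeleter {
    void operator()(MockTensorList * list) const { mock_destroy_tensor_list(list); }
};
using TensorListPtr = std::unique_ptr<MockTensorList, TensorListDeleter>;

// One "operator execution": the temporary list is released on every exit path,
// including early returns and exceptions, because the owner goes out of scope.
int run_mock_op(int n_tensors) {
    TensorListPtr list(mock_create_tensor_list(n_tensors));
    return list->n;  // placeholder for the real work done with the list
}
```

Calling run_mock_op in a loop leaves g_live_lists at zero afterwards, whereas the same loop with a raw pointer and no destroy call would increase the counter on every iteration, which is the growth in host memory the entry describes.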
b6748
metal: add support for opt_step_sgd (#16539) * metal: add support for opt_step_sgd * add newline to pass EditorConfig check
b6747
ggml : fix scalar path for computing norm (#16558)