Releases: l3utterfly/llama.cpp
b6368
b6123
cuda: refactored ssm_scan and use CUB (#13291)
* cuda: refactored ssm_scan to use CUB
* fixed compilation error when not using CUB
* assign L to constant and use size_t instead of int
* deduplicated functions
* change min blocks per mp to 1
* use CUB load and store warp transpose
* suppress clang warning
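For context, the recurrence that `ssm_scan` evaluates can be sketched as a sequential reference in plain C++. This is a minimal scalar version (single state dimension, hypothetical function name, not the llama.cpp kernel signature); the CUDA kernel parallelizes the same math across state dimensions and uses CUB block load/store with warp transpose to coalesce memory traffic:

```cpp
#include <cmath>
#include <vector>

// Sequential reference for a selective state-space scan over sequence
// length L (the constant named in the commit):
//   h[t] = exp(dt[t] * A) * h[t-1] + dt[t] * B[t] * x[t]
//   y[t] = C[t] * h[t]
// Scalar sketch only; the real kernel operates on full state tensors.
std::vector<float> ssm_scan_ref(const std::vector<float> &x,
                                const std::vector<float> &dt,
                                const std::vector<float> &B,
                                const std::vector<float> &C,
                                float A) {
    const size_t L = x.size();   // sequence length
    std::vector<float> y(L);
    float h = 0.0f;              // hidden state, zero-initialized
    for (size_t t = 0; t < L; ++t) {
        const float dA = std::exp(dt[t] * A);
        h = dA * h + dt[t] * B[t] * x[t];
        y[t] = C[t] * h;
    }
    return y;
}
```

With `A = 0` the decay factor is 1 and the scan degenerates to a cumulative sum, which makes the recurrence easy to check by hand.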
b6029
embeddings: fix extraction of CLS pooling results (#14927)
* embeddings: fix extraction of CLS pooling results
* merge RANK pooling into CLS case for inputs
b5891
llama : add jinja template for rwkv-world (#14665)
* llama : add jinja template for rwkv-world
* Update convert_hf_to_gguf.py
Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
b5871
readme : add hot PRs (#14636)
* readme : add hot PRs
* readme : update title
* readme : hot PRs links
b5835
vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485) Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260 Co-authored-by: Rémy Oudompheng <[email protected]>
b5581
opencl: add `backend_synchronize` (#13939)
This is not needed for normal use, where the result is read via `tensor_get`, but it allows the perf mode of `test-backend-ops` to measure performance properly.
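The reason perf measurement needs a synchronize: an OpenCL enqueue returns before the kernel finishes, so stopping the clock right after enqueue measures submission cost rather than execution time. A sketch of the timing pattern, with `std::async` standing in for the command queue (hypothetical, not the ggml API):

```cpp
#include <chrono>
#include <future>

// Time an asynchronously launched job. Without the wait() (the
// backend_synchronize step), t1 would be taken while the "kernel"
// is still running, and the measurement would be meaningless.
double timed_run_ms(int work_items) {
    auto t0 = std::chrono::steady_clock::now();
    auto job = std::async(std::launch::async, [work_items] {
        long acc = 0;
        for (int i = 0; i < work_items; ++i) acc += i; // stand-in kernel
        return acc;
    });
    job.wait(); // synchronize: block until the work has completed
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```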
b5416
CANN: Support MOE Model MUL_MAT_ID (#13042) Signed-off-by: noemotiovon <[email protected]>
b5158
Disable CI cross-compile builds (#13022)
b5061
musa: fix compilation warnings in mp_22/31 (#12780) Signed-off-by: Xiaodong Ye <[email protected]>