Closed
Description
First of all: CONGRATS ON YOUR AMAZING RESEARCH WORK.
Considering that this is using GGML and seems based directly on llama.cpp:
Why is this a separate project to llama.cpp, given that llama.cpp already supports BitNet ternary quants? (ggml-org/llama.cpp#8151)
Are these simply more optimised kernels?
If so, how do they compare to llama.cpp's implementation?
Can/should they be contributed back to llama.cpp?