Labels: Nvidia GPU, enhancement (New feature or request), help wanted (Extra attention is needed), roadmap
Description
Note: This issue was copied from ggml-org#6758
Original Author: @ggerganov
Original Issue Number: ggml-org#6758
Created: 2024-04-19T06:47:35Z
Recently, initial Mamba support (CPU-only) was introduced in ggml-org#5328 by @compilade.
To run these models efficiently on the GPU, we are currently missing kernel implementations for the following two ops:
- GGML_OP_SSM_CONV
- GGML_OP_SSM_SCAN
Creating this issue to keep track of this and to give more visibility to this feature. Help with implementing the missing kernels for CUDA and Metal (and potentially other backends) is welcome. We can also discuss whether anything else is required to better support this architecture in llama.cpp.
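For orientation, here is a minimal sketch of what naive CUDA reference kernels for the two ops could look like. This is not ggml's API or memory layout: the tensor shapes (`x` as `[L, D]`, `A` as `[D, N]`, etc.), the contiguous row-major indexing, the one-thread-per-channel parallelization, and all names are simplifying assumptions for illustration only. A real implementation would have to follow ggml's strides and batching, and would also need to read/write the recurrent conv and scan state carried between calls, which the sketch omits.

```cuda
// Hypothetical layouts (NOT ggml's actual tensor layout):
//   x  [L, D]            input sequence (L tokens, D channels)
//   dt [L, D]            per-step, per-channel discretization step (post-softplus)
//   A  [D, N]            state-transition parameters
//   B  [L, N], C [L, N]  input-dependent projections
//   y  [L, D]            output
// One thread per channel d; each thread walks the sequence sequentially and
// keeps its N-element hidden state in registers/local memory.
template <int N>
__global__ void ssm_scan_ref(const float *x, const float *dt,
                             const float *A, const float *B, const float *C,
                             float *y, int L, int D) {
    const int d = blockIdx.x * blockDim.x + threadIdx.x;
    if (d >= D) return;

    float h[N] = {0.0f}; // hidden state for this channel

    for (int t = 0; t < L; ++t) {
        const float xt  = x [t*D + d];
        const float dtt = dt[t*D + d];
        float yt = 0.0f;
        for (int n = 0; n < N; ++n) {
            // discretize: h = exp(dt*A)*h + dt*B*x (zero-order hold on A)
            const float dA = expf(dtt * A[d*N + n]);
            const float dB = dtt * B[t*N + n];
            h[n] = dA * h[n] + dB * xt;
            yt  += C[t*N + n] * h[n];
        }
        y[t*D + d] = yt;
    }
}

// Depthwise causal 1-D convolution (the conv step preceding the scan).
// x_padded is assumed to be left-padded with K-1 entries (zeros, or the
// previous conv state when continuing a sequence).
__global__ void ssm_conv_ref(const float *x_padded, const float *w,
                             float *y, int L, int D, int K) {
    const int d = blockIdx.x * blockDim.x + threadIdx.x;
    const int t = blockIdx.y;
    if (d >= D || t >= L) return;

    float acc = 0.0f;
    for (int k = 0; k < K; ++k) {
        acc += w[d*K + k] * x_padded[(t + k)*D + d];
    }
    y[t*D + d] = acc;
}
```

The scan is inherently sequential in `t` for a given channel, so the simple version above only parallelizes over channels; getting good GPU utilization likely requires additionally parallelizing over the state dimension and/or batched sequences.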