ggml : add GPU support for Mamba models #269

@jakexcosme

Description

Note: This issue was copied from ggml-org#6758

Original Author: @ggerganov
Original Issue Number: ggml-org#6758
Created: 2024-04-19T06:47:35Z


Recently, initial Mamba support (CPU-only) was introduced in ggml-org#5328 by @compilade.

In order to run these models efficiently on the GPU, we appear to be missing kernel implementations for the following two ops:

  • GGML_OP_SSM_CONV
  • GGML_OP_SSM_SCAN

Creating this issue to keep track of this work and give more visibility to this feature. Help with implementing the missing kernels for CUDA and Metal (and potentially other backends) is welcome. We can also discuss whether anything else is required to better support this architecture in llama.cpp.
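For reference, a minimal single-channel sketch of the math these two ops implement (the scan is the sequential state-space recurrence, the conv is a short causal depthwise convolution). This is an illustrative assumption, not ggml's actual kernels: the real ops operate on batched multi-dimensional tensors, and the names `ssm_conv_ref`/`ssm_scan_ref` are hypothetical:

```python
def ssm_conv_ref(x, w):
    """Causal 1D convolution with a short kernel w (zero-padded on the left):
    y[t] = sum_k w[k] * x[t - (K-1) + k]."""
    K = len(w)
    pad = [0.0] * (K - 1) + list(x)
    return [sum(w[k] * pad[t + k] for k in range(K)) for t in range(len(x))]

def ssm_scan_ref(x, a, b, c):
    """Diagonal state-space scan: h[t] = a*h[t-1] + b*x[t], y[t] = c*h[t],
    with h[-1] = 0. The sequential dependence on h is what makes this
    op non-trivial to parallelize on the GPU."""
    h = 0.0
    y = []
    for xt in x:
        h = a * h + b * xt  # recurrence over the time dimension
        y.append(c * h)
    return y

print(ssm_conv_ref([1.0, 2.0, 3.0], [1.0, 1.0]))       # -> [1.0, 3.0, 5.0]
print(ssm_scan_ref([1.0, 0.0, 0.0, 0.0], 0.5, 1.0, 2.0))  # -> [2.0, 1.0, 0.5, 0.25]
```

A GPU kernel would typically parallelize the conv and the scan across channels and batch entries, while the scan's time dimension either stays sequential per channel or uses an associative-scan formulation.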
