In your paper, you mention that the Mamba scan is faster than FlashAttention-2. Does that mean you are comparing the selective scan kernel (`class SelectiveScanFn(torch.autograd.Function):`) directly against the FlashAttention-2 kernel?
The inputs of these two modules are different, so is this comparison fair? Or should the preprocessing (computing q, k, v for FlashAttention; computing A, B, C, D, delta for the Mamba scan) be taken into account as well?
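For context on what the kernel being benchmarked actually computes, here is a minimal NumPy sketch of the selective scan recurrence that `SelectiveScanFn` implements (this is my own reference formulation following the Mamba paper's discretized SSM; the tensor layouts and argument names here are assumptions for illustration, not the kernel's actual signature):

```python
import numpy as np

def selective_scan_ref(x, delta, A, B, C, D):
    """Sequential reference of the selective scan recurrence.

    x:     (L, d)  input sequence
    delta: (L, d)  per-step, per-channel discretization step
    A:     (d, n)  state matrix (diagonal per channel)
    B:     (L, n)  input-dependent input matrix
    C:     (L, n)  input-dependent output matrix
    D:     (d,)    skip connection
    """
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))          # hidden state, carried across time steps
    ys = np.empty((L, d))
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)                 # discretized A: (d, n)
        dBx = (delta[t] * x[t])[:, None] * B[t][None, :]   # input term: (d, n)
        h = dA * h + dBx                                   # state update
        ys[t] = (h * C[t][None, :]).sum(-1) + D * x[t]     # readout + skip
    return ys
```

Note that A, B, C, D, and delta arrive as inputs here, just as q, k, v arrive precomputed at the FlashAttention kernel: in both cases the input projections happen outside the kernel being timed, which is the crux of the fairness question above.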