[V1][Hybrid] GatedDeltaNet Automatic Prefix Caching #26807
Open: simondanielsson wants to merge 27 commits into vllm-project:main from simondanielsson:feature/gdn-apc
+704 −32
Changes from 19 commits
Commits (27)
0860de4  First working version (simondanielsson)
7b41ac4  Merge remote-tracking branch 'upstream/main' into feature/gdn-apc (simondanielsson)
538c9a0  Update type hints in gdn_attn (simondanielsson)
3fffae0  [DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttenti… (FENP)
76ac0fa  Enable cudagraphs support [skip ci] (simondanielsson)
1d3afe0  Merge remote-tracking branch 'upstream/main' into feature/gdn-apc (simondanielsson)
795ed51  Fix long() -> long [skip ci] (simondanielsson)
044990c  Add defensive programming asserts (simondanielsson)
68ca70f  Allocate metadata buffer by chunk count rather than block count, and … (simondanielsson)
fe8f0b7  Return hidden state when return_intermediate_states is passed, ignori… (simondanielsson)
ac226e8  Inline _reshape_intermediate_states in the fla chunk kernel wrapper (simondanielsson)
f975260  Add more explanatory comments in FLA's chunk.py (simondanielsson)
e74f67d  Improve logging (simondanielsson)
f177a1f  Add GDN model to APC tests (simondanielsson)
552ba6f  Add helpful comments in hard-to-understand areas (simondanielsson)
30b1ea0  Merge remote-tracking branch 'upstream/main' into feature/gdn-apc (simondanielsson)
2ab062d  Improve way to set chunk_size=64 for GDN (simondanielsson)
4837a11  Revert KV cache memory limit in test (simondanielsson)
3a88844  Merge remote-tracking branch 'upstream/main' into feature/gdn-apc (simondanielsson)
b58362a  Add dynamic counting of decode chunks, rather than static value (simondanielsson)
ccda04e  Add plot (simondanielsson)
03aa33c  Remove plot (simondanielsson)
9896ba4  Merge remote-tracking branch 'upstream/main' into feature/gdn-apc (simondanielsson)
46406f1  Remove extra trailing comma (simondanielsson)
dbb4fe3  Move hardcoded chunk size to GDN attn metadata builder (simondanielsson)
efd451b  Remove extra newline (simondanielsson)
bfa6ffc  Merge remote-tracking branch 'upstream/main' into feature/gdn-apc (simondanielsson)