Closed
Description
Currently, when the context becomes full, we pick part of the tokens and recompute the KV cache.
Instead, try to either:
- store the non-RoPEd KV cache, "shift" it when the context is full, and compute RoPE over the entire cache for every new token, taking the current positions into account
- store the RoPEd KV cache (as we do now), "shift" it when the context is full, and apply an extra shift-RoPE to it (assuming RoPE is "additive", i.e. rotating by position a and then by b is the same as rotating by a + b)
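
To make the "additive" assumption concrete, here is a minimal pure-Python sketch, not the actual implementation. It assumes the interleaved-pair RoPE convention with base 10000, and the names `rope`, `shift_cache`, `raw_keys` and `n_discard` are hypothetical. It drops the oldest entries from a RoPEd key cache, re-rotates the rest by the negative shift (second option), and compares the result with applying RoPE to the un-rotated keys at their new positions (first option).

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle pos * base**(-i/d)."""
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out

def shift_cache(cached_keys, n_discard):
    """Drop the n_discard oldest RoPEd keys, then rotate the rest by -n_discard."""
    return [rope(k, -n_discard) for k in cached_keys[n_discard:]]

# Hypothetical toy data: 6 un-rotated key vectors of dimension 4.
raw_keys = [[0.1 * (p + 1), -0.2, 0.3 * p, 0.5] for p in range(6)]

# RoPEd cache as stored today: key p is rotated for position p.
cached = [rope(k, p) for p, k in enumerate(raw_keys)]

n_discard = 2

# Second option: extra shift-RoPE applied to the already rotated cache.
shifted = shift_cache(cached, n_discard)

# First option: RoPE applied to the non-RoPEd keys at their new positions.
recomputed = [rope(k, p) for p, k in enumerate(raw_keys[n_discard:])]

max_err = max(
    abs(a - b)
    for s_vec, r_vec in zip(shifted, recomputed)
    for a, b in zip(s_vec, r_vec)
)
print(max_err < 1e-9)
```

In exact arithmetic the two results coincide because each pair undergoes a plain 2D rotation and rotation angles add. With lower-precision cache types, repeated shifts could accumulate rounding error, which this sketch does not model.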