Commit d3372f4
kv-cache : add SWA support (ggml-org#13194)
* kv-cache : prepare for SWA
ggml-ci
* kv-cache : initial iSWA implementation
ggml-ci
* kv-cache : rework error recovery logic
ggml-ci
* models : fix Phi-3 SWA parameters
ggml-ci
* model : adjust Granite to rope factor changes
ggml-ci
* server : check if context can do shifts
ggml-ci
* iswa : for now, always enable shifts (experiment)
ggml-ci
* kv-cache : simplify SWA logic
ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch
ggml-ci
* llama : update docs about llama_decode
ggml-ci
* kv-cache : update warning logs when no space for the batch is available
ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
ggml-ci
* llama : add param to control SWA cache size
ggml-ci
* minor : clean-up
ggml-ci
1 parent cfc18f6 commit d3372f4
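For readers unfamiliar with the term, the sliding-window attention (SWA) idea behind these changes can be sketched as below. This is a minimal illustration and not the code from this commit: the helper names `is_masked_swa` and `swa_pos_min`, the parameter `n_swa`, and the exact window boundary (a key is masked once it is `n_swa` or more positions behind the query) are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical helper (assumption, not taken from this commit):
// returns true when the key at position p0 lies outside the sliding
// window of the query at position p1, i.e. it must not be attended to.
bool is_masked_swa(int32_t n_swa, int32_t p0, int32_t p1) {
    return p1 - p0 >= n_swa;
}

// Hypothetical helper (assumption): the smallest key position that a
// query at position p1 can still attend to under a window of n_swa.
// A cache whose oldest stored position is greater than this value holds
// only part of the window for that query.
int32_t swa_pos_min(int32_t n_swa, int32_t p1) {
    const int32_t p = p1 - n_swa + 1;
    return p < 0 ? 0 : p;
}
```

With `n_swa = 4`, a query at position 10 sees keys at positions 7 through 10 under these assumptions. Positions 6 and earlier are masked, which is why a SWA-aware cache may discard them.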
File tree
15 files changed, +1414 -638 lines changed
- common
- include
- src
- tools
  - llama-bench
  - server