Closed
Labels: bug, performance
Description
Beyond the issues with the benchmark itself, I also observed a notable drop in ggml CPU performance after syncing. Both the encode time and most of the GFLOPS figures below are worse on master than on 2f52783.
Encode time on i7-12700H, ggml-model-whisper-base.bin, OpenBLAS=1:
| 2f52783 (ms/run) | Master (ms/run) |
|---|---|
| 726.22 | 825.36 |
| 745.15 | 788.72 |
| 763.46 | 789.90 |
| 771.82 | 787.58 |
| 757.62 | 845.77 |
| 797.99 | 830.51 |
| 702.00 | 825.29 |
| 722.43 | 808.82 |
| 760.68 | 803.65 |
| 793.12 | 824.25 |
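
To put a number on the regression, the table above can be summarized with a few lines of Python. This is only a rough sketch: the values are copied by hand from the table, the variable names are my own, and ten runs per commit is a small sample, so treat the percentage as approximate.

```python
from statistics import mean

# Encode times (ms/run) copied from the table above.
old_ms = [726.22, 745.15, 763.46, 771.82, 757.62,
          797.99, 702.00, 722.43, 760.68, 793.12]  # 2f52783
new_ms = [825.36, 788.72, 789.90, 787.58, 845.77,
          830.51, 825.29, 808.82, 803.65, 824.25]  # master

old_mean = mean(old_ms)
new_mean = mean(new_ms)

# Relative slowdown of master versus 2f52783, in percent.
slowdown_pct = (new_mean / old_mean - 1.0) * 100.0

print(f"2f52783 mean: {old_mean:.1f} ms/run")
print(f"master mean:  {new_mean:.1f} ms/run")
print(f"slowdown:     {slowdown_pct:.1f}%")  # about 7.8%
```

On these numbers master averages roughly 813 ms/run against roughly 754 ms/run for 2f52783.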
Master i7-12700H OpenBLAS=1 -t 4
64 x 64: Q4_0 2.7 GFLOPS (128 runs) | Q4_1 3.6 GFLOPS (128 runs)
64 x 64: Q5_0 3.6 GFLOPS (128 runs) | Q5_1 3.5 GFLOPS (128 runs) | Q8_0 3.5 GFLOPS (128 runs)
64 x 64: F16 3.5 GFLOPS (128 runs) | F32 3.5 GFLOPS (128 runs)
128 x 128: Q4_0 7.6 GFLOPS (128 runs) | Q4_1 13.5 GFLOPS (128 runs)
128 x 128: Q5_0 13.0 GFLOPS (128 runs) | Q5_1 12.3 GFLOPS (128 runs) | Q8_0 12.9 GFLOPS (128 runs)
128 x 128: F16 19.6 GFLOPS (128 runs) | F32 18.4 GFLOPS (128 runs)
256 x 256: Q4_0 48.3 GFLOPS (128 runs) | Q4_1 40.6 GFLOPS (128 runs)
256 x 256: Q5_0 49.4 GFLOPS (128 runs) | Q5_1 39.5 GFLOPS (128 runs) | Q8_0 16.0 GFLOPS (128 runs)
256 x 256: F16 57.4 GFLOPS (128 runs) | F32 44.6 GFLOPS (128 runs)
512 x 512: Q4_0 97.1 GFLOPS (128 runs) | Q4_1 114.9 GFLOPS (128 runs)
512 x 512: Q5_0 104.6 GFLOPS (128 runs) | Q5_1 113.3 GFLOPS (128 runs) | Q8_0 72.7 GFLOPS (128 runs)
512 x 512: F16 129.7 GFLOPS (128 runs) | F32 105.8 GFLOPS (128 runs)
1024 x 1024: Q4_0 152.5 GFLOPS ( 72 runs) | Q4_1 161.2 GFLOPS ( 76 runs)
1024 x 1024: Q5_0 150.1 GFLOPS ( 70 runs) | Q5_1 157.9 GFLOPS ( 74 runs) | Q8_0 144.5 GFLOPS ( 68 runs)
1024 x 1024: F16 168.0 GFLOPS ( 79 runs) | F32 190.4 GFLOPS ( 89 runs)
2048 x 2048: Q4_0 211.2 GFLOPS ( 13 runs) | Q4_1 232.3 GFLOPS ( 14 runs)
2048 x 2048: Q5_0 210.7 GFLOPS ( 13 runs) | Q5_1 230.4 GFLOPS ( 14 runs) | Q8_0 224.5 GFLOPS ( 14 runs)
2048 x 2048: F16 231.2 GFLOPS ( 14 runs) | F32 238.1 GFLOPS ( 15 runs)
4096 x 4096: Q4_0 328.0 GFLOPS ( 3 runs) | Q4_1 305.7 GFLOPS ( 3 runs)
4096 x 4096: Q5_0 295.3 GFLOPS ( 3 runs) | Q5_1 305.8 GFLOPS ( 3 runs) | Q8_0 292.8 GFLOPS ( 3 runs)
4096 x 4096: F16 308.7 GFLOPS ( 3 runs) | F32 299.2 GFLOPS ( 3 runs)
2f52783 i7-12700H OpenBLAS=1 -t 4
64 x 64: Q5_0 3.9 GFLOPS (128 runs) | Q5_1 3.7 GFLOPS (128 runs) | Q8_0 3.7 GFLOPS (128 runs)
64 x 64: F16 3.5 GFLOPS (128 runs) | F32 2.8 GFLOPS (128 runs)
128 x 128: Q4_0 19.8 GFLOPS (128 runs) | Q4_1 20.3 GFLOPS (128 runs)
128 x 128: Q5_0 19.8 GFLOPS (128 runs) | Q5_1 19.2 GFLOPS (128 runs) | Q8_0 19.4 GFLOPS (128 runs)
128 x 128: F16 22.0 GFLOPS (128 runs) | F32 21.5 GFLOPS (128 runs)
256 x 256: Q4_0 106.8 GFLOPS (128 runs) | Q4_1 103.8 GFLOPS (128 runs)
256 x 256: Q5_0 102.0 GFLOPS (128 runs) | Q5_1 100.6 GFLOPS (128 runs) | Q8_0 107.6 GFLOPS (128 runs)
256 x 256: F16 115.7 GFLOPS (128 runs) | F32 85.3 GFLOPS (128 runs)
512 x 512: Q4_0 137.3 GFLOPS (128 runs) | Q4_1 143.4 GFLOPS (128 runs)
512 x 512: Q5_0 133.7 GFLOPS (128 runs) | Q5_1 132.4 GFLOPS (128 runs) | Q8_0 109.1 GFLOPS (128 runs)
512 x 512: F16 138.5 GFLOPS (128 runs) | F32 101.9 GFLOPS (128 runs)
1024 x 1024: Q4_0 201.7 GFLOPS ( 94 runs) | Q4_1 194.3 GFLOPS ( 91 runs)
1024 x 1024: Q5_0 172.8 GFLOPS ( 81 runs) | Q5_1 176.0 GFLOPS ( 83 runs) | Q8_0 167.9 GFLOPS ( 79 runs)
1024 x 1024: F16 189.0 GFLOPS ( 89 runs) | F32 142.1 GFLOPS ( 67 runs)
2048 x 2048: Q4_0 316.3 GFLOPS ( 19 runs) | Q4_1 320.2 GFLOPS ( 19 runs)
2048 x 2048: Q5_0 303.9 GFLOPS ( 18 runs) | Q5_1 299.2 GFLOPS ( 18 runs) | Q8_0 303.3 GFLOPS ( 18 runs)
2048 x 2048: F16 297.9 GFLOPS ( 18 runs) | F32 240.5 GFLOPS ( 14 runs)
4096 x 4096: Q4_0 368.8 GFLOPS ( 3 runs) | Q4_1 364.6 GFLOPS ( 3 runs)
4096 x 4096: Q5_0 391.0 GFLOPS ( 3 runs) | Q5_1 341.6 GFLOPS ( 3 runs) | Q8_0 372.5 GFLOPS ( 3 runs)
4096 x 4096: F16 344.3 GFLOPS ( 3 runs) | F32 345.3 GFLOPS ( 3 runs)
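
For anyone who wants to compare the two runs entry by entry, here is a small sketch that parses output in the format above and prints the master/2f52783 ratio for each size and type. The helper names (`parse_bench`, `compare`) are hypothetical and not part of the repo, and the regexes assume the exact line layout shown in this issue. Only the 256 x 256 rows are pasted in as sample input; entries missing from either run are skipped.

```python
import re

# Matches one result such as "Q4_0 2.7 GFLOPS"; the run count is ignored.
RESULT = re.compile(r"(\w+)\s+([\d.]+)\s+GFLOPS")
# Matches the "N x N:" prefix at the start of each benchmark line.
SIZE = re.compile(r"\s*(\d+) x \d+:(.*)")

def parse_bench(text):
    """Return {(size, type): gflops} parsed from benchmark output lines."""
    results = {}
    for line in text.splitlines():
        m = SIZE.match(line)
        if not m:
            continue  # not a benchmark line
        size = int(m.group(1))
        for dtype, gflops in RESULT.findall(m.group(2)):
            results[(size, dtype)] = float(gflops)
    return results

def compare(old_text, new_text):
    """Return {(size, type): new/old ratio} for entries present in both runs."""
    old, new = parse_bench(old_text), parse_bench(new_text)
    return {k: new[k] / old[k] for k in sorted(old.keys() & new.keys())}

# Sample input: the 256 x 256 rows from the two runs above.
master_out = """\
256 x 256: Q4_0 48.3 GFLOPS (128 runs) | Q4_1 40.6 GFLOPS (128 runs)
256 x 256: Q5_0 49.4 GFLOPS (128 runs) | Q5_1 39.5 GFLOPS (128 runs) | Q8_0 16.0 GFLOPS (128 runs)
256 x 256: F16 57.4 GFLOPS (128 runs) | F32 44.6 GFLOPS (128 runs)
"""
old_out = """\
256 x 256: Q4_0 106.8 GFLOPS (128 runs) | Q4_1 103.8 GFLOPS (128 runs)
256 x 256: Q5_0 102.0 GFLOPS (128 runs) | Q5_1 100.6 GFLOPS (128 runs) | Q8_0 107.6 GFLOPS (128 runs)
256 x 256: F16 115.7 GFLOPS (128 runs) | F32 85.3 GFLOPS (128 runs)
"""

ratios = compare(old_out, master_out)
for (size, dtype), ratio in ratios.items():
    print(f"{size} x {size} {dtype}: master/2f52783 = {ratio:.2f}")
```

A ratio below 1.0 means master is slower for that entry. Pasting the full output of both runs into the two strings gives the complete comparison.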