@ggerganov commented Nov 29, 2024

Improve Metal performance for small batch sizes (<= 8)

make -j && ./bin/llama-bench -m ../models/qwen2.5-3b-coder/ggml-model-q4_0.gguf -m ../models/qwen2.5-3b-coder/ggml-model-q4_1.gguf -m ../models/qwen2.5-3b-coder/ggml-model-q4_k.gguf -m ../models/qwen2.5-3b-coder/ggml-model-q5_k.gguf -m ../models/qwen2.5-3b-coder/ggml-model-q6_k.gguf -m ../models/qwen2.5-3b-coder/ggml-model-iq4_nl.gguf -m ../models/qwen2.5-3b-coder/ggml-model-q8_0.gguf -m ../models/qwen2.5-3b-coder/ggml-model-f16.gguf -m ../models/qwen2.5-7b-coder/ggml-model-q4_0.gguf -m ../models/qwen2.5-7b-coder/ggml-model-q4_1.gguf -m ../models/qwen2.5-7b-coder/ggml-model-q4_k.gguf -m ../models/qwen2.5-7b-coder/ggml-model-q5_k.gguf -m ../models/qwen2.5-7b-coder/ggml-model-q6_k.gguf -m ../models/qwen2.5-7b-coder/ggml-model-iq4_nl.gguf -m ../models/qwen2.5-7b-coder/ggml-model-q8_0.gguf -m ../models/qwen2.5-7b-coder/ggml-model-f16.gguf -m ../models/qwen2.5-14b-coder/ggml-model-q4_0.gguf -m ../models/qwen2.5-14b-coder/ggml-model-q4_1.gguf -m ../models/qwen2.5-14b-coder/ggml-model-q4_k.gguf -m ../models/qwen2.5-14b-coder/ggml-model-q5_k.gguf -m ../models/qwen2.5-14b-coder/ggml-model-q6_k.gguf -m ../models/qwen2.5-14b-coder/ggml-model-iq4_nl.gguf -m ../models/qwen2.5-14b-coder/ggml-model-q8_0.gguf -m ../models/qwen2.5-14b-coder/ggml-model-f16.gguf -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q4_0.gguf -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q4_1.gguf -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q4_k.gguf -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q5_k.gguf -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q6_k.gguf -m ../models/qwen2.5-32b-coder-instruct/ggml-model-iq4_nl.gguf -m ../models/qwen2.5-32b-coder-instruct/ggml-model-q8_0.gguf -m ../models/qwen2.5-32b-coder-instruct/ggml-model-f16.gguf -p 1,2,3,4,5,6,7,8 -n 0 -fa 1 -t 1
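The command above repeats `-m` once per model file: four model directories times eight quantization types, followed by `-p 1,2,3,4,5,6,7,8 -n 0 -fa 1 -t 1`. As a rough sketch (not part of the PR), the same argument list could be generated like this. The helper name `build_bench_args` and the `model_root` parameter are hypothetical; the paths and flags are copied from the command.

```python
from itertools import product

# Model directories and quantization types, in the order used by the command above.
MODEL_DIRS = [
    "qwen2.5-3b-coder",
    "qwen2.5-7b-coder",
    "qwen2.5-14b-coder",
    "qwen2.5-32b-coder-instruct",
]
QUANTS = ["q4_0", "q4_1", "q4_k", "q5_k", "q6_k", "iq4_nl", "q8_0", "f16"]


def build_bench_args(model_root="../models"):
    """Hypothetical helper: return the llama-bench argument list for every model/quant pair."""
    args = ["./bin/llama-bench"]
    # product() iterates quant types fastest, matching the order in the original command.
    for model_dir, quant in product(MODEL_DIRS, QUANTS):
        args += ["-m", f"{model_root}/{model_dir}/ggml-model-{quant}.gguf"]
    args += [
        "-p", ",".join(str(n) for n in range(1, 9)),  # prompt (batch) sizes 1..8
        "-n", "0",   # skip the text-generation test
        "-fa", "1",  # flash attention enabled
        "-t", "1",   # one CPU thread
    ]
    return args


if __name__ == "__main__":
    print(" ".join(build_bench_args()))
```

Printing the joined list should reproduce the part of the command after `make -j &&`.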
M2 Ultra
model size backend fa test t/s (baseline) t/s (this PR) speedup
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp1 103.51 ± 0.44 101.58 ± 3.72 0.98
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp2 148.82 ± 4.18 141.95 ± 4.29 0.95
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp3 181.31 ± 2.40 173.71 ± 4.32 0.96
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp4 195.01 ± 5.86 201.25 ± 1.53 1.03
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp5 91.39 ± 0.28 209.06 ± 4.30 2.29
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp6 110.09 ± 1.10 223.83 ± 3.34 2.03
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp7 128.26 ± 1.98 229.85 ± 4.12 1.79
qwen2 7B Q4_0 4.12 GiB Metal,BLAS 1 pp8 144.31 ± 1.91 260.05 ± 3.82 1.80
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp1 100.27 ± 0.40 98.12 ± 2.25 0.98
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp2 150.89 ± 1.07 145.80 ± 0.34 0.97
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp3 184.53 ± 0.53 177.67 ± 2.02 0.96
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp4 198.12 ± 0.45 205.09 ± 0.89 1.04
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp5 93.77 ± 0.07 215.70 ± 1.41 2.30
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp6 112.26 ± 0.16 229.39 ± 1.35 2.04
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp7 130.85 ± 0.10 235.22 ± 1.83 1.80
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp8 149.93 ± 0.15 269.63 ± 1.84 1.80
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp1 91.69 ± 1.38 86.70 ± 2.47 0.95
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp2 122.44 ± 0.16 121.30 ± 1.36 0.99
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp3 142.49 ± 0.17 141.52 ± 0.76 0.99
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp4 151.12 ± 0.22 156.37 ± 1.58 1.03
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp5 79.39 ± 0.04 177.85 ± 1.30 2.24
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp6 95.17 ± 0.25 179.18 ± 1.31 1.88
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp7 111.06 ± 0.34 175.66 ± 1.04 1.58
qwen2 7B Q4_K 4.36 GiB Metal,BLAS 1 pp8 126.79 ± 0.12 198.07 ± 1.00 1.56
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp1 72.53 ± 0.23 69.08 ± 2.89 0.95
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp2 93.73 ± 0.28 92.89 ± 0.93 0.99
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp3 104.63 ± 0.26 103.75 ± 0.60 0.99
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp4 108.82 ± 0.17 146.24 ± 0.52 1.34
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp5 73.65 ± 0.09 167.83 ± 0.84 2.28
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp6 88.14 ± 0.14 165.03 ± 0.61 1.87
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp7 102.66 ± 0.12 162.18 ± 0.68 1.58
qwen2 7B Q5_K 5.07 GiB Metal,BLAS 1 pp8 117.24 ± 0.15 182.82 ± 0.85 1.56
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp1 78.30 ± 0.16 76.63 ± 2.31 0.98
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp2 101.74 ± 0.13 101.38 ± 1.12 1.00
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp3 114.76 ± 0.23 113.90 ± 0.71 0.99
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp4 118.91 ± 0.09 153.46 ± 1.50 1.29
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp5 75.34 ± 0.05 172.64 ± 0.95 2.29
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp6 90.45 ± 0.20 170.42 ± 0.99 1.88
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp7 105.34 ± 0.18 168.04 ± 0.97 1.60
qwen2 7B Q6_K 5.82 GiB Metal,BLAS 1 pp8 120.47 ± 0.14 186.88 ± 0.66 1.55
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp1 87.77 ± 0.42 84.21 ± 1.19 0.96
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp2 112.54 ± 0.15 124.63 ± 1.17 1.11
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp3 130.12 ± 0.40 141.64 ± 0.74 1.09
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp4 139.27 ± 0.45 158.47 ± 1.28 1.14
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp5 81.86 ± 0.15 185.89 ± 0.69 2.27
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp6 97.98 ± 0.11 182.10 ± 0.21 1.86
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp7 113.93 ± 0.08 178.31 ± 0.21 1.57
qwen2 7B IQ4_NL 4.15 GiB Metal,BLAS 1 pp8 130.36 ± 0.46 205.57 ± 0.37 1.58
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp1 69.44 ± 1.83 70.41 ± 0.45 1.01
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp2 96.25 ± 2.72 126.66 ± 0.20 1.32
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp3 111.80 ± 2.43 163.08 ± 0.68 1.46
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp4 118.85 ± 0.49 200.40 ± 0.41 1.69
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp5 89.39 ± 0.62 207.24 ± 0.26 2.32
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp6 107.13 ± 0.62 210.91 ± 0.39 1.97
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp7 124.46 ± 1.41 229.20 ± 0.05 1.84
qwen2 7B Q8_0 7.54 GiB Metal,BLAS 1 pp8 142.26 ± 1.47 260.76 ± 0.27 1.83
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp1 39.48 ± 0.55 39.50 ± 0.67 1.00
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp2 49.12 ± 0.69 83.67 ± 1.24 1.70
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp3 52.16 ± 0.34 117.29 ± 2.09 2.25
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp4 60.95 ± 0.43 149.43 ± 2.68 2.45
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp5 90.45 ± 0.11 174.40 ± 3.87 1.93
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp6 108.13 ± 0.28 150.62 ± 3.31 1.39
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp7 125.99 ± 1.45 170.60 ± 3.20 1.35
qwen2 7B F16 14.19 GiB Metal,BLAS 1 pp8 143.42 ± 2.57 193.36 ± 3.40 1.35
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp1 55.74 ± 0.18 55.56 ± 0.18 1.00
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp2 81.41 ± 0.11 76.23 ± 0.17 0.94
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp3 99.10 ± 0.15 96.47 ± 0.24 0.97
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp4 106.32 ± 0.08 112.64 ± 0.36 1.06
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp5 48.50 ± 0.06 116.69 ± 0.11 2.41
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp6 58.17 ± 0.17 125.42 ± 0.17 2.16
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp7 67.79 ± 0.16 128.60 ± 0.12 1.90
qwen2 14B Q4_0 7.93 GiB Metal,BLAS 1 pp8 77.46 ± 0.22 146.78 ± 0.17 1.89
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp1 100.01 ± 0.31 99.96 ± 0.51 1.00
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp2 150.73 ± 0.50 146.20 ± 0.33 0.97
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp3 184.06 ± 0.62 178.79 ± 0.29 0.97
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp4 198.40 ± 0.43 206.25 ± 0.51 1.04
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp5 94.08 ± 0.06 217.42 ± 0.88 2.31
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp6 112.73 ± 0.06 230.23 ± 0.53 2.04
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp7 130.81 ± 0.05 236.16 ± 0.16 1.81
qwen2 7B Q4_1 4.53 GiB Metal,BLAS 1 pp8 149.56 ± 0.14 270.31 ± 0.35 1.81
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp1 49.75 ± 0.79 49.94 ± 0.19 1.00
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp2 67.93 ± 0.15 67.93 ± 0.15 1.00
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp3 78.51 ± 0.16 78.38 ± 0.14 1.00
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp4 83.13 ± 0.42 82.00 ± 0.12 0.99
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp5 40.19 ± 0.13 89.64 ± 0.04 2.23
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp6 48.05 ± 0.04 95.06 ± 0.13 1.98
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp7 55.96 ± 0.06 90.80 ± 0.07 1.62
qwen2 14B Q4_K 8.37 GiB Metal,BLAS 1 pp8 63.98 ± 0.06 101.02 ± 0.17 1.58
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp1 40.31 ± 0.42 40.25 ± 0.20 1.00
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp2 52.07 ± 0.04 51.84 ± 0.14 1.00
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp3 56.94 ± 0.34 56.77 ± 0.35 1.00
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp4 58.97 ± 0.08 77.26 ± 0.12 1.31
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp5 36.78 ± 0.06 86.95 ± 0.09 2.36
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp6 44.11 ± 0.10 88.79 ± 0.22 2.01
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp7 51.27 ± 0.07 86.39 ± 0.16 1.69
qwen2 14B Q5_K 9.78 GiB Metal,BLAS 1 pp8 58.59 ± 0.05 97.49 ± 0.10 1.66
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp1 42.24 ± 0.17 42.06 ± 0.15 1.00
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp2 52.53 ± 0.03 52.47 ± 0.12 1.00
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp3 57.91 ± 0.04 57.70 ± 0.04 1.00
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp4 59.81 ± 0.08 79.01 ± 0.09 1.32
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp5 38.59 ± 0.01 87.93 ± 0.10 2.28
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp6 46.36 ± 0.10 91.72 ± 0.29 1.98
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp7 53.99 ± 0.14 88.80 ± 0.15 1.64
qwen2 14B Q6_K 11.29 GiB Metal,BLAS 1 pp8 61.65 ± 0.10 97.00 ± 0.24 1.57
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp1 43.42 ± 0.15 43.43 ± 0.08 1.00
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp2 59.08 ± 0.12 64.13 ± 0.11 1.09
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp3 67.03 ± 0.07 73.73 ± 0.17 1.10
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp4 71.00 ± 0.18 82.61 ± 0.29 1.16
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp5 42.53 ± 0.06 97.26 ± 0.15 2.29
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp6 50.93 ± 0.06 92.80 ± 0.28 1.82
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp7 59.31 ± 0.12 88.06 ± 0.49 1.48
qwen2 14B IQ4_NL 8.01 GiB Metal,BLAS 1 pp8 67.61 ± 0.03 100.83 ± 0.58 1.49
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp1 35.42 ± 0.65 35.47 ± 0.43 1.00
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp2 48.77 ± 0.65 66.12 ± 0.27 1.36
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp3 54.75 ± 0.47 86.90 ± 0.78 1.59
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp4 57.09 ± 0.09 105.82 ± 1.47 1.85
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp5 44.95 ± 0.11 109.84 ± 1.94 2.44
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp6 53.93 ± 0.45 109.45 ± 0.98 2.03
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp7 62.50 ± 0.20 118.56 ± 1.76 1.90
qwen2 14B Q8_0 14.62 GiB Metal,BLAS 1 pp8 71.09 ± 0.24 133.41 ± 1.53 1.88
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp1 21.04 ± 0.10 21.07 ± 0.07 1.00
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp2 25.01 ± 0.05 43.02 ± 0.41 1.72
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp3 27.19 ± 0.04 60.72 ± 0.20 2.23
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp4 32.91 ± 0.05 74.23 ± 2.22 2.26
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp5 48.01 ± 0.31 89.55 ± 0.49 1.87
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp6 57.61 ± 0.53 75.01 ± 0.49 1.30
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp7 66.53 ± 0.14 85.14 ± 0.65 1.28
qwen2 14B F16 27.51 GiB Metal,BLAS 1 pp8 76.36 ± 0.63 97.31 ± 0.97 1.27
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp1 29.25 ± 0.05 29.20 ± 0.06 1.00
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp2 37.11 ± 0.07 37.93 ± 0.05 1.02
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp3 42.27 ± 0.03 50.55 ± 0.04 1.20
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp4 44.16 ± 0.08 58.00 ± 0.08 1.31
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp5 25.01 ± 0.03 60.65 ± 0.05 2.43
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp6 29.97 ± 0.03 60.26 ± 0.11 2.01
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp7 34.86 ± 0.01 60.86 ± 0.05 1.75
qwen2 32B Q4_0 17.35 GiB Metal,BLAS 1 pp8 39.79 ± 0.02 69.41 ± 0.09 1.74
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp1 27.03 ± 0.06 27.03 ± 0.06 1.00
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp2 34.15 ± 0.05 37.19 ± 0.04 1.09
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp3 38.02 ± 0.03 49.71 ± 0.22 1.31
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp4 39.36 ± 0.03 57.54 ± 0.11 1.46
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp5 24.96 ± 0.04 59.83 ± 0.07 2.40
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp6 29.96 ± 0.05 59.29 ± 0.01 1.98
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp7 34.83 ± 0.05 59.96 ± 0.12 1.72
qwen2 32B Q4_1 19.22 GiB Metal,BLAS 1 pp8 39.83 ± 0.04 68.54 ± 0.08 1.72
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp1 25.89 ± 0.06 25.94 ± 0.06 1.00
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp2 32.06 ± 0.06 32.00 ± 0.03 1.00
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp3 35.42 ± 0.11 35.34 ± 0.05 1.00
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp4 36.82 ± 0.02 42.05 ± 0.05 1.14
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp5 20.64 ± 0.03 46.80 ± 0.07 2.27
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp6 24.79 ± 0.04 45.84 ± 0.05 1.85
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp7 28.85 ± 0.05 42.90 ± 0.04 1.49
qwen2 32B Q4_K 18.48 GiB Metal,BLAS 1 pp8 32.92 ± 0.08 48.09 ± 0.13 1.46
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp1 20.66 ± 0.13 20.51 ± 0.02 0.99
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp2 24.03 ± 0.01 24.05 ± 0.02 1.00
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp3 25.49 ± 0.04 25.48 ± 0.02 1.00
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp4 26.06 ± 0.02 39.32 ± 0.12 1.51
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp5 18.89 ± 0.05 44.53 ± 0.04 2.36
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp6 22.64 ± 0.05 42.35 ± 0.02 1.87
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp7 26.33 ± 0.02 40.34 ± 0.12 1.53
qwen2 32B Q5_K 21.66 GiB Metal,BLAS 1 pp8 30.10 ± 0.04 45.64 ± 0.10 1.52
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp1 20.55 ± 0.03 20.54 ± 0.04 1.00
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp2 23.18 ± 0.01 23.16 ± 0.02 1.00
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp3 24.45 ± 0.03 24.41 ± 0.01 1.00
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp4 24.86 ± 0.03 40.36 ± 0.04 1.62
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp5 19.61 ± 0.03 44.46 ± 0.03 2.27
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp6 23.52 ± 0.05 43.93 ± 0.02 1.87
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp7 27.37 ± 0.02 42.27 ± 0.03 1.54
qwen2 32B Q6_K 25.03 GiB Metal,BLAS 1 pp8 31.28 ± 0.03 46.69 ± 0.05 1.49
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp1 22.45 ± 0.06 22.56 ± 0.07 1.00
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp2 28.20 ± 0.02 31.86 ± 0.08 1.13
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp3 30.84 ± 0.01 37.83 ± 0.12 1.23
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp4 32.05 ± 0.04 41.81 ± 0.05 1.30
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp5 22.18 ± 0.04 49.22 ± 0.07 2.22
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp6 26.55 ± 0.04 44.49 ± 0.16 1.68
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp7 30.90 ± 0.02 37.65 ± 0.12 1.22
qwen2 32B IQ4_NL 17.53 GiB Metal,BLAS 1 pp8 35.24 ± 0.03 43.38 ± 0.30 1.23
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp1 17.84 ± 0.08 17.85 ± 0.05 1.00
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp2 21.15 ± 0.02 32.86 ± 0.02 1.55
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp3 22.40 ± 0.01 45.93 ± 0.09 2.05
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp4 22.84 ± 0.02 55.71 ± 0.07 2.44
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp5 23.98 ± 0.02 57.98 ± 0.07 2.42
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp6 28.76 ± 0.04 53.49 ± 0.04 1.86
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp7 33.45 ± 0.02 57.51 ± 0.12 1.72
qwen2 32B Q8_0 32.42 GiB Metal,BLAS 1 pp8 38.24 ± 0.06 65.18 ± 0.10 1.70
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp1 9.76 ± 0.06 9.73 ± 0.04 1.00
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp2 11.14 ± 0.01 20.23 ± 0.04 1.82
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp3 11.75 ± 0.01 29.31 ± 0.05 2.49
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp4 14.26 ± 0.02 37.59 ± 0.32 2.64
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp5 26.38 ± 0.11 45.37 ± 0.31 1.72
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp6 31.58 ± 0.11 34.41 ± 0.18 1.09
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp7 36.69 ± 0.14 39.02 ± 0.07 1.06
qwen2 32B F16 61.03 GiB Metal,BLAS 1 pp8 41.82 ± 0.10 44.55 ± 0.10 1.07

build: 991f8aa (4239)
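The speedup column is consistent with dividing the second t/s value by the first. A minimal check, using three rows copied from the qwen2 7B Q4_0 block above (the function name `speedup` is only illustrative):

```python
def speedup(before_tps, after_tps):
    """Ratio of the second t/s column to the first t/s column."""
    return after_tps / before_tps


# (test, first t/s, second t/s, reported speedup) from the qwen2 7B Q4_0 rows.
ROWS = [
    ("pp4", 195.01, 201.25, 1.03),
    ("pp5", 91.39, 209.06, 2.29),
    ("pp8", 144.31, 260.05, 1.80),
]

for test, before, after, reported in ROWS:
    computed = round(speedup(before, after), 2)
    print(f"{test}: computed {computed:.2f}, reported {reported:.2f}")
```

In these rows the first t/s column drops from 195.01 at pp4 to 91.39 at pp5, and that is also where the reported speedup is largest. The same pattern appears in most of the quantized blocks in the table.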

M1 Pro
model size backend fa test t/s (baseline) t/s (this PR) speedup
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp1 71.88 ± 0.91 72.07 ± 0.73 1.00
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp1 72.24 ± 0.12 71.35 ± 2.35 0.99
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp2 94.02 ± 0.08 89.35 ± 0.26 0.95
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp3 103.71 ± 0.12 125.13 ± 0.14 1.21
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp4 110.12 ± 0.06 140.56 ± 0.14 1.28
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp5 83.42 ± 0.07 155.18 ± 0.18 1.86
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp6 99.85 ± 0.04 151.34 ± 0.11 1.52
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp7 116.20 ± 0.04 145.15 ± 0.09 1.25
llama 3B Q4_0 1.78 GiB Metal,BLAS 1 pp8 132.87 ± 0.07 164.57 ± 0.07 1.24
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp1 61.36 ± 0.10 61.34 ± 0.16 1.00
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp1 61.41 ± 0.10 61.43 ± 0.19 1.00
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp2 73.71 ± 0.92 74.27 ± 0.12 1.01
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp3 79.77 ± 0.11 79.91 ± 0.05 1.00
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp4 83.43 ± 0.04 105.77 ± 0.12 1.27
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp5 71.31 ± 0.08 110.56 ± 0.11 1.55
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp6 85.53 ± 0.03 107.73 ± 0.09 1.26
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp7 99.54 ± 0.08 110.39 ± 0.10 1.11
llama 3B Q4_K 1.87 GiB Metal,BLAS 1 pp8 113.82 ± 0.07 122.02 ± 0.14 1.07
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp1 46.28 ± 0.13 46.03 ± 0.32 0.99
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp1 46.33 ± 0.08 45.26 ± 0.78 0.98
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp2 60.96 ± 0.08 76.28 ± 0.10 1.25
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp3 67.21 ± 0.03 114.75 ± 0.31 1.71
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp4 71.15 ± 0.07 135.15 ± 0.34 1.90
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp5 82.02 ± 0.02 149.47 ± 0.15 1.82
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp6 98.18 ± 0.07 137.05 ± 0.07 1.40
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp7 114.26 ± 0.03 139.25 ± 0.14 1.22
llama 3B Q8_0 3.18 GiB Metal,BLAS 1 pp8 130.61 ± 0.05 153.15 ± 0.99 1.17
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp1 40.60 ± 0.05 39.92 ± 0.22 0.98
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp1 40.50 ± 0.26 40.00 ± 0.45 0.99
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp2 47.66 ± 0.56 44.10 ± 0.13 0.93
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp3 49.87 ± 0.41 66.90 ± 0.30 1.34
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp4 51.91 ± 0.16 71.58 ± 0.26 1.38
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp5 41.91 ± 0.07 77.09 ± 0.26 1.84
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp6 50.22 ± 0.15 73.08 ± 0.19 1.46
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp7 58.22 ± 0.38 68.96 ± 0.14 1.18
llama 7B Q4_0 3.56 GiB Metal,BLAS 1 pp8 66.55 ± 0.23 77.88 ± 0.17 1.17
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp1 33.15 ± 0.18 33.16 ± 0.19 1.00
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp1 33.14 ± 0.19 33.18 ± 0.19 1.00
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp2 36.75 ± 0.13 36.75 ± 0.12 1.00
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp3 38.25 ± 0.12 38.26 ± 0.11 1.00
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp4 39.18 ± 0.09 52.31 ± 0.15 1.34
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp5 36.56 ± 0.01 54.24 ± 0.12 1.48
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp6 43.78 ± 0.01 52.56 ± 0.04 1.20
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp7 50.95 ± 0.09 52.13 ± 0.08 1.02
llama 7B Q4_K 3.80 GiB Metal,BLAS 1 pp8 58.25 ± 0.01 58.03 ± 0.12 1.00
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp1 23.74 ± 0.11 23.75 ± 0.10 1.00
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp1 23.77 ± 0.10 23.76 ± 0.10 1.00
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp2 26.14 ± 0.02 37.02 ± 0.12 1.42
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp3 27.29 ± 0.04 58.21 ± 0.22 2.13
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp4 27.92 ± 0.03 66.85 ± 0.20 2.39
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp5 41.06 ± 0.03 72.93 ± 0.21 1.78
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp6 49.16 ± 0.18 63.97 ± 0.19 1.30
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp7 56.99 ± 0.15 64.78 ± 0.19 1.14
llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp8 65.09 ± 0.17 71.21 ± 0.15 1.09
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp1 21.82 ± 0.03 21.83 ± 0.04 1.00
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp1 21.83 ± 0.02 21.85 ± 0.01 1.00
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp2 24.47 ± 0.02 22.65 ± 0.01 0.93
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp3 25.54 ± 0.01 35.48 ± 0.02 1.39
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp4 25.96 ± 0.04 37.49 ± 0.04 1.44
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp5 22.53 ± 0.01 41.14 ± 0.05 1.83
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp6 27.02 ± 0.01 37.65 ± 0.05 1.39
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp7 31.50 ± 0.01 35.59 ± 0.01 1.13
llama 13B Q4_0 6.86 GiB Metal,BLAS 1 pp8 35.94 ± 0.03 40.10 ± 0.03 1.12
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp1 17.75 ± 0.02 17.68 ± 0.03 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp1 17.76 ± 0.02 17.69 ± 0.05 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp2 19.17 ± 0.02 19.11 ± 0.02 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp3 19.76 ± 0.03 19.72 ± 0.02 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp4 20.01 ± 0.01 27.45 ± 0.02 1.37
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp5 19.28 ± 0.01 28.70 ± 0.03 1.49
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp6 23.10 ± 0.02 27.31 ± 0.00 1.18
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp7 26.91 ± 0.01 26.84 ± 0.01 1.00
llama 13B Q4_K 7.33 GiB Metal,BLAS 1 pp8 30.72 ± 0.01 29.84 ± 0.02 0.97
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp1 12.49 ± 0.03 12.48 ± 0.02 1.00
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp1 12.48 ± 0.04 12.49 ± 0.02 1.00
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp2 13.58 ± 0.02 19.03 ± 0.02 1.40
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp3 14.02 ± 0.01 31.95 ± 0.04 2.28
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp4 14.21 ± 0.01 34.42 ± 0.08 2.42
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp5 22.25 ± 0.02 37.49 ± 0.02 1.68
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp6 26.66 ± 0.02 33.12 ± 0.04 1.24
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp7 31.07 ± 0.02 33.16 ± 0.03 1.07
llama 13B Q8_0 12.88 GiB Metal,BLAS 1 pp8 35.45 ± 0.03 36.25 ± 0.04 1.02

build: 64ed209 (4240)
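To work with these rows programmatically, each line can be split into fields. The following is a sketch that assumes the space-separated layout shown above (three-token model name, size in GiB, backend, fa flag, `ppN` test, two `value ± error` pairs, speedup). `ROW_RE` and `parse_row` are hypothetical names, not part of llama-bench.

```python
import re

# One benchmark row, e.g.
# "llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp4 27.92 ± 0.03 66.85 ± 0.20 2.39"
ROW_RE = re.compile(
    r"^(?P<model>\S+ \S+ \S+) "            # family, parameter count, quant type
    r"(?P<size_gib>[\d.]+) GiB "
    r"(?P<backend>\S+) "
    r"(?P<fa>\d+) "
    r"pp(?P<batch>\d+) "
    r"(?P<before>[\d.]+) ± (?P<before_err>[\d.]+) "
    r"(?P<after>[\d.]+) ± (?P<after_err>[\d.]+) "
    r"(?P<speedup>[\d.]+)$"
)


def parse_row(line):
    """Parse one flattened table row into a dict of typed fields."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    d = match.groupdict()
    return {
        "model": d["model"],
        "size_gib": float(d["size_gib"]),
        "backend": d["backend"],
        "fa": int(d["fa"]),
        "batch": int(d["batch"]),
        "before": float(d["before"]),
        "after": float(d["after"]),
        "speedup": float(d["speedup"]),
    }


# Three rows copied from the llama 7B Q8_0 block of the M1 Pro table.
SAMPLE = [
    "llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp3 27.29 ± 0.04 58.21 ± 0.22 2.13",
    "llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp4 27.92 ± 0.03 66.85 ± 0.20 2.39",
    "llama 7B Q8_0 6.67 GiB Metal,BLAS 1 pp5 41.06 ± 0.03 72.93 ± 0.21 1.78",
]

rows = [parse_row(line) for line in SAMPLE]
best = max(rows, key=lambda r: r["speedup"])
print(f"largest speedup in sample: pp{best['batch']} ({best['speedup']:.2f}x)")
```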

@github-actions (bot) added the labels testing (Everything test related), ggml (changes relating to the ggml tensor library for machine learning), and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) on Nov 29, 2024
@ggerganov force-pushed the gg/metal-mul-mv-new-save4 branch from 8074ca8 to f45c40e on December 2, 2024 09:00
@ggerganov marked this pull request as ready for review on December 2, 2024 18:31
@ggerganov merged commit 0115df2 into master on Dec 3, 2024
48 checks passed
@ggerganov deleted the gg/metal-mul-mv-new-save4 branch on December 3, 2024 09:52
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Dec 7, 2024
* metal : small-batch mat-mul kernels

ggml-ci

* metal : add rest of types

ggml-ci

* metal : final adjustments

ggml-ci

* metal : add comments

ggml-ci
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024