Performance of Quantized Models
Attention
To be updated for Qwen3.
This section reports the generation quality of quantized Qwen2 models (GPTQ and AWQ), alongside the BF16 baselines, on the following benchmarks:
MMLU (Accuracy)
C-Eval (Accuracy)
IFEval (Strict Prompt-Level Accuracy)
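As an illustration of the last metric, the sketch below assumes the usual IFEval definition of prompt-level accuracy: a prompt counts as correct only if the response satisfies every instruction attached to that prompt. The function name `prompt_level_accuracy` and the example data are hypothetical, and the per-instruction checks themselves (which the "strict" variant applies to the unmodified response) are not reproduced here.

```python
from typing import List


def prompt_level_accuracy(followed: List[List[bool]]) -> float:
    """Percentage of prompts whose instructions were ALL followed.

    followed[i][j] is True if the response to prompt i satisfied
    instruction j of that prompt.
    """
    if not followed:
        raise ValueError("no prompts to score")
    # A prompt is correct only when every one of its instructions passes.
    correct = sum(1 for per_prompt in followed if all(per_prompt))
    return 100.0 * correct / len(followed)


# Hypothetical check results for three prompts (not real evaluation data).
example = [[True, True], [True, False], [True]]
score = prompt_level_accuracy(example)  # two of three prompts fully followed
```

Because one failed instruction fails the whole prompt, this metric is stricter than averaging over individual instructions.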
We use greedy decoding in evaluating all models.
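Greedy decoding means that at every step the single highest-scoring next token is chosen, with no sampling, so results are deterministic for a given model. The following is a minimal sketch of that loop in plain Python; `greedy_decode` and the toy scoring function `toy_logits` are hypothetical stand-ins for a real model, not the evaluation code used for these results.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 16,
) -> List[int]:
    """Repeatedly append the highest-scoring next token (no sampling)."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        # Argmax over the vocabulary; ties resolve to the lowest token id.
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


# Hypothetical toy "model" over a 4-token vocabulary: it always scores
# the token following the last one highest.
def toy_logits(ids: Sequence[int]) -> List[float]:
    target = (ids[-1] + 1) % 4
    return [1.0 if i == target else 0.0 for i in range(4)]


result = greedy_decode(toy_logits, prompt_ids=[0], eos_id=3)
```

In Hugging Face Transformers, this behavior typically corresponds to calling `generate` with `do_sample=False` and a single beam; the exact evaluation harness is not specified in this section.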
| Model | Quantization | Average | MMLU | C-Eval | IFEval |
|---|---|---|---|---|---|
| Qwen2-72B-Instruct | BF16 | 81.3 | 82.3 | 83.8 | 77.6 |
| | GPTQ-Int8 | 80.7 | 81.3 | 83.4 | 77.5 |
| | GPTQ-Int4 | 81.2 | 80.8 | 83.9 | 78.9 |
| | AWQ | 80.4 | 80.5 | 83.9 | 76.9 |
| Qwen2-7B-Instruct | BF16 | 66.9 | 70.5 | 77.2 | 53.1 |
| | GPTQ-Int8 | 66.2 | 69.1 | 76.7 | 52.9 |
| | GPTQ-Int4 | 64.1 | 67.8 | 75.2 | 49.4 |
| | AWQ | 64.1 | 67.4 | 73.6 | 51.4 |
| Qwen2-1.5B-Instruct | BF16 | 48.4 | 52.4 | 63.8 | 29.0 |
| | GPTQ-Int8 | 48.1 | 53.0 | 62.5 | 28.8 |
| | GPTQ-Int4 | 45.0 | 50.7 | 57.4 | 27.0 |
| | AWQ | 46.5 | 51.6 | 58.1 | 29.9 |
| Qwen2-0.5B-Instruct | BF16 | 34.4 | 37.9 | 45.2 | 20.0 |
| | GPTQ-Int8 | 32.6 | 35.6 | 43.9 | 18.1 |
| | GPTQ-Int4 | 29.7 | 33.0 | 39.2 | 16.8 |
| | AWQ | 31.1 | 34.4 | 42.1 | 16.7 |
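To compare the quantized variants with their BF16 baselines, the drop in the Average column can be computed directly from the reported numbers. The snippet below is a small sketch of that arithmetic; the helper name `drop_vs_bf16` is hypothetical, and the values are copied from the Average column above.

```python
# Average scores copied from the table above: model -> quantization -> Average.
AVERAGE = {
    "Qwen2-72B-Instruct": {"BF16": 81.3, "GPTQ-Int8": 80.7, "GPTQ-Int4": 81.2, "AWQ": 80.4},
    "Qwen2-7B-Instruct": {"BF16": 66.9, "GPTQ-Int8": 66.2, "GPTQ-Int4": 64.1, "AWQ": 64.1},
    "Qwen2-1.5B-Instruct": {"BF16": 48.4, "GPTQ-Int8": 48.1, "GPTQ-Int4": 45.0, "AWQ": 46.5},
    "Qwen2-0.5B-Instruct": {"BF16": 34.4, "GPTQ-Int8": 32.6, "GPTQ-Int4": 29.7, "AWQ": 31.1},
}


def drop_vs_bf16(scores: dict) -> dict:
    """Points lost relative to BF16 for each quantized variant."""
    base = scores["BF16"]
    # Round to one decimal to match the precision of the reported scores.
    return {q: round(base - s, 1) for q, s in scores.items() if q != "BF16"}


for model, scores in AVERAGE.items():
    print(model, drop_vs_bf16(scores))
```

On these numbers, the 72B model loses less than one point on average under every scheme, while the 0.5B model loses 4.7 points with GPTQ-Int4 (34.4 versus 29.7).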