Performance of Quantized Models

Attention: To be updated for Qwen3.

This section reports the generation performance of quantized Qwen2 models (GPTQ and AWQ) alongside their BF16 baselines. Specifically, we report scores on MMLU, C-Eval, and IFEval, together with the average of the three.

All models are evaluated with greedy decoding.
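For readers unfamiliar with the term, greedy decoding means that at every step the single highest-scoring next token is chosen, so the output is deterministic for a given prompt. The following is a minimal sketch of that loop, not the evaluation code used for these results. The `toy_scores` function is a hypothetical stand-in for a real model, and the token ids and vocabulary size of 4 are made up for illustration.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    prompt: Sequence[int],
    eos_token_id: int,
    max_new_tokens: int = 16,
) -> List[int]:
    """Extend `prompt` by repeatedly appending the highest-scoring next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        # Greedy step: take the argmax over the vocabulary, with no sampling.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_token_id:
            break
    return tokens


def toy_scores(tokens: Sequence[int]) -> List[float]:
    """Hypothetical model stand-in: favors (last token + 1) mod 4."""
    favored = (tokens[-1] + 1) % 4
    return [1.0 if i == favored else 0.0 for i in range(4)]


result = greedy_decode(toy_scores, prompt=[0], eos_token_id=3)
print(result)  # [0, 1, 2, 3]
```

In Hugging Face Transformers, for example, calling `generate` with `do_sample=False` and a single beam gives this behavior. Whether the evaluation here used that library is not stated in this section.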

Model	Quantization	Average	MMLU	C-Eval	IFEval
Qwen2-72B-Instruct	BF16	81.3	82.3	83.8	77.6
Qwen2-72B-Instruct	GPTQ-Int8	80.7	81.3	83.4	77.5
Qwen2-72B-Instruct	GPTQ-Int4	81.2	80.8	83.9	78.9
Qwen2-72B-Instruct	AWQ	80.4	80.5	83.9	76.9
Qwen2-7B-Instruct	BF16	66.9	70.5	77.2	53.1
Qwen2-7B-Instruct	GPTQ-Int8	66.2	69.1	76.7	52.9
Qwen2-7B-Instruct	GPTQ-Int4	64.1	67.8	75.2	49.4
Qwen2-7B-Instruct	AWQ	64.1	67.4	73.6	51.4
Qwen2-1.5B-Instruct	BF16	48.4	52.4	63.8	29.0
Qwen2-1.5B-Instruct	GPTQ-Int8	48.1	53.0	62.5	28.8
Qwen2-1.5B-Instruct	GPTQ-Int4	45.0	50.7	57.4	27.0
Qwen2-1.5B-Instruct	AWQ	46.5	51.6	58.1	29.9
Qwen2-0.5B-Instruct	BF16	34.4	37.9	45.2	20.0
Qwen2-0.5B-Instruct	GPTQ-Int8	32.6	35.6	43.9	18.1
Qwen2-0.5B-Instruct	GPTQ-Int4	29.7	33.0	39.2	16.8
Qwen2-0.5B-Instruct	AWQ	31.1	34.4	42.1	16.7
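One way to read the table is to compute how far each quantized variant falls below its BF16 baseline. The sketch below does this for the Average column only. The numbers are copied from the table, while the dictionary layout and the `drop_vs_bf16` helper are illustrative choices rather than part of any Qwen tooling.

```python
# Average scores copied from the table above, keyed by model and quantization.
AVERAGE = {
    "Qwen2-72B-Instruct": {"BF16": 81.3, "GPTQ-Int8": 80.7, "GPTQ-Int4": 81.2, "AWQ": 80.4},
    "Qwen2-7B-Instruct": {"BF16": 66.9, "GPTQ-Int8": 66.2, "GPTQ-Int4": 64.1, "AWQ": 64.1},
    "Qwen2-1.5B-Instruct": {"BF16": 48.4, "GPTQ-Int8": 48.1, "GPTQ-Int4": 45.0, "AWQ": 46.5},
    "Qwen2-0.5B-Instruct": {"BF16": 34.4, "GPTQ-Int8": 32.6, "GPTQ-Int4": 29.7, "AWQ": 31.1},
}


def drop_vs_bf16(scores):
    """Return the BF16 score minus each quantized score, rounded to one decimal."""
    baseline = scores["BF16"]
    return {
        name: round(baseline - value, 1)
        for name, value in scores.items()
        if name != "BF16"
    }


# Points lost relative to BF16, per model and quantization method.
drops = {model: drop_vs_bf16(scores) for model, scores in AVERAGE.items()}

for model, per_method in drops.items():
    print(model, per_method)
```

On these numbers the largest drop in Average is for GPTQ-Int4 on Qwen2-0.5B-Instruct (4.7 points), while every quantized variant of Qwen2-72B-Instruct stays within 1 point of BF16.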