Performance of Quantized Models

Attention: To be updated for Qwen3.

This section reports the generation performance of the quantized Qwen2 instruction-tuned models (GPTQ and AWQ), with the BF16 results as the baseline. Specifically, we report:

  • MMLU (Accuracy)

  • C-Eval (Accuracy)

  • IFEval (Strict Prompt-Level Accuracy)

All models are evaluated with greedy decoding.
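To make the decoding setting concrete, the following is a minimal sketch of greedy decoding on a toy model. It is not the evaluation harness used for these results: `greedy_decode`, `toy_model`, and the `TOY_SCORES` table are hypothetical names introduced only for illustration. The point it shows is that each step picks the single highest-scoring next token, so the output is deterministic for a given model and prompt.

```python
from typing import Callable, Dict, List


def greedy_decode(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    eos_token: str = "<eos>",
    max_new_tokens: int = 16,
) -> List[str]:
    """Repeatedly append the highest-scoring next token (no sampling)."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        # Greedy step: take the argmax instead of sampling from the scores.
        best = max(scores, key=scores.get)
        if best == eos_token:
            break
        tokens.append(best)
    return tokens


# Hypothetical stand-in for a language model: a fixed table that scores
# candidate next tokens based only on the last token (assumption for the demo).
TOY_SCORES = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"<eos>": 0.9, "down": 0.1},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    # Unknown contexts end the sequence immediately.
    return TOY_SCORES.get(tokens[-1], {"<eos>": 1.0})


print(greedy_decode(toy_model, ["the"]))
```

Because no randomness is involved, repeated runs give the same continuation, which keeps accuracy numbers comparable across the BF16 and quantized variants.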

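| Model               | Quantization | Average | MMLU | C-Eval | IFEval |
|---------------------|--------------|---------|------|--------|--------|
| Qwen2-72B-Instruct  | BF16         | 81.3    | 82.3 | 83.8   | 77.6   |
| Qwen2-72B-Instruct  | GPTQ-Int8    | 80.7    | 81.3 | 83.4   | 77.5   |
| Qwen2-72B-Instruct  | GPTQ-Int4    | 81.2    | 80.8 | 83.9   | 78.9   |
| Qwen2-72B-Instruct  | AWQ          | 80.4    | 80.5 | 83.9   | 76.9   |
| Qwen2-7B-Instruct   | BF16         | 66.9    | 70.5 | 77.2   | 53.1   |
| Qwen2-7B-Instruct   | GPTQ-Int8    | 66.2    | 69.1 | 76.7   | 52.9   |
| Qwen2-7B-Instruct   | GPTQ-Int4    | 64.1    | 67.8 | 75.2   | 49.4   |
| Qwen2-7B-Instruct   | AWQ          | 64.1    | 67.4 | 73.6   | 51.4   |
| Qwen2-1.5B-Instruct | BF16         | 48.4    | 52.4 | 63.8   | 29.0   |
| Qwen2-1.5B-Instruct | GPTQ-Int8    | 48.1    | 53.0 | 62.5   | 28.8   |
| Qwen2-1.5B-Instruct | GPTQ-Int4    | 45.0    | 50.7 | 57.4   | 27.0   |
| Qwen2-1.5B-Instruct | AWQ          | 46.5    | 51.6 | 58.1   | 29.9   |
| Qwen2-0.5B-Instruct | BF16         | 34.4    | 37.9 | 45.2   | 20.0   |
| Qwen2-0.5B-Instruct | GPTQ-Int8    | 32.6    | 35.6 | 43.9   | 18.1   |
| Qwen2-0.5B-Instruct | GPTQ-Int4    | 29.7    | 33.0 | 39.2   | 16.8   |
| Qwen2-0.5B-Instruct | AWQ          | 31.1    | 34.4 | 42.1   | 16.7   |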
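As a reading aid for the table above, the sketch below recomputes how many accuracy points each quantized variant loses relative to its BF16 baseline. It assumes the Average column is the unweighted mean of the MMLU, C-Eval, and IFEval scores; the helper names (`SCORES`, `mean`, `average_drop`) are introduced here for illustration. Because the inputs are already rounded to one decimal, a recomputed mean can differ from the reported Average by 0.1.

```python
# Scores copied from the table above: (MMLU, C-Eval, IFEval).
SCORES = {
    "Qwen2-72B-Instruct": {
        "BF16": (82.3, 83.8, 77.6),
        "GPTQ-Int8": (81.3, 83.4, 77.5),
        "GPTQ-Int4": (80.8, 83.9, 78.9),
        "AWQ": (80.5, 83.9, 76.9),
    },
    "Qwen2-7B-Instruct": {
        "BF16": (70.5, 77.2, 53.1),
        "GPTQ-Int8": (69.1, 76.7, 52.9),
        "GPTQ-Int4": (67.8, 75.2, 49.4),
        "AWQ": (67.4, 73.6, 51.4),
    },
    "Qwen2-1.5B-Instruct": {
        "BF16": (52.4, 63.8, 29.0),
        "GPTQ-Int8": (53.0, 62.5, 28.8),
        "GPTQ-Int4": (50.7, 57.4, 27.0),
        "AWQ": (51.6, 58.1, 29.9),
    },
    "Qwen2-0.5B-Instruct": {
        "BF16": (37.9, 45.2, 20.0),
        "GPTQ-Int8": (35.6, 43.9, 18.1),
        "GPTQ-Int4": (33.0, 39.2, 16.8),
        "AWQ": (34.4, 42.1, 16.7),
    },
}


def mean(values):
    """Unweighted mean (assumed to be how the Average column is defined)."""
    return sum(values) / len(values)


def average_drop(model):
    """BF16 mean minus each quantized mean, in accuracy points."""
    rows = SCORES[model]
    baseline = mean(rows["BF16"])
    return {
        name: round(baseline - mean(scores), 1)
        for name, scores in rows.items()
        if name != "BF16"
    }


for model in SCORES:
    print(model, average_drop(model))
```

Read this way, the table shows GPTQ-Int8 staying within about two points of BF16 at every size, while the 4-bit variants lose more on the smaller models than on Qwen2-72B-Instruct.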