QuaRot
QuaRot is designed to smooth out the outliers in activations before MatMul/Gemm operations. The main idea is to insert pairs of Hadamard transformations around the activations, projecting them into the Hadamard domain. A Hadamard transform spreads concentrated energy out and concentrates spread-out energy. Because activation outliers are confined to a small number of channels, the transform redistributes their energy across all channels, so the transformed activations occupy a narrower, more uniform range. This mitigates the outlier problem and reduces activation quantization error. Experiments show that QuaRot can improve the post-training quantization (PTQ) accuracy of LLMs such as Llama-2, especially for models with many activation outliers. The following sample shows how to enable QuaRot using quark.onnx:
from onnxruntime.quantization.calibrate import CalibrationMethod
from onnxruntime.quantization.quant_utils import QuantFormat
from quark.onnx import ModelQuantizer, QuantType
from quark.onnx.quantization.config.config import Config, QuantizationConfig

quant_config = QuantizationConfig(
    quant_format=QuantFormat.QDQ,
    calibrate_method=CalibrationMethod.MinMax,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    enable_npu_transformer=True,
    include_rotation=True,  # enable QuaRot
    extra_options={
        'RMatrixDim': 4096,
        'UseRandomHad': False,
        'RConfigPath': "rotation_config.json",  # required when include_rotation is True
        'ActivationSymmetric': True,
        'CalibMovingAverage': True
    },
)
config = Config(global_quant_config=quant_config)
quantizer = ModelQuantizer(config)
# input_model_path and output_model_path are the paths of the float input
# model and the quantized output model.
quantizer.quantize_model(input_model_path, output_model_path, calibration_data_reader=None)
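The effect of the Hadamard transformation pair described above can be illustrated with a small NumPy sketch. It is independent of quark.onnx and is not the library's implementation. The helper names (`hadamard`, `quantize_int8`), the toy dimension of 64, and the synthetic outlier channel are all assumptions made for illustration. It shows that rotating the activations and folding the inverse rotation into the weights leaves the MatMul output unchanged, while shrinking the activation range and the 8-bit quantization error.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Normalized n x n Hadamard matrix (Sylvester construction, n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.ones((1, 1))
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)  # scaling makes the matrix orthogonal

def quantize_int8(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor fake quantization to 8 bits (illustrative only)."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

rng = np.random.default_rng(0)
dim = 64
x = rng.normal(size=(8, dim))   # toy activations
x[:, 3] *= 100.0                # one synthetic outlier channel
w = rng.normal(size=(dim, dim)) # toy weights

h = hadamard(dim)
x_rot = x @ h                   # activations projected to the Hadamard domain
w_rot = h.T @ w                 # inverse rotation folded into the weights

# The transformation pair cancels, so the MatMul output is unchanged.
assert np.allclose(x @ w, x_rot @ w_rot)

# Mean absolute output error when only the activations are quantized.
err_plain = np.abs(quantize_int8(x) @ w - x @ w).mean()
err_rot = np.abs(quantize_int8(x_rot) @ w_rot - x @ w).mean()

print("max |activation| before/after rotation:", np.abs(x).max(), np.abs(x_rot).max())
print("mean output error before/after rotation:", err_plain, err_rot)
```

In this toy setting the rotated activations have a much smaller peak magnitude, so the quantization step is finer and the output error drops. Real models have more complex outlier patterns, so the size of the gain will differ.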
Arguments
include_rotation: (Boolean) A flag that determines whether to optimize the model using QuaRot. It can improve the accuracy of LLMs such as Llama. If include_rotation is True, RConfigPath must be provided. The default is False.
extra_options: (Dictionary or None) Contains key-value pairs for various options in different cases. Options related to QuaRot are:
RMatrixDim: (Int) Specifies the dimension for constructing rotation matrix. The default value is 4096.
UseRandomHad: (Boolean) If True, the rotation matrix is generated by the random Hadamard scheme. The default is False.
RConfigPath: (String) Sets the path to the rotation config file. This is required when using QuaRot. The default is "" (an empty string).
ActivationSymmetric: (Boolean) If True, symmetrizes calibration data for activations. The default is False.
CalibMovingAverage: (Boolean) If True, the moving average of the minimum and maximum values is computed when the selected calibration method is MinMax. The default is False. When a PowerOfTwoMethod calibration method is used, this should be set to False.
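To make the RMatrixDim and UseRandomHad options more concrete, here is a minimal sketch of one common way to build a randomized Hadamard rotation: a normalized Hadamard matrix whose rows are multiplied by random signs. This is an assumption about what a "random Hadamard scheme" typically means, not a description of how quark.onnx constructs its matrix. The function names and the small dimension of 16 (standing in for RMatrixDim) are hypothetical.

```python
import numpy as np

def hadamard(dim: int) -> np.ndarray:
    """Normalized dim x dim Hadamard matrix; dim must be a power of two."""
    if dim < 1 or dim & (dim - 1):
        raise ValueError("dim must be a power of two")
    base = np.array([[1.0, 1.0], [1.0, -1.0]])
    h = np.ones((1, 1))
    while h.shape[0] < dim:
        h = np.kron(h, base)
    return h / np.sqrt(dim)

def random_hadamard(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Hadamard matrix with randomly flipped row signs (assumed scheme)."""
    signs = rng.choice([-1.0, 1.0], size=dim)
    return hadamard(dim) * signs[:, None]  # equivalent to diag(signs) @ H

rng = np.random.default_rng(0)
dim = 16  # stands in for RMatrixDim; kept small for illustration
r = random_hadamard(dim, rng)

# Sign flips keep the matrix orthogonal, so it is still a valid rotation.
assert np.allclose(r @ r.T, np.eye(dim))
```

Because the randomized matrix is still orthogonal, it can be used in a transformation pair in the same way as the plain Hadamard matrix.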
Example
Note
For information on accessing AMD Quark ONNX examples, refer to Accessing ONNX Examples.
This example and the relevant files are available at /onnx/accuracy_improvement/quarot
This example demonstrates quantizing a Llama-2-7b-hf model using the AMD Quark ONNX quantizer. It also shows how to use the QuaRot algorithm.