Getting Started with Quark for ONNX
Here is an example of running quantization with the `U8S8_AAWS_CONFIG` configuration. Quark also supports quantization without real calibration data (random data is used instead), which is useful for quickly validating a deployment flow or benchmarking performance. Detailed explanations of each step are provided in other chapters of the User Guide.
```python
import onnxruntime
from onnxruntime.quantization.calibrate import CalibrationDataReader
from quark.onnx.quantization.config import Config, get_default_config
from quark.onnx import ModelQuantizer

# 1. Set the model paths.
# input_model_path is the path to the floating-point model to be quantized;
# output_model_path is the path where the quantized model will be saved.
input_model_path = "/path/to/input/model"
output_model_path = "/path/to/output/model"

# 2. Set the calibration dataset.
# `dr` (data reader) is an instance of CalibrationDataReader. When dr is None,
# the quantizer uses random data for calibration. Please refer to the User
# Guide for how to set up a CalibrationDataReader.
dr = None

# 3. Set the quantization configuration.
quant_config = get_default_config("U8S8_AAWS")
config = Config(global_quant_config=quant_config)

# 4. Create the ONNX quantizer.
quantizer = ModelQuantizer(config)

# 5. Quantize the ONNX model.
quantizer.quantize_model(input_model_path, output_model_path, dr)
```
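To calibrate on real data instead of passing `dr = None`, you supply an object that yields input dictionaries one batch at a time. The sketch below shows the shape of such a reader; the class name, input name, tensor shape, and batch count are hypothetical, and the random tensors stand in for real preprocessed samples. In practice you would subclass `onnxruntime.quantization.calibrate.CalibrationDataReader`, which only requires the `get_next` method implemented here.

```python
import numpy as np

class RandomDataReader:
    """Sketch of a calibration data reader: yields a fixed number of input
    batches, then None to signal that calibration data is exhausted.
    (Hypothetical example; subclass CalibrationDataReader in real code.)"""

    def __init__(self, input_name, input_shape, num_batches=8):
        # Each element maps the model's input name to one batch of data.
        # Replace the random tensors with real preprocessed samples.
        self._data = iter(
            [{input_name: np.random.rand(*input_shape).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        # The quantizer calls this repeatedly; return None when done.
        return next(self._data, None)
```

You would then construct it with your model's actual input name and shape, e.g. `dr = RandomDataReader("input", (1, 3, 224, 224))`, and pass `dr` to `quantize_model` as in step 5 above.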