ONNX model calibration#
- quark.onnx.calibrate.GenerateAnEmptyOnnxModel() ModelProto [source]#
Generate an empty ONNX model in a temporary directory and return its path.
- class quark.onnx.calibrate.OverridedHistogramCollector(method: str, symmetric: bool, num_bins: int, num_quantized_bins: int, percentile: float, scenario: str = 'same')[source]#
- class quark.onnx.calibrate.OverridedMinMaxCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', symmetric: bool = False, use_external_data_format: bool = False, moving_average: bool = False, averaging_constant: float = 0.01, max_intermediate_outputs: int | None = None)[source]#
This class is used to override the original Calibrater to prevent saving the augmented model to disk if the model size is less than 2GB.
- Parameters:
model_input – ONNX model to calibrate. It is a model path or a ModelProto.
op_types_to_calibrate – operator types to calibrate. By default, calibrate all the float32/float16 tensors.
augmented_model_path – save augmented model to this path.
symmetric – make the range of the tensor symmetric (central point is 0).
use_external_data_format – use the external data format to store models whose size is >= 2GB.
moving_average – compute the moving average of the minimum and maximum values instead of the global minimum and maximum.
averaging_constant – constant smoothing factor to use when computing the moving average.
max_intermediate_outputs – maximum number of intermediate outputs before an intermediate range is computed.
- class quark.onnx.calibrate.OverridedHistogramCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 128, num_quantized_bins: int = 2048, percentile: float = 99.999, scenario: str = 'same')[source]#
This class is used to override the original Calibrater to prevent saving the augmented model to disk if the model size is less than 2GB.
- class quark.onnx.calibrate.MinMaxCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', symmetric: bool = False, use_external_data_format: bool = False, moving_average: bool = False, averaging_constant: float = 0.01)[source]#
This method obtains the quantization parameters based on the minimum and maximum values of each tensor.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None.
augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".
symmetric (bool) – Whether to make the range of the tensor symmetric (central point is 0). Default is False.
use_external_data_format (bool) – Whether to use the external data format to store models whose size is >= 2GB. Default is False.
moving_average (bool) – Whether to compute the moving average of the minimum and maximum values instead of the global minimum and maximum. Default is False.
averaging_constant (float) – Constant smoothing factor to use when computing the moving average. Should be between 0 and 1. Default is 0.01.
- Raises:
ValueError – If averaging_constant is not between 0 and 1 when moving_average is True.
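A minimal construction sketch based only on the parameters documented above; the model path is a placeholder and the downstream collection steps are omitted.

```python
from quark.onnx.calibrate import MinMaxCalibrater

# "model.onnx" is a placeholder path to the float model being calibrated.
calibrater = MinMaxCalibrater(
    "model.onnx",
    op_types_to_calibrate=None,   # None: calibrate all float32/float16 tensors
    symmetric=False,
    moving_average=True,          # track a moving average of min/max values
    averaging_constant=0.01,      # must be between 0 and 1 when moving_average=True
)
# An averaging_constant outside (0, 1) with moving_average=True raises ValueError,
# as documented above.
```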
- class quark.onnx.calibrate.EntropyCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'entropy', symmetric: bool = False, num_bins: int = 128, num_quantized_bins: int = 128)[source]#
This method determines the quantization parameters by applying an entropy-based algorithm to each tensor's distribution.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".
use_external_data_format (bool) – Whether to use the external data format to store models whose size is >= 2GB. Default is False.
method (str) – Method for calibration. One of ['entropy', 'percentile', 'distribution']. Default is "entropy".
symmetric (bool) – Whether to make the range of the tensor symmetric (central point is 0). Default is False.
num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Default is 128.
num_quantized_bins (int) – Number of quantized bins. Default is 128.
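A hedged construction sketch for the entropy method; the model path is a placeholder and only documented parameters are used.

```python
from quark.onnx.calibrate import EntropyCalibrater

calibrater = EntropyCalibrater(
    "model.onnx",              # placeholder model path
    method="entropy",
    symmetric=False,
    num_bins=128,              # histogram bins collected per tensor
    num_quantized_bins=128,    # bins of the quantized distribution
)
```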
- class quark.onnx.calibrate.PercentileCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 2048, percentile: float = 99.999)[source]#
This method calculates quantization parameters using percentiles of the tensor values.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".
use_external_data_format (bool) – Whether to use the external data format to store models whose size is >= 2GB. Default is False.
method (str) – Method for calibration. One of "entropy", "percentile" or "distribution". Default is "percentile".
symmetric (bool) – Whether to make the range of the tensor symmetric (central point is 0). Default is False.
num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Default is 2048.
percentile (float) – Percentile value for calibration, a float between [0, 100]. Default is 99.999.
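A similar sketch for percentile calibration; the values shown are the documented defaults and the model path is a placeholder.

```python
from quark.onnx.calibrate import PercentileCalibrater

calibrater = PercentileCalibrater(
    "model.onnx",          # placeholder model path
    num_bins=2048,
    percentile=99.999,     # keep 99.999% of observed values inside the computed range
)
```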
- class quark.onnx.calibrate.DistributionCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'distribution', num_bins: int = 128, scenario: str = 'same')[source]#
This method calculates quantization parameters according to the distribution of the tensor values.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path – Path to save the augmented model. Defaults to "augmented_model.onnx".
use_external_data_format – Whether to use the external data format to store models whose size is >= 2GB. Defaults to False.
method (str) – One of ['entropy', 'percentile', 'distribution']. Defaults to "distribution".
num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Defaults to 128.
scenario (str) – For float 8 only. If scenario="same", the algorithm assumes the weights and the float 8 values follow the same distribution; if scenario="p3", it assumes the weights follow a Gaussian law and float 8 ~ X^3 where X follows a Gaussian law. Defaults to "same".
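A construction sketch for the distribution method; the scenario comment paraphrases the parameter description above and the model path is a placeholder.

```python
from quark.onnx.calibrate import DistributionCalibrater

calibrater = DistributionCalibrater(
    "model.onnx",        # placeholder model path
    method="distribution",
    num_bins=128,
    scenario="same",     # "p3" switches to the Gaussian / X^3 assumption for float 8
)
```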
- class quark.onnx.calibrate.PowOfTwoCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, activation_type: QuantType | ExtendedQuantType = QuantType.QInt8, method: PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})[source]#
This method obtains power-of-two quantization parameters for each tensor that minimize the mean-square error between the quantized values and the float values. It takes longer but usually yields better accuracy.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path – Path to save the augmented model. Default is "augmented_model.onnx".
use_external_data_format (bool) – Whether to use the external data format to store models whose size is >= 2GB. Default is False.
activation_type (Union[QuantType, ExtendedQuantType]) – Type of quantization for activations. Default is QuantType.QInt8.
method (PowerOfTwoMethod) – Calibration method. Default is PowerOfTwoMethod.MinMSE.
symmetric (bool) – Whether to make the range of the tensor symmetric (central point is 0). Default is True.
minmse_mode (str) – Mode for the MinMSE method. Default is "All".
percentile (float) – Percentile value for calibration, a float between 0 and 100. Default is 99.999.
quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying the quantized tensor type. Default is {}.
- augment_graph() None [source]#
Make all nodes of the quantization-candidate op types part of the graph output.
- Returns:
augmented ONNX model
- collect_data(data_reader: CalibrationDataReader) None [source]#
Abstract method: collect the tensors that will be used for range computation. It can be called multiple times.
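A sketch of the collection flow using the two methods documented above. The CalibrationDataReader subclass, the input name, and the tensor shape are illustrative assumptions, and whether an explicit inference-session setup call is needed between the two steps depends on the base implementation.

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType

from quark.onnx.calibrate import PowOfTwoCalibrater


class RandomDataReader(CalibrationDataReader):
    """Toy data reader feeding random tensors; replace with real calibration samples."""

    def __init__(self, input_name: str = "input", num_batches: int = 8):
        # The input name and shape are placeholders for the real model I/O.
        self._batches = iter(
            {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_batches)
        )

    def get_next(self):
        return next(self._batches, None)


calibrater = PowOfTwoCalibrater(
    "model.onnx",                     # placeholder model path
    activation_type=QuantType.QInt8,
    symmetric=True,
    minmse_mode="All",
)

calibrater.augment_graph()                   # expose candidate tensors as graph outputs
# Depending on the base implementation, an inference session may need to be
# created here before data collection (assumption).
calibrater.collect_data(RandomDataReader())  # may be called multiple times
```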
- class quark.onnx.calibrate.PowOfTwoCollector(activation_type: QuantType | ExtendedQuantType = QuantType.QInt8, method: PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})[source]#
Collects data for power-of-two quantization of each tensor. Supports the MinMSE method.
- Parameters:
activation_type – Type of quantization for activations. Default is QuantType.QInt8.
method – Calibration method. Default is PowerOfTwoMethod.MinMSE.
symmetric – Whether to make the range of tensor symmetric (central point is 0). Default is True.
minmse_mode – Mode for the MinMSE method. Default is “All”.
percentile – Percentile value for calibration, a float between 0 and 100. Default is 99.999.
quantized_tensor_type – Dictionary specifying the quantized tensor type. Default is an empty dictionary.
- class quark.onnx.calibrate.LayerWisePercentileCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 2048, percentile: float = 99.999, lwp_metric: str = 'mae', activation_bitwidth: int = 8, percentile_candidates: List[float] = [99.99, 99.999, 99.9999])[source]#
- Parameters:
model_input – ONNX model to calibrate. It is a model path or a ModelProto.
op_types_to_calibrate – operator types to calibrate. By default, calibrate all the float32/float16 tensors.
augmented_model_path – save augmented model to this path.
use_external_data_format – use the external data format to store models whose size is >= 2GB.
method – A string. One of ['entropy', 'percentile', 'distribution'].
symmetric – make the range of the tensor symmetric (central point is 0).
num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Defaults to 2048.
percentile (float) – A float number between [0, 100]. Defaults to 99.999.
lwp_metric (str) – Metric used to evaluate the percentile candidates. One of ['mae', 'mse']. Defaults to "mae".
activation_bitwidth (int) – Bitwidth for activations. Defaults to 8.
percentile_candidates (List[float]) – Percentile candidates. Defaults to [99.99, 99.999, 99.9999].
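A construction sketch; the interpretation that the best percentile candidate is chosen per layer under the given metric follows from the class name and parameters above, and the model path is a placeholder.

```python
from quark.onnx.calibrate import LayerWisePercentileCalibrater

calibrater = LayerWisePercentileCalibrater(
    "model.onnx",                               # placeholder model path
    lwp_metric="mae",                           # or "mse"
    activation_bitwidth=8,
    percentile_candidates=[99.99, 99.999, 99.9999],
)
```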
- quark.onnx.calibrate.create_calibrator_power_of_two(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', activation_type: ExtendedQuantType | QuantType = QuantType.QInt8, method: PowerOfTwoMethod = PowerOfTwoMethod.NonOverflow, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Dict[str, Any] = {}) Any [source]#
Create a calibrator for power-of-two quantization.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path – Path to save the augmented ONNX model.
activation_type – Type of quantization for activations.
method – Calibration method to use.
use_external_data_format – Whether to use external data format for large models.
execution_providers – List of execution providers for ONNX Runtime.
quantized_tensor_type – Dictionary specifying the quantized tensor type.
extra_options – Additional options for calibrator configuration.
- Returns:
Initialized calibrator object.
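A hedged invocation sketch. The model path is a placeholder, and overriding the default PowerOfTwoMethod is omitted because its import path is not documented here.

```python
from onnxruntime.quantization import QuantType

from quark.onnx.calibrate import create_calibrator_power_of_two

calibrator = create_calibrator_power_of_two(
    "model.onnx",                        # placeholder model path
    op_types_to_calibrate=None,          # calibrate all float32/float16 tensors
    activation_type=QuantType.QInt8,
    # method defaults to PowerOfTwoMethod.NonOverflow
    execution_providers=["CPUExecutionProvider"],
)
```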
- quark.onnx.calibrate.create_calibrator_float_scale(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', calibrate_method: CalibrationMethod | LayerWisePercentileCalibrater = CalibrationMethod.MinMax, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], extra_options: Dict[str, Any] = {}) Any [source]#
Create a calibrator for floating-point scale quantization.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path – Path to save the augmented ONNX model.
calibrate_method – Calibration method to use (MinMax, Entropy, Percentile, or Distribution).
use_external_data_format – Whether to use external data format for large models.
execution_providers – List of execution providers for ONNX Runtime.
extra_options – Additional options for calibrator configuration.
- Returns:
Initialized calibrator object.
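A matching sketch for float-scale calibration; CalibrationMethod comes from onnxruntime.quantization, the model path is a placeholder, and any extra_options keys depend on the chosen method.

```python
from onnxruntime.quantization import CalibrationMethod

from quark.onnx.calibrate import create_calibrator_float_scale

calibrator = create_calibrator_float_scale(
    "model.onnx",                              # placeholder model path
    calibrate_method=CalibrationMethod.Percentile,
    execution_providers=["CPUExecutionProvider"],
    extra_options={},                          # method-specific options, if any
)
```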