ONNX model calibration


quark.onnx.calibrate.GenerateAnEmptyOnnxModel() → ModelProto: Generate an empty ONNX model in a temporary directory and return the path.

class quark.onnx.calibrate.OverridedHistogramCollector(method: str, symmetric: bool, num_bins: int, num_quantized_bins: int, percentile: float, scenario: str = 'same')

collect(name_to_arr: Dict[Any, Any]) → Any

Generate informative data based on the given data.

Parameters:

name_to_arr (dict) – tensor name to NDArray data

class quark.onnx.calibrate.OverridedMinMaxCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', symmetric: bool = False, use_external_data_format: bool = False, moving_average: bool = False, averaging_constant: float = 0.01, max_intermediate_outputs: int | None = None)

This class overrides the original Calibrater to avoid saving the augmented model to disk when the model size is less than 2 GB.

Parameters:

model_input – ONNX model to calibrate. It is a model path or a ModelProto.
op_types_to_calibrate – operator types to calibrate. By default, calibrate all the float32/float16 tensors.
augmented_model_path – save augmented model to this path.
symmetric – make range of tensor symmetric (central point is 0).
use_external_data_format – use external data format to store models whose size is >= 2 GB.
moving_average – compute the moving average of the minimum and maximum values instead of the global minimum and maximum.
averaging_constant – constant smoothing factor to use when computing the moving average.
max_intermediate_outputs – maximum number of intermediate outputs before an intermediate range is computed.

augment_graph() → None

Adds ReduceMin and ReduceMax nodes to all quantization_candidates op type nodes in the model and ensures that their outputs are stored as part of the graph output.

Returns: augmented ONNX model

create_inference_session() → None: Create an ONNX Runtime InferenceSession.

class quark.onnx.calibrate.OverridedHistogramCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 128, num_quantized_bins: int = 2048, percentile: float = 99.999, scenario: str = 'same')

This class overrides the original Calibrater to avoid saving the augmented model to disk when the model size is less than 2 GB.

augment_graph() → None

Make the outputs of all quantization_candidates op type nodes part of the graph output.

Returns: augmented ONNX model

create_inference_session() → None: Create an ONNX Runtime InferenceSession.

compute_data() → TensorsData

Compute the min-max range of each tensor.

Returns: dictionary mapping: {tensor name: (min value, max value)}

class quark.onnx.calibrate.MinMaxCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', symmetric: bool = False, use_external_data_format: bool = False, moving_average: bool = False, averaging_constant: float = 0.01)

This calibrater obtains the quantization parameters from the minimum and maximum values of each tensor.

Parameters:

model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None.
augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".
symmetric (bool) – Whether to make the range of tensor symmetric (central point is 0). Default is False.
use_external_data_format (bool) – Whether to use external data format to store models whose size is >= 2 GB. Default is False.
moving_average (bool) – Whether to compute the moving average of the minimum and maximum values instead of the global minimum and maximum. Default is False.
averaging_constant (float) – Constant smoothing factor to use when computing the moving average. Default is 0.01. Should be between 0 and 1.

Raises:

ValueError – If averaging_constant is not between 0 and 1 when moving_average is True.
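The difference between the global range and the moving-average range can be shown with a plain-NumPy sketch that folds per-batch minima and maxima into one (min, max) pair. The helper name `minmax_range` is hypothetical, and the exponential-moving-average update (move a fraction `averaging_constant` of the way toward each new batch value) is an assumed rule for illustration, not a copy of Quark's code.

```python
import numpy as np


def minmax_range(batches, symmetric=False, moving_average=False, averaging_constant=0.01):
    """Hypothetical helper: combine per-batch min/max values into one (min, max) range."""
    if moving_average and not 0.0 <= averaging_constant <= 1.0:
        # Mirrors the documented ValueError condition.
        raise ValueError("averaging_constant must be between 0 and 1")
    rmin = rmax = None
    for batch in batches:
        bmin, bmax = float(np.min(batch)), float(np.max(batch))
        if rmin is None:
            # The first batch initialises the range.
            rmin, rmax = bmin, bmax
        elif moving_average:
            # Exponential moving average: move part of the way toward the new values.
            rmin += averaging_constant * (bmin - rmin)
            rmax += averaging_constant * (bmax - rmax)
        else:
            # Global min/max over every batch seen so far.
            rmin, rmax = min(rmin, bmin), max(rmax, bmax)
    if symmetric:
        # Center the range on 0 using the larger magnitude.
        bound = max(abs(rmin), abs(rmax))
        rmin, rmax = -bound, bound
    return rmin, rmax


batches = [np.array([-1.0, 2.0]), np.array([-3.0, 1.0])]
print(minmax_range(batches))                                               # (-3.0, 2.0)
print(minmax_range(batches, symmetric=True))                               # (-3.0, 3.0)
print(minmax_range(batches, moving_average=True, averaging_constant=0.5))  # (-2.0, 1.5)
```

The moving average damps the effect of a single outlier batch, which is why it can give tighter (and often more useful) ranges than the global extremes.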

class quark.onnx.calibrate.EntropyCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'entropy', symmetric: bool = False, num_bins: int = 128, num_quantized_bins: int = 128)

This calibrater determines the quantization parameters by applying an entropy-based algorithm to the distribution of each tensor.

Parameters:

model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".
use_external_data_format (bool) – Whether to use external data format to store models whose size is >= 2 GB. Default is False.
method (str) – Method for calibration. One of [‘entropy’, ‘percentile’, ‘distribution’]. Default is "entropy".
symmetric (bool) – Whether to make the range of tensor symmetric (central point is 0). Default is False.
num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Default is 128.
num_quantized_bins (int) – Number of quantized bins. Default is 128.
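One common form of entropy calibration chooses the clipping threshold whose coarsely quantized histogram has the smallest KL divergence from the original histogram. The sketch below is a simplified, one-sided variant of that KL-divergence search (working on |x|, with outliers folded into the last kept bin and a small epsilon for smoothing). It is not Quark's implementation; the helper names and the small default bin counts (chosen for speed, unlike the class defaults above) are assumptions.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for two normalized histograms; q must be positive wherever p is."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def entropy_threshold(values, num_bins=256, num_quantized_bins=32, eps=1e-10):
    """Hypothetical helper: pick a threshold for |values| that minimizes KL divergence."""
    hist, edges = np.histogram(np.abs(values), bins=num_bins)
    hist = hist.astype(np.float64)
    best_kl, best_threshold = np.inf, float(edges[-1])
    for i in range(num_quantized_bins, num_bins + 1):
        # Reference distribution: first i bins, clipped outliers folded into the last one.
        p = hist[:i].copy()
        p[-1] += hist[i:].sum()
        # Candidate distribution: merge the i bins into num_quantized_bins groups and
        # spread each group's total count evenly over its non-empty bins.
        q = np.zeros(i)
        for group in np.array_split(np.arange(i), num_quantized_bins):
            nonempty = group[hist[group] > 0]
            if nonempty.size:
                q[nonempty] = hist[group].sum() / nonempty.size
        # Normalize; eps keeps q positive so the divergence stays finite.
        p = p / p.sum()
        q = (q + eps) / (q + eps).sum()
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_kl, best_threshold = kl, float(edges[i])
    return best_threshold


rng = np.random.default_rng(0)
data = np.concatenate([rng.standard_normal(10000), [50.0]])  # Gaussian plus one outlier
print(entropy_threshold(data))  # well below the outlier at 50.0
```

A single large outlier barely moves the chosen threshold, whereas a plain min/max range would stretch to include it.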

class quark.onnx.calibrate.PercentileCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 2048, percentile: float = 99.999)

This method calculates quantization parameters using percentiles of the tensor values.

Parameters:

model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".
use_external_data_format (bool) – Whether to use external data format to store models whose size is >= 2 GB. Default is False.
method (str) – Method for calibration. One of "entropy", "percentile" or "distribution". Default is "percentile".
symmetric (bool) – Whether to make the range of tensor symmetric (central point is 0). Default is False.
num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Default is 2048.
percentile (float) – Percentile value for calibration, a float in [0, 100]. Default is 99.999.

collect_data(data_reader: CalibrationDataReader) → None: Collects the operators' tensors and generates a tensor histogram for each operator.
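The effect of the `percentile` and `symmetric` parameters can be sketched as follows. The calibrater derives percentiles from a `num_bins` histogram; for brevity this sketch calls `np.percentile` on the raw values instead, and splitting the discarded tail mass evenly between both ends in the asymmetric case is an assumption. The helper name `percentile_range` is hypothetical.

```python
import numpy as np


def percentile_range(values, percentile=99.999, symmetric=False):
    """Hypothetical helper: derive a (min, max) range from percentiles of the data."""
    values = np.asarray(values, dtype=np.float64).ravel()
    if symmetric:
        # Clip |x| at the requested percentile and mirror the threshold around 0.
        threshold = float(np.percentile(np.abs(values), percentile))
        return -threshold, threshold
    # Otherwise discard (100 - percentile)% of the mass, half from each end.
    tail = (100.0 - percentile) / 2.0
    return float(np.percentile(values, tail)), float(np.percentile(values, 100.0 - tail))


low, high = percentile_range(np.arange(101), percentile=90)
print(low, high)
```

With `percentile=90` on the values 0..100, roughly the lowest and highest 5% are treated as outliers, giving a range close to (5, 95).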

class quark.onnx.calibrate.DistributionCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'distribution', num_bins: int = 128, scenario: str = 'same')

This method calculates quantization parameters according to distribution of the tensor values.

Parameters:

model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path (str) – Path to save the augmented model. Defaults to "augmented_model.onnx".
use_external_data_format (bool) – Whether to use external data format to store models whose size is >= 2 GB. Defaults to False.
method (str) – One of [‘entropy’, ‘percentile’, ‘distribution’]. Defaults to "distribution".
num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Defaults to 128.
scenario (str) – For float 8 only. If scenario="same", the algorithm assumes that the weights and float 8 follow the same distribution; if scenario="p3", it assumes that the weights follow a Gaussian law and float 8 ~ X^3, where X follows a Gaussian law. Defaults to "same".

class quark.onnx.calibrate.PowOfTwoCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, activation_type: QuantType | ExtendedQuantType = QuantType.QInt8, method: PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})

This calibrater gets power-of-two quantization parameters for each tensor by minimizing the mean squared error between the quantized and float values. It takes longer but usually gives better accuracy.

Parameters:

model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path – Path to save the augmented model. Default is "augmented_model.onnx".
use_external_data_format (bool) – Whether to use external data format to store models whose size is >= 2 GB. Default is False.
activation_type (Union[QuantType, ExtendedQuantType]) – Type of quantization for activations. Default is QuantType.QInt8.
method (PowerOfTwoMethod) – Calibration method. Default is PowerOfTwoMethod.MinMSE.
symmetric (bool) – Whether to make the range of tensor symmetric (central point is 0). Default is True.
minmse_mode (str) – Mode for the MinMSE method. Default is "All".
percentile (float) – Percentile value for calibration, a float between 0 and 100. Default is 99.999.
quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying the quantized tensor type. Default is {}.

augment_graph() → None

Make the outputs of all quantization_candidates op type nodes part of the graph output.

Returns: augmented ONNX model

collect_data(data_reader: CalibrationDataReader) → None: Abstract method: collect the tensors that will be used for range computation. It can be called multiple times.

compute_range() → Any

Compute the min-max range of each tensor.

Returns: dictionary mapping: {tensor name: (min value, max value)}

create_inference_session() → None: Create an ONNX Runtime InferenceSession.
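A power-of-two scale has the form 2^-k for an integer k (the number of fractional bits), so a MinMSE search can simply try each k and keep the one with the smallest mean squared error between the float tensor and its quantize-dequantize round trip. The sketch below does exactly that for signed 8-bit values. The candidate range for k, the tie-breaking rule and the helper names are assumptions; Quark's `minmse_mode` options are not modelled.

```python
import numpy as np


def pow2_fake_quant(x, frac_bits, bitwidth=8):
    """Quantize x with scale 2**-frac_bits to signed integers, then dequantize."""
    scale = 2.0 ** -frac_bits
    qmin, qmax = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    return np.clip(np.round(x / scale), qmin, qmax) * scale


def min_mse_frac_bits(x, bitwidth=8, candidates=range(-8, 16)):
    """Hypothetical helper: choose the power-of-two exponent with the lowest MSE."""
    x = np.asarray(x, dtype=np.float64)
    errors = {k: float(np.mean((x - pow2_fake_quant(x, k, bitwidth)) ** 2)) for k in candidates}
    # min() keeps the first candidate on ties, i.e. the smallest exponent.
    return min(errors, key=errors.get)


data = np.linspace(-1.0, 0.99, 200)
k = min_mse_frac_bits(data)
print(k, 2.0 ** -k)  # chosen fractional bits and the matching scale
```

For values in [-1, 1) and 8 bits, one more fractional bit would overflow the int8 range, while one fewer doubles the rounding step, so the search settles on the largest scale resolution that still fits.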

class quark.onnx.calibrate.PowOfTwoCollector(activation_type: QuantType | ExtendedQuantType = QuantType.QInt8, method: PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})

Collects power-of-two quantization parameters for each tensor. Supports the MinMSE method.

Parameters:

activation_type – Type of quantization for activations. Default is QuantType.QInt8.
method – Calibration method. Default is PowerOfTwoMethod.MinMSE.
symmetric – Whether to make the range of tensor symmetric (central point is 0). Default is True.
minmse_mode – Mode for the MinMSE method. Default is "All".
percentile – Percentile value for calibration, a float between 0 and 100. Default is 99.999.
quantized_tensor_type – Dictionary specifying the quantized tensor type. Default is an empty dictionary.

collect(name_to_arr: Dict[Any, Any]) → None

Generate informative data based on the given data.

Parameters:

name_to_arr (dict) – tensor name to NDArray data

compute_collection_result() → Any: Get the optimal result among the collected data.

class quark.onnx.calibrate.LayerWiseMethod(*values)

class quark.onnx.calibrate.LayerWisePercentileCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 2048, percentile: float = 99.999, lwp_metric: str = 'mae', activation_bitwidth: int = 8, percentile_candidates: List[float] = [99.99, 99.999, 99.9999])

Parameters:

model_input – ONNX model to calibrate. It is a model path or a ModelProto.
op_types_to_calibrate – operator types to calibrate. By default, calibrate all the float32/float16 tensors.
augmented_model_path – save augmented model to this path.
use_external_data_format – use external data format to store models whose size is >= 2 GB.
method – A string. One of [‘entropy’, ‘percentile’, ‘distribution’].
symmetric – make range of tensor symmetric (central point is 0).
num_quantized_bins – number of quantized bins. Default 128.
percentile – A float number in [0, 100]. Default 99.999.
lwp_metric (str) – Metric used to judge each candidate percentile. One of [‘mae’, ‘mse’]. Defaults to "mae".
activation_bitwidth (int) – Bitwidth for activations. Defaults to 8.
percentile_candidates (List[float]) – Percentile candidates. Defaults to [99.99, 99.999, 99.9999].

collect_data(data_reader: CalibrationDataReader) → None: Collects the operators' tensors and generates a tensor histogram for each operator.

compute_data() → TensorsData

Compute the min-max range of each tensor.

Returns: dictionary mapping: {tensor name: (min value, max value)}
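One plausible reading of `lwp_metric`, `activation_bitwidth` and `percentile_candidates` is a per-tensor search: for each candidate percentile, fake-quantize the tensor with a symmetric `activation_bitwidth`-bit range and keep the candidate with the lowest MAE or MSE against the float values. The sketch below implements that reading. Whether Quark measures the error on the tensor itself or on downstream layer outputs is not stated here, so this is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def fake_quant_symmetric(x, threshold, bitwidth=8):
    """Symmetric uniform quantize/dequantize with range [-threshold, threshold]."""
    qmax = 2 ** (bitwidth - 1) - 1
    scale = max(threshold, 1e-12) / qmax  # guard against an all-zero tensor
    return np.clip(np.round(x / scale), -qmax, qmax) * scale


def pick_percentile(x, candidates=(99.99, 99.999, 99.9999), metric="mae", bitwidth=8):
    """Hypothetical helper: return the candidate percentile with the lowest quantization error."""
    if metric not in ("mae", "mse"):
        raise ValueError("metric must be 'mae' or 'mse'")
    x = np.asarray(x, dtype=np.float64).ravel()
    best_p, best_score = None, np.inf
    for p in candidates:
        threshold = float(np.percentile(np.abs(x), p))
        err = x - fake_quant_symmetric(x, threshold, bitwidth)
        score = float(np.mean(np.abs(err)) if metric == "mae" else np.mean(err ** 2))
        if score < best_score:
            best_p, best_score = p, score
    return best_p


activations = np.random.default_rng(0).standard_normal(1000)
print(pick_percentile(activations, metric="mse"))
```

MAE penalizes clipping and rounding errors linearly, while MSE weighs the large errors from aggressive clipping more heavily, so the two metrics can favour different candidates.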

quark.onnx.calibrate.create_calibrator_power_of_two(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', activation_type: ExtendedQuantType | QuantType = QuantType.QInt8, method: PowerOfTwoMethod = PowerOfTwoMethod.NonOverflow, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Dict[str, Any] = {}) → Any

Create a calibrator for power-of-two quantization.

Parameters:

model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path – Path to save the augmented ONNX model.
activation_type – Type of quantization for activations.
method – Calibration method to use.
use_external_data_format – Whether to use external data format for large models.
execution_providers – List of execution providers for ONNX Runtime.
quantized_tensor_type – Dictionary specifying the quantized tensor type.
extra_options – Additional options for calibrator configuration.

Returns:

Initialized calibrator object.

quark.onnx.calibrate.create_calibrator_float_scale(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', calibrate_method: CalibrationMethod | LayerWisePercentileCalibrater = CalibrationMethod.MinMax, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], extra_options: Dict[str, Any] = {}) → Any

Create a calibrator for floating-point scale quantization.

Parameters:

model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
augmented_model_path – Path to save the augmented ONNX model.
calibrate_method – Calibration method to use (MinMax, Entropy, Percentile, or Distribution).
use_external_data_format – Whether to use external data format for large models.
execution_providers – List of execution providers for ONNX Runtime.
extra_options – Additional options for calibrator configuration.

Returns:

Initialized calibrator object.
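Calibrators are fed through the `CalibrationDataReader` interface accepted by `collect_data`. The sketch below shows a minimal reader that follows the `get_next()` protocol of `onnxruntime.quantization.CalibrationDataReader`: each call returns a `{input_name: ndarray}` dict, and `None` once the data is exhausted. The class name, input name, shape and random data are hypothetical (real calibration should use representative samples, and production code would subclass `CalibrationDataReader`). The commented calls to the factory and to `compute_data()` are an assumed usage pattern, not confirmed API behaviour.

```python
import numpy as np


class RandomDataReader:
    """Hypothetical reader yielding a few random batches via get_next().

    Follows the onnxruntime.quantization.CalibrationDataReader protocol:
    return one {input_name: ndarray} dict per call, then None when exhausted.
    """

    def __init__(self, input_name, shape, num_batches=4, seed=0):
        rng = np.random.default_rng(seed)
        self._batches = iter(
            [{input_name: rng.standard_normal(shape).astype(np.float32)} for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._batches, None)


reader = RandomDataReader("input", (1, 3, 224, 224), num_batches=2)

# Assumed usage with the factory documented above (model path is hypothetical):
# from onnxruntime.quantization import CalibrationMethod
# from quark.onnx.calibrate import create_calibrator_float_scale
# calibrator = create_calibrator_float_scale("model.onnx", calibrate_method=CalibrationMethod.MinMax)
# calibrator.collect_data(reader)
# ranges = calibrator.compute_data()
```

Because the reader returns `None` at the end, it is single-use; build a new reader (or rewind it) before running a second calibration pass.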
