ONNX model calibration#

quark.onnx.calibrate.GenerateAnEmptyOnnxModel() ModelProto[source]#

Generate an empty ONNX model in a temporary directory and return the path.

class quark.onnx.calibrate.OverridedHistogramCollector(method: str, symmetric: bool, num_bins: int, num_quantized_bins: int, percentile: float, scenario: str = 'same')[source]#
collect(name_to_arr: Dict[Any, Any]) Any[source]#

Generate informative data based on given data.

Parameters:
  • name_to_arr (dict) – Mapping from tensor name to NDArray data.

class quark.onnx.calibrate.OverridedMinMaxCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', symmetric: bool = False, use_external_data_format: bool = False, moving_average: bool = False, averaging_constant: float = 0.01, max_intermediate_outputs: int | None = None)[source]#

This class is used to override the original Calibrater to prevent saving the augmented model to disk if the model size is less than 2GB.

Parameters:
  • model_input – ONNX model to calibrate. It is a model path or a ModelProto.

  • op_types_to_calibrate – operator types to calibrate. By default, calibrate all the float32/float16 tensors.

  • augmented_model_path – save augmented model to this path.

  • symmetric – make range of tensor symmetric (central point is 0).

  • use_external_data_format – use external data format to store a model whose size is >= 2GB.

  • moving_average – compute the moving average of the minimum and maximum values instead of the global minimum and maximum.

  • averaging_constant – constant smoothing factor to use when computing the moving average.

  • max_intermediate_outputs – maximum number of intermediate outputs before an intermediate range is computed.

augment_graph() None[source]#

Adds ReduceMin and ReduceMax nodes to all quantization_candidates op type nodes in the model and ensures their outputs are stored as part of the graph output.

Returns:

augmented ONNX model

create_inference_session() None[source]#

Create an ONNX Runtime InferenceSession.

class quark.onnx.calibrate.OverridedHistogramCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 128, num_quantized_bins: int = 2048, percentile: float = 99.999, scenario: str = 'same')[source]#

This class is used to override the original Calibrater to prevent saving the augmented model to disk if the model size is less than 2GB.

augment_graph() None[source]#

Make the outputs of all quantization_candidates op type nodes part of the graph output.

Returns:

augmented ONNX model

create_inference_session() None[source]#

Create an ONNX Runtime InferenceSession.

compute_data() TensorsData[source]#

Compute the min-max range of each tensor.

Returns:

dictionary mapping: {tensor name: (min value, max value)}

class quark.onnx.calibrate.MinMaxCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', symmetric: bool = False, use_external_data_format: bool = False, moving_average: bool = False, averaging_constant: float = 0.01)[source]#

This method obtains the quantization parameters based on the minimum and maximum values of each tensor.

Parameters:
  • model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.

  • op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None.

  • augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".

  • symmetric (bool) – Whether to make the range of tensor symmetric (central point is 0). Default is False.

  • use_external_data_format (bool) – Whether to use external data format to store a model whose size is >= 2GB. Default is False.

  • moving_average (bool) – Whether to compute the moving average of the minimum and maximum values instead of the global minimum and maximum. Default is False.

  • averaging_constant (float) – Constant smoothing factor to use when computing the moving average. Default is 0.01. Should be between 0 and 1.

Raises:

ValueError – If averaging_constant is not between 0 and 1 when moving_average is True.
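
The interaction of symmetric, moving_average and averaging_constant can be illustrated with a minimal pure-Python sketch. It is not Quark's implementation: the function name calibrate_min_max is hypothetical, the exponential moving average formula is an assumption about how the smoothing is applied, and the bounds check treats 0 and 1 as valid values.

```python
from typing import Iterable, Optional, Sequence, Tuple


def calibrate_min_max(
    batches: Iterable[Sequence[float]],
    symmetric: bool = False,
    moving_average: bool = False,
    averaging_constant: float = 0.01,
) -> Tuple[float, float]:
    """Track a (min, max) range for one tensor across calibration batches."""
    if moving_average and not 0.0 <= averaging_constant <= 1.0:
        raise ValueError("averaging_constant must be between 0 and 1.")

    lo: Optional[float] = None
    hi: Optional[float] = None
    for batch in batches:
        b_min, b_max = min(batch), max(batch)
        if lo is None or hi is None:
            # The first batch initializes the range.
            lo, hi = b_min, b_max
        elif moving_average:
            # Assumed form: exponential moving average of per-batch extremes.
            lo += averaging_constant * (b_min - lo)
            hi += averaging_constant * (b_max - hi)
        else:
            # Global minimum and maximum over all batches seen so far.
            lo, hi = min(lo, b_min), max(hi, b_max)

    if lo is None or hi is None:
        raise ValueError("no calibration data was provided")
    if symmetric:
        # Center the range on 0 by using the larger absolute bound.
        bound = max(abs(lo), abs(hi))
        lo, hi = -bound, bound
    return lo, hi
```

With a small averaging_constant the range reacts slowly to outlier batches, whereas the global mode keeps the most extreme value ever observed.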

class quark.onnx.calibrate.EntropyCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'entropy', symmetric: bool = False, num_bins: int = 128, num_quantized_bins: int = 128)[source]#

This method determines the quantization parameters by applying an entropy algorithm to each tensor’s distribution.

Parameters:
  • model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.

  • op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.

  • augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".

  • use_external_data_format (bool) – Whether to use external data format to store a model whose size is >= 2GB. Default is False.

  • method (str) – Method for calibration. One of "entropy", "percentile" or "distribution". Default is "entropy".

  • symmetric (bool) – Whether to make the range of tensor symmetric (central point is 0). Default is False.

  • num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Default is 128.

  • num_quantized_bins (int) – Number of quantized bins. Default is 128.

class quark.onnx.calibrate.PercentileCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 2048, percentile: float = 99.999)[source]#

This method calculates quantization parameters using percentiles of the tensor values.

Parameters:
  • model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.

  • op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.

  • augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".

  • use_external_data_format (bool) – Whether to use external data format to store a model whose size is >= 2GB. Default is False.

  • method (str) – Method for calibration. One of "entropy", "percentile" or "distribution". Default is "percentile".

  • symmetric (bool) – Whether to make the range of tensor symmetric (central point is 0). Default is False.

  • num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Default is 2048.

  • percentile (float) – Percentile value for calibration, a float in the range [0, 100]. Default is 99.999.
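
How num_bins and percentile combine can be sketched as follows: build a histogram of absolute values, then walk its cumulative distribution until the requested share of values is covered. This is a simplified illustration that handles only the symmetric case, not Quark's actual code, and the function name percentile_range is hypothetical.

```python
from typing import List, Sequence, Tuple


def percentile_range(
    values: Sequence[float],
    num_bins: int = 2048,
    percentile: float = 99.999,
) -> Tuple[float, float]:
    """Return a symmetric clipping range covering `percentile` % of |values|."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be in [0, 100]")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0.0, 0.0

    # Histogram of absolute values over [0, max_abs].
    bin_width = max_abs / num_bins
    hist: List[int] = [0] * num_bins
    for v in values:
        idx = min(int(abs(v) / bin_width), num_bins - 1)
        hist[idx] += 1

    # Walk the cumulative counts until the requested mass is covered.
    target = percentile / 100.0 * len(values)
    cumulative = 0
    for idx, count in enumerate(hist):
        cumulative += count
        if cumulative >= target:
            threshold = (idx + 1) * bin_width  # right edge of this bin
            return -threshold, threshold
    return -max_abs, max_abs
```

For example, with one hundred values between 0 and 99 plus a single outlier at 1000, a percentile of 99 with 10 bins clips the range to (-100.0, 100.0), while a percentile of 100 keeps the outlier.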

collect_data(data_reader: CalibrationDataReader) None[source]#

Collects the operators’ tensors and generates a tensor histogram for each operator.

class quark.onnx.calibrate.DistributionCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'distribution', num_bins: int = 128, scenario: str = 'same')[source]#

This method calculates quantization parameters according to distribution of the tensor values.

Parameters:
  • model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.

  • op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.

  • augmented_model_path (str) – Path to save the augmented model. Defaults to "augmented_model.onnx".

  • use_external_data_format (bool) – Whether to use external data format to store a model whose size is >= 2GB. Defaults to False.

  • method (str) – Method for calibration. One of "entropy", "percentile" or "distribution". Defaults to "distribution".

  • num_bins (int) – Number of bins to create a new histogram for collecting tensor values. Defaults to 128.

  • scenario (str) – For float 8 only. If scenario="same", the algorithm assumes the weights and float 8 follow the same distribution. If scenario="p3", it assumes the weights follow a Gaussian law and float 8 ~ X^3, where X follows a Gaussian law. Defaults to "same".

class quark.onnx.calibrate.PowOfTwoCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, activation_type: QuantType | ExtendedQuantType = QuantType.QInt8, method: PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})[source]#

This method computes the power-of-two quantization parameters for each tensor by minimizing the mean squared error between the quantized values and the float values. It takes longer but usually yields better accuracy.

Parameters:
  • model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.

  • op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.

  • augmented_model_path (str) – Path to save the augmented model. Default is "augmented_model.onnx".

  • use_external_data_format (bool) – Whether to use external data format to store a model whose size is >= 2GB. Default is False.

  • activation_type (Union[QuantType, ExtendedQuantType]) – Type of quantization for activations. Default is QuantType.QInt8.

  • method (PowerOfTwoMethod) – Calibration method. Default is PowerOfTwoMethod.MinMSE.

  • symmetric (bool) – Whether to make the range of tensor symmetric (central point is 0). Default is True.

  • minmse_mode (str) – Mode for the MinMSE method. Default is "All".

  • percentile (float) – Percentile value for calibration, a float between 0 and 100. Default is 99.999.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying the quantized tensor type. Default is {}.
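
The MinMSE idea can be shown with a small sketch: try a set of power-of-two scales, quantize and dequantize the tensor with each, and keep the scale with the lowest mean squared error. This is an illustration under stated assumptions (symmetric signed integers, zero point of 0, a fixed exponent search window), not Quark's implementation. The name min_mse_power_of_two_scale and the exponents parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple


def min_mse_power_of_two_scale(
    values: Sequence[float],
    bitwidth: int = 8,
    exponents: Sequence[int] = tuple(range(-16, 17)),
) -> Tuple[float, float]:
    """Return (scale, mse) for the power-of-two scale with the lowest MSE."""
    q_min, q_max = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    best_scale, best_mse = math.nan, math.inf
    for exp in exponents:
        scale = 2.0 ** exp
        sq_err = 0.0
        for v in values:
            # Quantize to an integer and clamp to the representable range.
            q = max(q_min, min(q_max, round(v / scale)))
            # Dequantize and accumulate the squared error.
            sq_err += (v - q * scale) ** 2
        mse = sq_err / len(values)
        if mse < best_mse:
            best_scale, best_mse = scale, mse
    return best_scale, best_mse
```

A scale that is too small clips large values, and a scale that is too large rounds small values away. The search picks the exponent that balances the two, which is why it costs more time than a plain min-max pass.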

augment_graph() None[source]#

Make the outputs of all quantization_candidates op type nodes part of the graph output.

Returns:

augmented ONNX model

collect_data(data_reader: CalibrationDataReader) None[source]#

Abstract method: collect the tensors that will be used for range computation. It can be called multiple times.

compute_range() Any[source]#

Compute the min-max range of each tensor.

Returns:

dictionary mapping: {tensor name: (min value, max value)}

create_inference_session() None[source]#

Create an ONNX Runtime InferenceSession.

class quark.onnx.calibrate.PowOfTwoCollector(activation_type: QuantType | ExtendedQuantType = QuantType.QInt8, method: PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})[source]#

Collects power-of-two quantization statistics for each tensor. Supports the MinMSE method.

Parameters:
  • activation_type – Type of quantization for activations. Default is QuantType.QInt8.

  • method – Calibration method. Default is PowerOfTwoMethod.MinMSE.

  • symmetric – Whether to make the range of tensor symmetric (central point is 0). Default is True.

  • minmse_mode – Mode for the MinMSE method. Default is "All".

  • percentile – Percentile value for calibration, a float between 0 and 100. Default is 99.999.

  • quantized_tensor_type – Dictionary specifying the quantized tensor type. Default is an empty dictionary.

collect(name_to_arr: Dict[Any, Any]) None[source]#

Generate informative data based on given data.

Parameters:
  • name_to_arr (dict) – Mapping from tensor name to NDArray data.

compute_collection_result() Any[source]#

Get the optimal result among collection data.

class quark.onnx.calibrate.LayerWiseMethod(*values)[source]#
class quark.onnx.calibrate.LayerWisePercentileCalibrater(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 2048, percentile: float = 99.999, lwp_metric: str = 'mae', activation_bitwidth: int = 8, percentile_candidates: List[float] = [99.99, 99.999, 99.9999])[source]#
Parameters:
  • model_input – ONNX model to calibrate. It is a model path or a ModelProto.

  • op_types_to_calibrate – operator types to calibrate. By default, calibrate all the float32/float16 tensors.

  • augmented_model_path – save augmented model to this path.

  • use_external_data_format – use external data format to store a model whose size is >= 2GB.

  • method – A string. One of "entropy", "percentile" or "distribution".

  • symmetric – make range of tensor symmetric (central point is 0).

  • num_bins – number of bins to create a new histogram for collecting tensor values. Default 2048.

  • percentile – A float number in the range [0, 100]. Default 99.999.

  • lwp_metric (str) – Metric used to judge the percentile candidates. One of "mae" or "mse". Defaults to "mae".

  • activation_bitwidth (int) – Bitwidth for activations. Defaults to 8.

  • percentile_candidates (List[float]) – Percentile candidates. Defaults to [99.99, 99.999, 99.9999].
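
One plausible reading of lwp_metric, activation_bitwidth and percentile_candidates is that each candidate percentile is scored by the quantization error it produces and the best one is kept. The sketch below illustrates that reading only. The source does not describe the actual selection procedure, and the names pick_percentile, _clip_threshold and _quant_error are hypothetical.

```python
import math
from typing import Sequence


def _clip_threshold(values: Sequence[float], percentile: float) -> float:
    """Absolute value at or below which `percentile` % of |values| fall."""
    ordered = sorted(abs(v) for v in values)
    rank = math.ceil(percentile / 100.0 * len(ordered)) - 1
    return ordered[max(0, min(rank, len(ordered) - 1))]


def _quant_error(
    values: Sequence[float], threshold: float, bitwidth: int, metric: str
) -> float:
    """Mean absolute or mean squared error after symmetric quantization."""
    q_max = 2 ** (bitwidth - 1) - 1
    scale = threshold / q_max if threshold > 0 else 1.0
    total = 0.0
    for v in values:
        q = max(-q_max, min(q_max, round(v / scale)))  # quantize and clamp
        diff = v - q * scale                           # dequantize, compare
        total += abs(diff) if metric == "mae" else diff * diff
    return total / len(values)


def pick_percentile(
    values: Sequence[float],
    percentile_candidates: Sequence[float] = (99.99, 99.999, 99.9999),
    lwp_metric: str = "mae",
    activation_bitwidth: int = 8,
) -> float:
    """Return the candidate percentile with the lowest quantization error."""
    if lwp_metric not in ("mae", "mse"):
        raise ValueError('lwp_metric must be "mae" or "mse"')
    return min(
        percentile_candidates,
        key=lambda p: _quant_error(
            values, _clip_threshold(values, p), activation_bitwidth, lwp_metric
        ),
    )
```

The two metrics can disagree: "mse" penalizes a clipped outlier heavily and tends to favor a higher percentile, while "mae" weighs the many small rounding errors more and may prefer tighter clipping.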

collect_data(data_reader: CalibrationDataReader) None[source]#

Collects the operators’ tensors and generates a tensor histogram for each operator.

compute_data() TensorsData[source]#

Compute the min-max range of each tensor.

Returns:

dictionary mapping: {tensor name: (min value, max value)}

quark.onnx.calibrate.create_calibrator_power_of_two(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', activation_type: ExtendedQuantType | QuantType = QuantType.QInt8, method: PowerOfTwoMethod = PowerOfTwoMethod.NonOverflow, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Dict[str, Any] = {}) Any[source]#

Create a calibrator for power-of-two quantization.

Parameters:
  • model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.

  • op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented ONNX model.

  • activation_type – Type of quantization for activations.

  • method – Calibration method to use.

  • use_external_data_format – Whether to use external data format for large models.

  • execution_providers – List of execution providers for ONNX Runtime.

  • quantized_tensor_type – Dictionary specifying the quantized tensor type.

  • extra_options – Additional options for calibrator configuration.

Returns:

Initialized calibrator object.

quark.onnx.calibrate.create_calibrator_float_scale(model_input: str | Path | ModelProto, op_types_to_calibrate: Sequence[str] | None = None, augmented_model_path: str = 'augmented_model.onnx', calibrate_method: CalibrationMethod | LayerWisePercentileCalibrater = CalibrationMethod.MinMax, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], extra_options: Dict[str, Any] = {}) Any[source]#

Create a calibrator for floating-point scale quantization.

Parameters:
  • model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.

  • op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented ONNX model.

  • calibrate_method – Calibration method to use (MinMax, Entropy, Percentile, or Distribution).

  • use_external_data_format – Whether to use external data format for large models.

  • execution_providers – List of execution providers for ONNX Runtime.

  • extra_options – Additional options for calibrator configuration.

Returns:

Initialized calibrator object.
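
A typical call sequence might look like the sketch below. It relies on the signature of create_calibrator_float_scale shown above and assumes that the returned object exposes collect_data() and compute_data() like the calibrater classes on this page. Because the return type is documented only as Any, treat that as an assumption. The wrapper name calibrate_with_minmax is hypothetical, and data_reader is expected to be a CalibrationDataReader supplied by the caller.

```python
from typing import Any


def calibrate_with_minmax(model_path: str, data_reader: Any) -> Any:
    """Create a calibrator with the default method and compute tensor ranges."""
    # Imported lazily so this sketch can be loaded without Quark installed.
    from quark.onnx.calibrate import create_calibrator_float_scale

    calibrator = create_calibrator_float_scale(
        model_path,
        op_types_to_calibrate=None,  # None: all float32/float16 tensors
        augmented_model_path="augmented_model.onnx",
        use_external_data_format=False,
        execution_providers=["CPUExecutionProvider"],
    )
    # Assumption: the returned calibrator provides collect_data() and
    # compute_data(), as the calibrater classes documented above do.
    calibrator.collect_data(data_reader)
    return calibrator.compute_data()
```

The calibrate_method argument is left at its default, CalibrationMethod.MinMax. Passing another supported method selects one of the other calibraters described on this page.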