quark.onnx.calibrate

Module Contents

Classes

LayerWiseMethod

Create a collection of name/value pairs.

MinMaxCalibrater

This method obtains the quantization parameters based on the minimum and maximum values of each tensor.

EntropyCalibrater

This method determines the quantization parameters from the entropy of each tensor's value distribution.

PercentileCalibrater

This method calculates quantization parameters using percentiles of the tensor values.

PowOfTwoCalibrater

This method computes power-of-two quantization parameters for each tensor by minimizing the mean squared error between the quantized values and the float values. It takes longer but usually gives better accuracy.

PowOfTwoCollector

Collects the power-of-two quantization data for each tensor. Supports the MinMSE method.

Functions

create_calibrator_power_of_two(→ Any)

Create a calibrator for power-of-two quantization.

create_calibrator_float_scale(→ Any)

Create a calibrator for floating-point scale quantization.

class quark.onnx.calibrate.LayerWiseMethod(*args, **kwds)

Create a collection of name/value pairs.

Example enumeration:

>>> class Color(Enum):
...     RED = 1
...     BLUE = 2
...     GREEN = 3

Access them by:

  • attribute access:

    >>> Color.RED
    <Color.RED: 1>
    
  • value lookup:

    >>> Color(1)
    <Color.RED: 1>
    
  • name lookup:

    >>> Color['RED']
    <Color.RED: 1>
    

Enumerations can be iterated over, and know how many members they have:

>>> len(Color)
3
>>> list(Color)
[<Color.RED: 1>, <Color.BLUE: 2>, <Color.GREEN: 3>]

Methods can be added to enumerations, and members can have their own attributes – see the documentation for details.

class quark.onnx.calibrate.MinMaxCalibrater(model_path: pathlib.Path, op_types_to_calibrate: List[str] | None, augmented_model_path: str = 'augmented_model.onnx', symmetric: bool = False, use_external_data_format: bool = False, moving_average: bool = False, averaging_constant: float = 0.01)

This method obtains the quantization parameters based on the minimum and maximum values of each tensor.

Parameters:
  • model_path – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented model. Default is “augmented_model.onnx”.

  • symmetric – Whether to make the tensor range symmetric around zero. Default is False.

  • use_external_data_format – Whether to use the external data format to store a model whose size is >= 2GB. Default is False.

  • moving_average – Whether to compute the moving average of the minimum and maximum values instead of the global minimum and maximum. Default is False.

  • averaging_constant – Constant smoothing factor to use when computing the moving average. Default is 0.01. Should be between 0 and 1.

Raises:

ValueError – If averaging_constant is not between 0 and 1 when moving_average is True.
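The difference between tracking the global minimum/maximum and tracking a moving average can be illustrated with a small, library-independent sketch. The helper name `update_range` and the exact update rule (an exponential moving average) are assumptions made for illustration, as is treating the bounds 0 and 1 as inclusive; this is not the calibrater's actual code.

```python
def update_range(current, batch, moving_average=False, averaging_constant=0.01):
    """Fold one batch of tensor values into a running (min, max) range.

    Hypothetical helper: the exponential-moving-average rule below is an
    assumption, not taken from the quark.onnx source.
    """
    if moving_average and not 0 <= averaging_constant <= 1:
        raise ValueError("averaging_constant must be between 0 and 1")

    batch_min, batch_max = min(batch), max(batch)
    if current is None:
        # The first batch initializes the range.
        return batch_min, batch_max

    cur_min, cur_max = current
    if moving_average:
        # Move each bound a fraction of the way toward the new batch value.
        return (
            cur_min + averaging_constant * (batch_min - cur_min),
            cur_max + averaging_constant * (batch_max - cur_max),
        )
    # Global mode: keep the most extreme values seen so far.
    return min(cur_min, batch_min), max(cur_max, batch_max)


first = update_range(None, [-1.0, 2.0])
global_range = update_range(first, [-3.0, 1.0])
smoothed_range = update_range(first, [-3.0, 1.0], moving_average=True, averaging_constant=0.5)
```

With a moving average, a single outlier batch shifts the range only partially, which is why a small `averaging_constant` makes the range less sensitive to rare extreme values.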

class quark.onnx.calibrate.EntropyCalibrater(model_path: pathlib.Path, op_types_to_calibrate: List[str] | None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'entropy', symmetric: bool = False, num_bins: int = 128, num_quantized_bins: int = 128)

This method determines the quantization parameters from the entropy of each tensor’s value distribution.

Parameters:
  • model_path – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented model. Default is “augmented_model.onnx”.

  • use_external_data_format – Whether to use the external data format to store a model whose size is >= 2GB. Default is False.

  • method – Method for calibration. One of [‘entropy’, ‘percentile’, ‘distribution’]. Default is “entropy”.

  • symmetric – Whether to make the tensor range symmetric around zero. Default is False.

  • num_bins – Number of bins in the histogram used to collect tensor values. Default is 128.

  • num_quantized_bins – Number of quantized bins. Default is 128.
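Entropy-based calibration is commonly framed as comparing a tensor's original histogram with a coarser, quantized version of it and measuring the information lost. The sketch below assumes the Kullback-Leibler divergence as that measure and uses a simplified even redistribution when expanding the coarse bins back; the helper names `coarsen` and `kl_divergence` are hypothetical and not part of quark.onnx.

```python
import math


def coarsen(hist, num_quantized_bins):
    """Merge histogram bins into num_quantized_bins groups, then spread each
    group's count evenly back over its original bins (a simplification)."""
    n = len(hist)
    out = [0.0] * n
    for group in range(num_quantized_bins):
        start = group * n // num_quantized_bins
        end = (group + 1) * n // num_quantized_bins
        total = sum(hist[start:end])
        for i in range(start, end):
            out[i] = total / (end - start)
    return out


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two histograms, each normalized to sum to 1."""
    p_total, q_total = sum(p), sum(q)
    result = 0.0
    for p_count, q_count in zip(p, q):
        if p_count == 0:
            continue  # empty reference bins contribute nothing
        p_prob = p_count / p_total
        q_prob = max(q_count / q_total, eps)  # guard against log of zero
        result += p_prob * math.log(p_prob / q_prob)
    return result


hist = [1, 3, 2, 2, 8, 0, 1, 1]      # histogram collected with 8 bins
coarse = coarsen(hist, 4)            # the same data seen through 4 bins
loss = kl_divergence(hist, coarse)   # information lost by the coarser view
```

A smaller divergence means the coarse histogram preserves more of the original distribution; an entropy-style calibrater would prefer the range that minimizes this loss.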

class quark.onnx.calibrate.PercentileCalibrater(model_path: pathlib.Path, op_types_to_calibrate: List[str] | None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 2048, percentile: float = 99.999)

This method calculates quantization parameters using percentiles of the tensor values.

Parameters:
  • model_path – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented model. Default is “augmented_model.onnx”.

  • use_external_data_format – Whether to use the external data format to store a model whose size is >= 2GB. Default is False.

  • method – Method for calibration. One of [‘entropy’, ‘percentile’, ‘distribution’]. Default is “percentile”.

  • symmetric – Whether to make the tensor range symmetric around zero. Default is False.

  • num_bins – Number of bins in the histogram used to collect tensor values. Default is 2048.

  • percentile – Percentile value for calibration, a float in the range [0, 100]. Default is 99.999.
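To make the effect of `percentile` and `symmetric` concrete, here is a minimal sketch that derives a clipping range directly from raw values. It assumes a nearest-rank percentile and works on a plain list instead of the `num_bins` histogram the calibrater collects; `percentile_range` is a hypothetical helper, not a quark.onnx function.

```python
import math


def percentile_range(values, percentile=99.999, symmetric=False):
    """Return a (low, high) clipping range covering `percentile` percent of values."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be in the range [0, 100]")

    def nearest_rank(ordered, pct):
        # Smallest element with at least pct percent of the samples at or below it.
        rank = max(1, math.ceil(pct * len(ordered) / 100))
        return ordered[rank - 1]

    if symmetric:
        # Use magnitudes so the range is centered on zero.
        bound = nearest_rank(sorted(abs(v) for v in values), percentile)
        return -bound, bound

    ordered = sorted(values)
    return nearest_rank(ordered, 100 - percentile), nearest_rank(ordered, percentile)


asymmetric = percentile_range(list(range(1, 101)), percentile=95)
centered = percentile_range([-10, 1, 2, 3], percentile=75, symmetric=True)
```

In the second call the outlier -10 is excluded from the range, which is the point of percentile calibration: a few extreme values no longer dictate the quantization range.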

class quark.onnx.calibrate.PowOfTwoCalibrater(model: pathlib.Path, op_types_to_calibrate: Sequence[str] | None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, activation_type: onnxruntime.quantization.quant_utils.QuantType | quark.onnx.quant_utils.VitisQuantType = QuantType.QInt8, method: quark.onnx.quant_utils.PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})

This method computes power-of-two quantization parameters for each tensor by minimizing the mean squared error between the quantized values and the float values. It takes longer but usually gives better accuracy.

Parameters:
  • model – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented model. Default is “augmented_model.onnx”.

  • use_external_data_format – Whether to use the external data format to store a model whose size is >= 2GB. Default is False.

  • activation_type – Type of quantization for activations. Default is QuantType.QInt8.

  • method – Calibration method. Default is PowerOfTwoMethod.MinMSE.

  • symmetric – Whether to make the tensor range symmetric around zero. Default is True.

  • minmse_mode – Mode for the MinMSE method. Default is “All”.

  • percentile – Percentile value for calibration, a float between 0 and 100. Default is 99.999.

  • quantized_tensor_type – Dictionary specifying the quantized tensor type. Default is an empty dictionary.

augment_graph() → None

Makes all nodes whose op type is among the quantization candidates part of the graph output.

Returns:

Augmented ONNX model.

compute_range() → Any

Computes the min-max range of each tensor.

Returns:

Dictionary mapping {tensor name: (min value, max value)}.
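The MinMSE idea behind PowOfTwoCalibrater, choosing a power-of-two scale that minimizes the mean squared error between float values and their quantized counterparts, can be sketched in a few lines. The helper names, the candidate exponent range, and the signed 8-bit limits of -128 and 127 are assumptions for illustration; the actual search in quark.onnx may differ.

```python
def quantize_dequantize(x, exponent, qmin=-128, qmax=127):
    """Round x onto a grid with step 2**exponent, clipped to the integer range."""
    scale = 2.0 ** exponent
    # Python's round() rounds halves to the nearest even integer.
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def min_mse_exponent(values, candidates=range(-12, 5)):
    """Return the exponent whose scale 2**exponent gives the lowest MSE."""

    def mse(exponent):
        errors = ((v - quantize_dequantize(v, exponent)) ** 2 for v in values)
        return sum(errors) / len(values)

    return min(candidates, key=mse)


best = min_mse_exponent([100.0, -50.0, 3.0])
```

Here a smaller exponent would clip 100.0 and a larger one would round 3.0 away from its true value, so the search settles on the scale that balances clipping error against rounding error. Evaluating every candidate for every tensor is what makes this approach slower than a plain min-max range.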

class quark.onnx.calibrate.PowOfTwoCollector(activation_type: onnxruntime.quantization.quant_utils.QuantType | quark.onnx.quant_utils.VitisQuantType = QuantType.QInt8, method: quark.onnx.quant_utils.PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})

Collects the power-of-two quantization data for each tensor. Supports the MinMSE method.

Parameters:
  • activation_type – Type of quantization for activations. Default is QuantType.QInt8.

  • method – Calibration method. Default is PowerOfTwoMethod.MinMSE.

  • symmetric – Whether to make the tensor range symmetric around zero. Default is True.

  • minmse_mode – Mode for the MinMSE method. Default is “All”.

  • percentile – Percentile value for calibration, a float between 0 and 100. Default is 99.999.

  • quantized_tensor_type – Dictionary specifying the quantized tensor type. Default is an empty dictionary.

quark.onnx.calibrate.create_calibrator_power_of_two(model: pathlib.Path, op_types_to_calibrate: List[str], augmented_model_path: str = 'augmented_model.onnx', activation_type: quark.onnx.quant_utils.VitisQuantType | onnxruntime.quantization.quant_utils.QuantType = QuantType.QInt8, method: quark.onnx.quant_utils.PowerOfTwoMethod = PowerOfTwoMethod.NonOverflow, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Dict[str, Any] = {}) → Any

Create a calibrator for power-of-two quantization.

Parameters:
  • model – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate.

  • augmented_model_path – Path to save the augmented ONNX model.

  • activation_type – Type of quantization for activations.

  • method – Calibration method to use.

  • use_external_data_format – Whether to use external data format for large models.

  • execution_providers – List of execution providers for ONNX Runtime.

  • quantized_tensor_type – Dictionary specifying the quantized tensor type.

  • extra_options – Additional options for calibrator configuration.

Returns:

Initialized calibrator object.

quark.onnx.calibrate.create_calibrator_float_scale(model: pathlib.Path, op_types_to_calibrate: List[str] | None, augmented_model_path: str = 'augmented_model.onnx', calibrate_method: onnxruntime.quantization.calibrate.CalibrationMethod | LayerWisePercentileCalibrater = CalibrationMethod.MinMax, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], extra_options: Dict[str, Any] = {}) → Any

Create a calibrator for floating-point scale quantization.

Parameters:
  • model – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented ONNX model.

  • calibrate_method – Calibration method to use (MinMax, Entropy, Percentile, or Distribution).

  • use_external_data_format – Whether to use external data format for large models.

  • execution_providers – List of execution providers for ONNX Runtime.

  • extra_options – Additional options for calibrator configuration.

Returns:

Initialized calibrator object.
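The two factory functions might be called as in the sketch below, which only uses the parameter names and defaults listed in the signatures above. The wrapper function names, the model path argument, and the operator types passed by the caller are hypothetical, and the snippet assumes that `quark` and `onnxruntime` are installed. The imports are placed inside the functions so that the module can be loaded without those packages.

```python
from pathlib import Path


def build_float_scale_calibrator(model_path: str):
    """Hypothetical wrapper: create a MinMax calibrator for float-scale quantization."""
    from onnxruntime.quantization.calibrate import CalibrationMethod
    from quark.onnx.calibrate import create_calibrator_float_scale

    return create_calibrator_float_scale(
        Path(model_path),
        op_types_to_calibrate=None,  # None: calibrate all float32/float16 tensors
        augmented_model_path="augmented_model.onnx",
        calibrate_method=CalibrationMethod.MinMax,
        execution_providers=["CPUExecutionProvider"],
    )


def build_power_of_two_calibrator(model_path: str, op_types: list):
    """Hypothetical wrapper: create a power-of-two calibrator using MinMSE."""
    from onnxruntime.quantization.quant_utils import QuantType
    from quark.onnx.calibrate import create_calibrator_power_of_two
    from quark.onnx.quant_utils import PowerOfTwoMethod

    return create_calibrator_power_of_two(
        Path(model_path),
        op_types,  # this function takes a list of operator types
        activation_type=QuantType.QInt8,
        # The factory's default is NonOverflow; MinMSE is selected explicitly here.
        method=PowerOfTwoMethod.MinMSE,
    )
```

Note that the factory `create_calibrator_power_of_two` defaults to `PowerOfTwoMethod.NonOverflow`, whereas the `PowOfTwoCalibrater` class defaults to `PowerOfTwoMethod.MinMSE`, so the method is worth passing explicitly.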