quark.onnx.calibrate

Module Contents

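Classes

  • MinMaxCalibrater

  • EntropyCalibrater

  • PercentileCalibrater

  • PowOfTwoCalibrater

  • PowOfTwoCollector

Functions

  • create_calibrator_power_of_two

  • create_calibrator_float_scale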

class quark.onnx.calibrate.MinMaxCalibrater(model_path: pathlib.Path, op_types_to_calibrate: List[str] | None, augmented_model_path: str = 'augmented_model.onnx', symmetric: bool = False, use_external_data_format: bool = False, moving_average: bool = False, averaging_constant: float = 0.01)

Computes quantization parameters from the minimum and maximum values observed for each tensor.

Parameters:
  • model_path – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented model. Default is “augmented_model.onnx”.

  • symmetric – Whether to make each tensor’s range symmetric around zero. Default is False.

  • use_external_data_format – Whether to save the model in the ONNX external data format, which is required for models of 2 GB or larger. Default is False.

  • moving_average – Whether to compute the moving average of the minimum and maximum values instead of the global minimum and maximum. Default is False.

  • averaging_constant – Smoothing factor used when computing the moving average; must be between 0 and 1. Default is 0.01.

Raises:

ValueError – If averaging_constant is not between 0 and 1 when moving_average is True.
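The difference between the global and moving-average modes can be sketched in plain Python. This is an illustrative sketch, not Quark's implementation: it assumes each calibration batch is a flat list of floats for a single tensor, and it uses the common exponential-moving-average update `r = r + c * (batch_value - r)` as an assumed formulation. The helper name `minmax_range` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def minmax_range(
    batches: Iterable[List[float]],
    moving_average: bool = False,
    averaging_constant: float = 0.01,
    symmetric: bool = False,
) -> Tuple[float, float]:
    """Hypothetical helper: derive a (min, max) range for one tensor."""
    if moving_average and not 0.0 <= averaging_constant <= 1.0:
        raise ValueError("averaging_constant must be between 0 and 1")
    rmin: Optional[float] = None
    rmax: Optional[float] = None
    for batch in batches:
        bmin, bmax = min(batch), max(batch)
        if rmin is None or rmax is None:
            # The first batch initializes the range in both modes.
            rmin, rmax = bmin, bmax
        elif moving_average:
            # Assumed EMA update: move a fraction of the way toward this batch.
            rmin += averaging_constant * (bmin - rmin)
            rmax += averaging_constant * (bmax - rmax)
        else:
            # Global extremes over every batch seen so far.
            rmin, rmax = min(rmin, bmin), max(rmax, bmax)
    if rmin is None or rmax is None:
        raise ValueError("no calibration batches were provided")
    if symmetric:
        # Center the range on 0 using the larger magnitude.
        bound = max(abs(rmin), abs(rmax))
        rmin, rmax = -bound, bound
    return rmin, rmax


batches = [[0.0, 1.0], [-2.0, 3.0]]
print(minmax_range(batches))                   # (-2.0, 3.0)
print(minmax_range(batches, symmetric=True))   # (-3.0, 3.0)
print(minmax_range(batches, moving_average=True, averaging_constant=0.5))  # (-1.0, 2.0)
```

With moving averaging, a single extreme batch shifts the range only partially, which is why this mode is less sensitive to outlier batches than the global minimum and maximum.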

class quark.onnx.calibrate.EntropyCalibrater(model_path: pathlib.Path, op_types_to_calibrate: List[str] | None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'entropy', symmetric: bool = False, num_bins: int = 128, num_quantized_bins: int = 128)

Determines quantization parameters from the entropy of each tensor’s value distribution: the chosen range is the one that loses the least information (lowest KL divergence) when the distribution is quantized.

Parameters:
  • model_path – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented model. Default is “augmented_model.onnx”.

  • use_external_data_format – Whether to save the model in the ONNX external data format, which is required for models of 2 GB or larger. Default is False.

  • method – Method for calibration. One of [‘entropy’, ‘percentile’, ‘distribution’]. Default is “entropy”.

  • symmetric – Whether to make each tensor’s range symmetric around zero. Default is False.

  • num_bins – Number of bins in the histogram used to collect tensor values. Default is 128.

  • num_quantized_bins – Number of bins in the quantized distribution that is compared against the histogram. Default is 128.
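Entropy calibration is commonly implemented as a search over clipping thresholds (the TensorRT-style scheme): for each candidate threshold, the histogram is clipped into a reference distribution P, re-binned into `num_quantized_bins` levels to form Q, and the threshold with the smallest KL(P || Q) wins. The pure-Python sketch below illustrates that idea on a histogram of |x| only; it is not Quark's code. The helper names, the relative-epsilon smoothing, and the default `num_quantized_bins=16` are assumptions chosen to keep the example small (in this sketch, equal `num_bins` and `num_quantized_bins` would leave a single candidate to evaluate).

```python
import math
from typing import List, Optional, Sequence


def smooth(dist: List[float], rel_eps: float = 1e-4) -> Optional[List[float]]:
    """Move a little mass onto empty bins so that KL(P || Q) stays finite."""
    nonzero = [x for x in dist if x > 0]
    if not nonzero:
        return None
    eps = rel_eps * min(nonzero)  # relative, so entries stay positive for < 10_000 bins
    eps1 = eps * (len(dist) - len(nonzero)) / len(nonzero)
    return [eps if x == 0 else x - eps1 for x in dist]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(P || Q) after normalizing both; all entries must be positive."""
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


def entropy_threshold(values: Sequence[float], num_bins: int = 128,
                      num_quantized_bins: int = 16) -> float:
    """Hypothetical helper: clipping threshold for |values| that minimizes KL(P || Q)."""
    abs_vals = [abs(v) for v in values]
    max_val = max(abs_vals)
    if max_val == 0.0:
        return 0.0
    bin_width = max_val / num_bins
    hist = [0] * num_bins
    for v in abs_vals:
        hist[min(int(v / bin_width), num_bins - 1)] += 1

    best_kl, best_i = math.inf, num_bins
    for i in range(num_quantized_bins, num_bins + 1):
        sliced = hist[:i]
        # Reference P: the first i bins, with the clipped tail folded into the last bin.
        p = [float(c) for c in sliced]
        p[-1] += sum(hist[i:])
        # Candidate Q: merge the i bins into num_quantized_bins groups, then spread each
        # group's unclipped count evenly over the bins that are non-empty in P.
        q = [0.0] * i
        for j in range(num_quantized_bins):
            start = j * i // num_quantized_bins
            end = (j + 1) * i // num_quantized_bins
            nonzero = [k for k in range(start, end) if p[k] != 0]
            count = sum(sliced[start:end])
            for k in nonzero:
                q[k] = count / len(nonzero)
        p_s, q_s = smooth(p), smooth(q)
        if p_s is None or q_s is None:
            continue  # Q is empty: every value lies beyond this threshold
        kl = kl_divergence(p_s, q_s)
        if kl < best_kl:
            best_kl, best_i = kl, i
    return best_i * bin_width


# Dense values in [-1, 1] plus a single large outlier.
values = [k / 10000 for k in range(-10000, 10001)] + [100.0]
print(entropy_threshold(values))  # far below the 100.0 outlier
```

Clipping the lone outlier costs little KL divergence, while keeping it would force the dense values into very few quantization levels, so the search settles on a much smaller threshold than the plain maximum.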

class quark.onnx.calibrate.PercentileCalibrater(model_path: pathlib.Path, op_types_to_calibrate: List[str] | None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, method: str = 'percentile', symmetric: bool = False, num_bins: int = 2048, percentile: float = 99.999)

Calculates quantization parameters from a percentile of each tensor’s values, so that rare outliers beyond that percentile are clipped.

Parameters:
  • model_path – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented model. Default is “augmented_model.onnx”.

  • use_external_data_format – Whether to save the model in the ONNX external data format, which is required for models of 2 GB or larger. Default is False.

  • method – Method for calibration. One of [‘entropy’, ‘percentile’, ‘distribution’]. Default is “percentile”.

  • symmetric – Whether to make each tensor’s range symmetric around zero. Default is False.

  • num_bins – Number of bins in the histogram used to collect tensor values. Default is 2048.

  • percentile – Percentile used for calibration, a float in [0, 100]. Default is 99.999.
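A histogram-based percentile threshold can be sketched as follows. This is an illustration of the general technique, not Quark's code: it assumes the symmetric case (a histogram of |x|), returns the upper edge of the first bin whose cumulative share reaches the percentile, and uses a hypothetical helper name.

```python
from typing import Sequence


def percentile_threshold(values: Sequence[float], num_bins: int = 2048,
                         percentile: float = 99.999) -> float:
    """Hypothetical helper: symmetric clipping threshold at a percentile of |values|."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be in [0, 100]")
    abs_vals = [abs(v) for v in values]
    max_val = max(abs_vals)
    if max_val == 0:
        return 0.0
    bin_width = max_val / num_bins
    hist = [0] * num_bins
    for v in abs_vals:
        hist[min(int(v / bin_width), num_bins - 1)] += 1
    total = len(abs_vals)
    cumulative = 0
    for b, count in enumerate(hist):
        cumulative += count
        # Stop at the first bin whose cumulative share reaches the percentile.
        if cumulative * 100.0 >= percentile * total:
            return (b + 1) * bin_width  # upper edge of that bin
    return float(max_val)


# 1000 evenly spaced values, one per bin: the 99th percentile lands near 989.
print(percentile_threshold(list(range(1000)), num_bins=1000, percentile=99.0))  # about 989.01
```

The resolution of the result is one bin width, which is one reason a large default such as `num_bins=2048` is useful for very high percentiles like 99.999.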

class quark.onnx.calibrate.PowOfTwoCalibrater(model: pathlib.Path, op_types_to_calibrate: Sequence[str] | None, augmented_model_path: str = 'augmented_model.onnx', use_external_data_format: bool = False, activation_type: onnxruntime.quantization.quant_utils.QuantType | quark.onnx.quant_utils.VitisQuantType = QuantType.QInt8, method: quark.onnx.quant_utils.PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})

Finds power-of-two quantization parameters for each tensor that minimize the mean squared error between the float values and their quantized values. This takes longer than simpler methods but usually gives better accuracy.

Parameters:
  • model – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented model. Default is “augmented_model.onnx”.

  • use_external_data_format – Whether to save the model in the ONNX external data format, which is required for models of 2 GB or larger. Default is False.

  • activation_type – Quantized data type for activations. Default is QuantType.QInt8.

  • method – Power-of-two calibration method. Default is PowerOfTwoMethod.MinMSE.

  • symmetric – Whether to make each tensor’s range symmetric around zero. Default is True.

  • minmse_mode – Mode for the MinMSE method. Default is “All”.

  • percentile – Percentile value for calibration, a float between 0 and 100. Default is 99.999.

  • quantized_tensor_type – Dictionary specifying the quantized tensor type. Default is an empty dictionary.
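The MinMSE idea can be sketched as a search over fix positions: the scale is restricted to 2**-pos, and the position whose quantize-dequantize round trip has the lowest mean squared error is kept. This is a minimal sketch, not Quark's implementation. It assumes a signed 8-bit grid [-128, 127] (matching the QInt8 default), symmetric quantization with zero point 0, and a hypothetical search range; `minmse_mode` and `percentile` are not modeled, and the helper names are hypothetical.

```python
from typing import Sequence


def quantize_dequantize(x: float, fix_pos: int, qmin: int = -128, qmax: int = 127) -> float:
    """Round x onto the grid of multiples of 2**-fix_pos, saturating to [qmin, qmax]."""
    q = round(x * 2.0 ** fix_pos)  # Python rounds exact halves to even
    q = max(qmin, min(qmax, q))
    return q * 2.0 ** -fix_pos


def minmse_fix_pos(values: Sequence[float], search: range = range(-8, 17)) -> int:
    """Hypothetical helper: fix position whose power-of-two scale minimizes the MSE."""
    best_pos, best_mse = search[0], float("inf")
    for pos in search:
        mse = sum((v - quantize_dequantize(v, pos)) ** 2 for v in values) / len(values)
        if mse < best_mse:
            best_pos, best_mse = pos, mse
    return best_pos


values = [0.3, -0.7, 0.9]
pos = minmse_fix_pos(values)
print(pos, 2.0 ** -pos)  # 7 0.0078125
```

Larger positions give a finer grid but saturate sooner (here, position 8 clips -0.7 to -0.5), so the search trades rounding error against clipping error.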

augment_graph() → None

Makes the outputs of all nodes whose op type is a calibration candidate part of the graph outputs, producing the augmented ONNX model.

compute_range() → Any

Computes the min-max range of each calibrated tensor.

Returns:

Dictionary mapping {tensor name: (min value, max value)}.
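To illustrate the shape of the dictionary that compute_range() returns, the sketch below reduces tensor values collected batch by batch into {tensor name: (min, max)}. It assumes the values are already available as flat lists per tensor (they would typically come from running the augmented model on calibration data); the tensor names and the `merge_ranges` helper are hypothetical and do not reflect Quark internals.

```python
from typing import Dict, List, Tuple


def merge_ranges(batch_outputs: List[Dict[str, List[float]]]) -> Dict[str, Tuple[float, float]]:
    """Hypothetical helper: reduce per-batch tensor values to {name: (min, max)}."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for outputs in batch_outputs:
        for name, values in outputs.items():
            lo, hi = min(values), max(values)
            if name in ranges:
                # Widen the stored range with this batch's extremes.
                old_lo, old_hi = ranges[name]
                lo, hi = min(lo, old_lo), max(hi, old_hi)
            ranges[name] = (lo, hi)
    return ranges


# Two calibration batches, each holding values of two intermediate tensors.
batches = [
    {"conv_out": [0.5, -1.0], "relu_out": [0.0, 2.0]},
    {"conv_out": [3.0], "relu_out": [1.0]},
]
print(merge_ranges(batches))  # {'conv_out': (-1.0, 3.0), 'relu_out': (0.0, 2.0)}
```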

class quark.onnx.calibrate.PowOfTwoCollector(activation_type: onnxruntime.quantization.quant_utils.QuantType | quark.onnx.quant_utils.VitisQuantType = QuantType.QInt8, method: quark.onnx.quant_utils.PowerOfTwoMethod = PowerOfTwoMethod.MinMSE, symmetric: bool = True, minmse_mode: str = 'All', percentile: float = 99.999, quantized_tensor_type: Dict[Any, Any] = {})

Collects power-of-two quantization parameters for each tensor. Supports the MinMSE method.

Parameters:
  • activation_type – Quantized data type for activations. Default is QuantType.QInt8.

  • method – Power-of-two calibration method. Default is PowerOfTwoMethod.MinMSE.

  • symmetric – Whether to make each tensor’s range symmetric around zero. Default is True.

  • minmse_mode – Mode for the MinMSE method. Default is “All”.

  • percentile – Percentile value for calibration, a float between 0 and 100. Default is 99.999.

  • quantized_tensor_type – Dictionary specifying the quantized tensor type. Default is an empty dictionary.

quark.onnx.calibrate.create_calibrator_power_of_two(model: pathlib.Path, op_types_to_calibrate: List[str], augmented_model_path: str = 'augmented_model.onnx', activation_type: quark.onnx.quant_utils.VitisQuantType | onnxruntime.quantization.quant_utils.QuantType = QuantType.QInt8, method: quark.onnx.quant_utils.PowerOfTwoMethod = PowerOfTwoMethod.NonOverflow, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Dict[str, Any] = {}) → Any

Create a calibrator for power-of-two quantization.

Parameters:
  • model – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate.

  • augmented_model_path – Path to save the augmented ONNX model.

  • activation_type – Quantized data type for activations.

  • method – Calibration method to use.

  • use_external_data_format – Whether to use external data format for large models.

  • execution_providers – List of execution providers for ONNX Runtime.

  • quantized_tensor_type – Dictionary specifying the quantized tensor type.

  • extra_options – Additional options for calibrator configuration.

Returns:

Initialized calibrator object.
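The default method of this function is PowerOfTwoMethod.NonOverflow. Based only on that name, one plausible reading is "pick the finest power-of-two scale at which the largest magnitude still fits in the integer range". The sketch below implements that assumed rule for a signed 8-bit grid with qmax = 127. It is a hypothetical illustration, not a description of Quark's actual NonOverflow logic, and `non_overflow_fix_pos` is a made-up name.

```python
from typing import Sequence


def non_overflow_fix_pos(values: Sequence[float], qmax: int = 127,
                         search: range = range(16, -17, -1)) -> int:
    """Hypothetical helper: largest fix position whose scale 2**-pos keeps max|x| in range."""
    max_abs = max(abs(v) for v in values)
    for pos in search:  # from the finest grid to the coarsest
        if round(max_abs * 2.0 ** pos) <= qmax:
            return pos
    return search[-1]


print(non_overflow_fix_pos([1.0, -0.5]))       # 6: 1.0 * 2**7 = 128 would overflow 127
print(non_overflow_fix_pos([0.3, -0.7, 0.9]))  # 7
```

Under this reading, the rule is a single pass over the maximum magnitude, which makes it much cheaper than an MSE search, but it never clips, so a single outlier can force a coarse grid for the whole tensor.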

quark.onnx.calibrate.create_calibrator_float_scale(model: pathlib.Path, op_types_to_calibrate: List[str] | None, augmented_model_path: str = 'augmented_model.onnx', calibrate_method: onnxruntime.quantization.calibrate.CalibrationMethod = CalibrationMethod.MinMax, use_external_data_format: bool = False, execution_providers: List[str] | None = ['CPUExecutionProvider'], extra_options: Dict[str, Any] = {}) → Any

Create a calibrator for floating-point scale quantization.

Parameters:
  • model – Path to the ONNX model to calibrate.

  • op_types_to_calibrate – List of operator types to calibrate. If None, all float32/float16 tensors are calibrated.

  • augmented_model_path – Path to save the augmented ONNX model.

  • calibrate_method – Calibration method to use (MinMax, Entropy, Percentile, or Distribution).

  • use_external_data_format – Whether to use external data format for large models.

  • execution_providers – List of execution providers for ONNX Runtime.

  • extra_options – Additional options for calibrator configuration.

Returns:

Initialized calibrator object.
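Unlike the power-of-two calibrators, a floating-point scale is not restricted to values of the form 2**-k. The standard affine formula for turning a calibrated (min, max) range into a float scale and zero point is sketched below. It assumes an unsigned 8-bit target [0, 255] and widens the range to include zero. It illustrates the general technique only; the helper names are hypothetical, and this is not necessarily how Quark converts ranges into quantization parameters.

```python
from typing import Tuple


def float_scale_params(rmin: float, rmax: float,
                       qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Hypothetical helper: affine (asymmetric) scale and zero point from a min-max range."""
    # Widen the range so that 0.0 is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmax == rmin:
        return 1.0, qmin  # all-zero tensor: any positive scale works
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int, qmin: int = 0, qmax: int = 255) -> int:
    """Map a float onto the integer grid defined by scale and zero_point."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


scale, zp = float_scale_params(-1.0, 3.0)
q = quantize(1.0, scale, zp)
print(zp, q, (q - zp) * scale)  # zp is 64, q is 128, and the dequantized value is close to 1.0
```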