ONNX quantization configuration#

Quark Quantization Config API for ONNX

class quark.onnx.quantization.config.config.Config(global_quant_config: QuantizationConfig)[source]#

A class that encapsulates comprehensive quantization configurations for a machine learning model, allowing for detailed and hierarchical control over quantization parameters across different model components.

Parameters:

global_quant_config (QuantizationConfig) – Global quantization configuration applied to the entire model unless overridden at the layer level.

class quark.onnx.quantization.config.config.QuantizationConfig(calibrate_method: CalibrationMethod | PowerOfTwoMethod = CalibrationMethod.MinMax, quant_format: QuantFormat | ExtendedQuantFormat = QuantFormat.QDQ, activation_type: QuantType | ExtendedQuantType = QuantType.QInt8, weight_type: QuantType | ExtendedQuantType = QuantType.QInt8, input_nodes: List[str] = [], output_nodes: List[str] = [], op_types_to_quantize: List[str] = [], nodes_to_quantize: List[str] = [], extra_op_types_to_quantize: List[str] = [], nodes_to_exclude: List[str] = [], subgraphs_to_exclude: List[Tuple[List[str]]] = [], specific_tensor_precision: bool = False, execution_providers: List[str] = [], per_channel: bool = False, reduce_range: bool = False, optimize_model: bool = True, use_dynamic_quant: bool = False, use_external_data_format: bool = False, convert_fp16_to_fp32: bool = False, convert_nchw_to_nhwc: bool = False, include_sq: bool = False, include_rotation: bool = False, include_cle: bool = True, include_auto_mp: bool = False, include_fast_ft: bool = False, enable_npu_cnn: bool = False, enable_npu_transformer: bool = False, debug_mode: bool = False, crypto_mode: bool = False, print_summary: bool = True, ignore_warnings: bool = True, log_severity_level: int = 1, extra_options: Dict[str, Any] = {})[source]#

A data class that specifies quantization configurations for different components of a module, allowing hierarchical control over how each tensor type is quantized.

Parameters:
  • calibrate_method (Union[CalibrationMethod, PowerOfTwoMethod]) – Method used for calibration. Default is CalibrationMethod.MinMax.

  • quant_format (Union[QuantFormat, ExtendedQuantFormat]) – Format of quantization. Default is QuantFormat.QDQ.

  • activation_type (Union[QuantType, ExtendedQuantType]) – Type of quantization for activations. Default is QuantType.QInt8.

  • weight_type (Union[QuantType, ExtendedQuantType]) – Type of quantization for weights. Default is QuantType.QInt8.

  • input_nodes (List[str]) – List of input nodes to be quantized. Default is [].

  • output_nodes (List[str]) – List of output nodes to be quantized. Default is [].

  • op_types_to_quantize (List[str]) – List of operation types to be quantized. Default is [].

  • extra_op_types_to_quantize (List[str]) – List of additional operation types to be quantized. Default is [].

  • nodes_to_quantize (List[str]) – List of node names to be quantized. Default is [].

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization. Default is [].

  • subgraphs_to_exclude (List[Tuple[List[str]]]) – List of subgraphs to be excluded from quantization, each specified by the names of its start and end nodes. Default is [].

  • specific_tensor_precision (bool) – Flag to enable specific tensor precision. Default is False.

  • execution_providers (List[str]) – List of execution providers. Default is ['CPUExecutionProvider'].

  • per_channel (bool) – Flag to enable per-channel quantization. Default is False.

  • reduce_range (bool) – Flag to reduce quantization range. Default is False.

  • optimize_model (bool) – Flag to optimize the model. Default is True.

  • use_dynamic_quant (bool) – Flag to use dynamic quantization. Default is False.

  • use_external_data_format (bool) – Flag to use external data format. Default is False.

  • convert_fp16_to_fp32 (bool) – Flag to convert FP16 to FP32. Default is False.

  • convert_nchw_to_nhwc (bool) – Flag to convert NCHW to NHWC. Default is False.

  • include_sq (bool) – Flag to include SmoothQuant (SQ) in quantization. Default is False.

  • include_rotation (bool) – Flag to include rotation in quantization. Default is False.

  • include_cle (bool) – Flag to include Cross-Layer Equalization (CLE) in quantization. Default is True.

  • include_auto_mp (bool) – Flag to include automatic mixed precision. Default is False.

  • include_fast_ft (bool) – Flag to include fast fine-tuning. Default is False.

  • enable_npu_cnn (bool) – Flag to enable NPU CNN. Default is False.

  • enable_npu_transformer (bool) – Flag to enable NPU Transformer. Default is False.

  • debug_mode (bool) – Flag to enable debug mode. Default is False.

  • print_summary (bool) – Flag to print summary of quantization. Default is True.

  • ignore_warnings (bool) – Flag to suppress warnings globally. Default is True.

  • log_severity_level (int) – 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL/FATAL. Default is 1.

  • extra_options (Dict[str, Any]) – Dictionary for additional options. Default is {}.

  • crypto_mode (bool) – Flag to enable crypto mode (the model information will be encrypted or hidden). Default is False.
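As a usage sketch (assuming Quark for ONNX is installed; the model paths, the excluded node name, and the calibration data reader below are placeholders, not part of this reference):

```python
from quark.onnx import ModelQuantizer
from quark.onnx.quantization.config.config import Config, QuantizationConfig

# Build a global INT8 QDQ configuration; unspecified fields keep the
# defaults shown in the signature above (MinMax calibration, QDQ format).
quant_config = QuantizationConfig(
    per_channel=True,                   # per-channel weight quantization
    nodes_to_exclude=["/head/Conv"],    # hypothetical node name
)
config = Config(global_quant_config=quant_config)

# The Config object is passed to the quantizer, which applies the global
# configuration to the whole model.
quantizer = ModelQuantizer(config)
# "model.onnx" / "model_quant.onnx" are placeholder paths; calib_reader
# would be a CalibrationDataReader yielding representative inputs:
# quantizer.quantize_model("model.onnx", "model_quant.onnx", calib_reader)
```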