QDQ quantizer#

class quark.onnx.qdq_quantizer.QDQQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], extra_options: ~typing.Any = None)[source]#

A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

onnxruntime.quantization.qdq_quantizer.QDQQuantizer: Base class for ONNX QDQ quantization.
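To make the roles of tensors_range and activation_qType concrete, here is an illustrative pure-Python sketch (not quark.onnx code) of how a QDQ quantizer turns a calibrated (min, max) range into the scale and zero point used by the inserted QuantizeLinear/DequantizeLinear node pair. Asymmetric uint8 activations are assumed for the example.

```python
def qdq_params(rmin: float, rmax: float, qmin: int = 0, qmax: int = 255):
    """Compute scale and zero point covering [rmin, rmax]; zero must be representable."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # ensure 0.0 falls inside the range
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point

def quantize(x: float, scale: float, zero_point: int, qmin: int = 0, qmax: int = 255) -> int:
    # QuantizeLinear: round, shift by zero point, clip to the integer range
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    # DequantizeLinear: undo the shift and rescale
    return (q - zero_point) * scale

scale, zp = qdq_params(-1.0, 3.0)   # a calibrated range, as in tensors_range
x_hat = dequantize(quantize(0.5, scale, zp), scale, zp)
# x_hat recovers 0.5 to within one quantization step (one `scale`)
```

The round trip through quantize/dequantize is exactly the approximation a QDQ node pair introduces at inference time.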

quantize_bias_tensor(bias_name: str, input_name: str, weight_name: str, beta: float = 1.0) None[source]#

Adds a bias tensor to the list of bias tensors to quantize. Called by op quantizers that want to quantize a bias with bias_zero_point = 0 and bias_scale = input_scale * weight_scale * beta. This scale is chosen so that the quantized int32 bias can be added directly into the int32 accumulator of the input-weight product, which already carries the combined scale input_scale * weight_scale.

Args:
  • bias_name – Name of the bias tensor to quantize.

  • input_name – Name of the input tensor whose scale is used to compute the bias’s scale.

  • weight_name – Name of the weight tensor whose scale is used to compute the bias’s scale.

  • beta – Multiplier used to compute the bias’s scale.
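The bias formula above can be sketched in a few lines of plain Python (an illustration of the stated formula, not the library's implementation). With bias_scale = input_scale * weight_scale * beta and a zero point of 0, the quantized bias lands on the same scale as the int8 input x weight accumulator.

```python
def quantize_bias(bias: float, input_scale: float, weight_scale: float, beta: float = 1.0):
    """Quantize a bias with zero_point = 0 and scale = input_scale * weight_scale * beta."""
    bias_scale = input_scale * weight_scale * beta
    q_bias = round(bias / bias_scale)   # stored as int32; zero point fixed at 0
    return q_bias, bias_scale

q_bias, bias_scale = quantize_bias(0.25, input_scale=0.02, weight_scale=0.005)
# q_bias * bias_scale reconstructs the original bias of 0.25
```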

class quark.onnx.qdq_quantizer.QDQNPUTransformerQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], extra_options: ~typing.Dict[str, ~typing.Any] | None = None)[source]#

A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes optimized for NPU (Neural Processing Unit) Transformers.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

onnxruntime.quantization.qdq_quantizer.QDQQuantizer: Base class for ONNX QDQ quantization.

quantize_bias_tensor(bias_name: str, input_name: str, weight_name: str, beta: float = 1.0) None[source]#

Adds a bias tensor to the list of bias tensors to quantize. Called by op quantizers that want to quantize a bias with bias_zero_point = 0 and bias_scale = input_scale * weight_scale * beta. This scale is chosen so that the quantized int32 bias can be added directly into the int32 accumulator of the input-weight product, which already carries the combined scale input_scale * weight_scale.

Args:
  • bias_name – Name of the bias tensor to quantize.

  • input_name – Name of the input tensor whose scale is used to compute the bias’s scale.

  • weight_name – Name of the weight tensor whose scale is used to compute the bias’s scale.

  • beta – Multiplier used to compute the bias’s scale.

class quark.onnx.qdq_quantizer.VitisQDQQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], calibrate_method: ~typing.Any, quantized_tensor_type: ~typing.Dict[~typing.Any, ~typing.Any] = {}, extra_options: ~typing.Any = None)[source]#

A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization on an ONNX model.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types. Defaults to {}.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

VitisONNXQuantizer: Base class for Vitis-specific ONNX quantization.

class quark.onnx.qdq_quantizer.VitisQDQNPUCNNQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], calibrate_method: ~typing.Any, quantized_tensor_type: ~typing.Dict[~typing.Any, ~typing.Any] = {}, extra_options: ~typing.Dict[str, ~typing.Any] | None = None)[source]#

A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization for NPU (Neural Processing Unit) on CNN models.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization (must be False for NPU).

  • reduce_range (bool) – Whether to reduce the quantization range (must be False for NPU).

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights (must be QuantType.QInt8 for NPU).

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types. Defaults to {}.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
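The NPU constraints above (weight_qType must be QuantType.QInt8, per_channel must be False) imply symmetric, per-tensor int8 weight quantization. The following is a hedged pure-Python sketch of that scheme, not the quark.onnx implementation: the zero point is fixed at 0, so positive and negative weights are treated symmetrically around zero.

```python
def quantize_weights_symmetric(weights: list, qmax: int = 127):
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor, zero point 0."""
    absmax = max(abs(w) for w in weights) or 1.0   # guard against an all-zero tensor
    scale = absmax / qmax                          # largest magnitude maps to +/-127
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

q, scale = quantize_weights_symmetric([-0.4, 0.1, 0.2])
# dequantized weights are q[i] * scale; the largest magnitude (0.4) uses the full range
```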

class quark.onnx.qdq_quantizer.VitisExtendedQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], calibrate_method: ~typing.Any, quantized_tensor_type: ~typing.Dict[~typing.Any, ~typing.Any], extra_options: ~typing.Dict[str, ~typing.Any] | None = None)[source]#

A class to perform extended Vitis-specific Quantize-Dequantize (QDQ) quantization.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.

class quark.onnx.qdq_quantizer.VitisBFPQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], calibrate_method: ~typing.Any, quantized_tensor_type: ~typing.Dict[~typing.Any, ~typing.Any] = {}, extra_options: ~typing.Dict[str, ~typing.Any] | None = None)[source]#

A class to perform Vitis-specific Block Floating Point (BFP) Quantize-Dequantize (QDQ) quantization on an ONNX model.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types. Defaults to {}.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
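For readers unfamiliar with BFP, here is an illustrative sketch of the general Block Floating Point idea (an assumption for exposition, not the quark.onnx implementation): a block of values shares one exponent, derived from the block's largest magnitude, and each value keeps only a small signed mantissa. The example uses 8-bit mantissas and a block-wide power-of-two scale.

```python
import math

def bfp_quantize(block: list, mantissa_bits: int = 8):
    """Quantize a block of floats to shared-exponent mantissas (Block Floating Point)."""
    qmax = 2 ** (mantissa_bits - 1) - 1                 # e.g. 127 for 8-bit mantissas
    absmax = max(abs(v) for v in block) or 1.0
    shared_exp = math.ceil(math.log2(absmax / qmax))    # power-of-two scale for the block
    scale = 2.0 ** shared_exp
    mantissas = [max(-qmax - 1, min(qmax, round(v / scale))) for v in block]
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp):
    """Reconstruct the block from its mantissas and shared exponent."""
    return [m * 2.0 ** shared_exp for m in mantissas]

m, e = bfp_quantize([0.5, -0.25, 0.0, 0.125])
restored = bfp_dequantize(m, e)   # exact here, since all inputs are binary fractions
```

Because the scale is a power of two, dequantization is a pure shift, which is what makes BFP attractive for hardware accelerators.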