QDQ quantizer#

class quark.onnx.qdq_quantizer.QDQQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], extra_options: ~typing.Any = None)[source]#

A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

onnxruntime.quantization.qdq_quantizer.QDQQuantizer: Base class for ONNX QDQ quantization.
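To make the roles of tensors_range and activation_qType concrete, here is an illustrative pure-Python sketch (not quark.onnx code) of how a QDQ quantizer turns a calibrated (min, max) range into the scale and zero point used by the inserted QuantizeLinear/DequantizeLinear node pair. Asymmetric uint8 activations are assumed for the example.

```python
def qdq_params(rmin: float, rmax: float, qmin: int = 0, qmax: int = 255):
    """Compute scale and zero point covering [rmin, rmax]; zero must be representable."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # ensure 0.0 falls inside the range
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point

def quantize(x: float, scale: float, zero_point: int, qmin: int = 0, qmax: int = 255) -> int:
    # QuantizeLinear: round, shift by zero point, clip to the integer range
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    # DequantizeLinear: undo the shift and rescale
    return (q - zero_point) * scale

scale, zp = qdq_params(-1.0, 3.0)   # a calibrated range, as in tensors_range
x_hat = dequantize(quantize(0.5, scale, zp), scale, zp)
# x_hat recovers 0.5 to within one quantization step (one `scale`)
```

The round trip through quantize/dequantize is exactly the approximation a QDQ node pair introduces at inference time.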

quantize_bias_tensor(bias_name: str, input_name: str, weight_name: str, beta: float = 1.0) None[source]#

Adds a bias tensor to the list of bias tensors to quantize. Called by op quantizers that want to quantize a bias with bias_zero_point = 0 and bias_scale = input_scale * weight_scale * beta. This scale is chosen so that the quantized int32 bias can be added directly into the int32 accumulator of the input-weight product, which already carries the combined scale input_scale * weight_scale.

Args:
  • bias_name – Name of the bias tensor to quantize.

  • input_name – Name of the input tensor whose scale is used to compute the bias’s scale.

  • weight_name – Name of the weight tensor whose scale is used to compute the bias’s scale.

  • beta – Multiplier used to compute the bias’s scale.
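The bias formula above can be sketched in a few lines of plain Python (an illustration of the stated formula, not the library's implementation). With bias_scale = input_scale * weight_scale * beta and a zero point of 0, the quantized bias lands on the same scale as the int8 input x weight accumulator.

```python
def quantize_bias(bias: float, input_scale: float, weight_scale: float, beta: float = 1.0):
    """Quantize a bias with zero_point = 0 and scale = input_scale * weight_scale * beta."""
    bias_scale = input_scale * weight_scale * beta
    q_bias = round(bias / bias_scale)   # stored as int32; zero point fixed at 0
    return q_bias, bias_scale

q_bias, bias_scale = quantize_bias(0.25, input_scale=0.02, weight_scale=0.005)
# q_bias * bias_scale reconstructs the original bias of 0.25
```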

class quark.onnx.qdq_quantizer.QDQNPUTransformerQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], extra_options: ~typing.Dict[str, ~typing.Any] | None = None)[source]#

A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes optimized for NPU (Neural Processing Unit) Transformers.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

onnxruntime.quantization.qdq_quantizer.QDQQuantizer: Base class for ONNX QDQ quantization.

quantize_bias_tensor(bias_name: str, input_name: str, weight_name: str, beta: float = 1.0) None[source]#

Adds a bias tensor to the list of bias tensors to quantize. Called by op quantizers that want to quantize a bias with bias_zero_point = 0 and bias_scale = input_scale * weight_scale * beta. This scale is chosen so that the quantized int32 bias can be added directly into the int32 accumulator of the input-weight product, which already carries the combined scale input_scale * weight_scale.

Args:
  • bias_name – Name of the bias tensor to quantize.

  • input_name – Name of the input tensor whose scale is used to compute the bias’s scale.

  • weight_name – Name of the weight tensor whose scale is used to compute the bias’s scale.

  • beta – Multiplier used to compute the bias’s scale.

class quark.onnx.qdq_quantizer.VitisQDQQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], calibrate_method: ~typing.Any, quantized_tensor_type: ~typing.Dict[~typing.Any, ~typing.Any] = {}, extra_options: ~typing.Any = None)[source]#

A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization on an ONNX model.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types. Defaults to {}.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

VitisONNXQuantizer: Base class for Vitis-specific ONNX quantization.

class quark.onnx.qdq_quantizer.VitisQDQNPUCNNQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], calibrate_method: ~typing.Any, quantized_tensor_type: ~typing.Dict[~typing.Any, ~typing.Any] = {}, extra_options: ~typing.Dict[str, ~typing.Any] | None = None)[source]#

A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization for NPU (Neural Processing Unit) on CNN models.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization (must be False for NPU).

  • reduce_range (bool) – Whether to reduce the quantization range (must be False for NPU).

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights (must be QuantType.QInt8 for NPU).

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types. Defaults to {}.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
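The NPU constraints above (weight_qType must be QuantType.QInt8, per_channel must be False) imply symmetric, per-tensor int8 weight quantization. The following is a hedged pure-Python sketch of that scheme, not the quark.onnx implementation: the zero point is fixed at 0, so positive and negative weights are treated symmetrically around zero.

```python
def quantize_weights_symmetric(weights: list, qmax: int = 127):
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor, zero point 0."""
    absmax = max(abs(w) for w in weights) or 1.0   # guard against an all-zero tensor
    scale = absmax / qmax                          # largest magnitude maps to +/-127
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

q, scale = quantize_weights_symmetric([-0.4, 0.1, 0.2])
# dequantized weights are q[i] * scale; the largest magnitude (0.4) uses the full range
```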

class quark.onnx.qdq_quantizer.VitisExtendedQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], calibrate_method: ~typing.Any, quantized_tensor_type: ~typing.Dict[~typing.Any, ~typing.Any], extra_options: ~typing.Dict[str, ~typing.Any] | None = None)[source]#

A class to perform extended Vitis-specific Quantize-Dequantize (QDQ) quantization.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.

class quark.onnx.qdq_quantizer.VitisBFPQuantizer(model: ~onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: ~typing.Any, activation_qType: ~typing.Any, tensors_range: ~typing.Any, nodes_to_quantize: ~typing.List[str], nodes_to_exclude: ~typing.List[str], op_types_to_quantize: ~typing.List[str], calibrate_method: ~typing.Any, quantized_tensor_type: ~typing.Dict[~typing.Any, ~typing.Any] = {}, extra_options: ~typing.Dict[str, ~typing.Any] | None = None)[source]#

A class to perform Vitis-specific Block Floating Point (BFP) Quantize-Dequantize (QDQ) quantization on an ONNX model.

Parameters:
  • model (onnx.ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types. Defaults to {}.

  • extra_options (Any) – Additional options for quantization. Defaults to None.

Inherits from:

VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
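For readers unfamiliar with BFP, here is an illustrative sketch of the general Block Floating Point idea (an assumption for exposition, not the quark.onnx implementation): a block of values shares one exponent, derived from the block's largest magnitude, and each value keeps only a small signed mantissa. The example uses 8-bit mantissas and a block-wide power-of-two scale.

```python
import math

def bfp_quantize(block: list, mantissa_bits: int = 8):
    """Quantize a block of floats to shared-exponent mantissas (Block Floating Point)."""
    qmax = 2 ** (mantissa_bits - 1) - 1                 # e.g. 127 for 8-bit mantissas
    absmax = max(abs(v) for v in block) or 1.0
    shared_exp = math.ceil(math.log2(absmax / qmax))    # power-of-two scale for the block
    scale = 2.0 ** shared_exp
    mantissas = [max(-qmax - 1, min(qmax, round(v / scale))) for v in block]
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp):
    """Reconstruct the block from its mantissas and shared exponent."""
    return [m * 2.0 ** shared_exp for m in mantissas]

m, e = bfp_quantize([0.5, -0.25, 0.0, 0.125])
restored = bfp_dequantize(m, e)   # exact here, since all inputs are binary fractions
```

Because the scale is a power of two, dequantization is a pure shift, which is what makes BFP attractive for hardware accelerators.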