QDQ quantizer#
- class quark.onnx.qdq_quantizer.QDQQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], extra_options: Any = None)[source]#
A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes.
- Parameters:
model (onnx.ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
extra_options (Any) – Additional options for quantization. Defaults to None.
- Inherits from:
onnxruntime.quantization.qdq_quantizer.QDQQuantizer: Base class for ONNX QDQ quantization.
- quantize_bias_tensor(bias_name: str, input_name: str, weight_name: str, beta: float = 1.0) → None[source]#
Adds a bias tensor to the list of bias tensors to quantize. Called by op quantizers that want to quantize a bias with bias_zero_point = 0 and bias_scale = input_scale * weight_scale * beta. This formula keeps the quantized bias in the same scale as the int32 accumulator: the products of quantized inputs and weights carry a scale of input_scale * weight_scale, so a bias quantized with that scale (times an optional beta multiplier) can be added to the accumulator directly.
- Args:
bias_name – name of the bias tensor to quantize.
input_name – name of the input tensor whose scale is used to compute the bias’s scale.
weight_name – name of the weight tensor whose scale is used to compute the bias’s scale.
beta – multiplier used to compute the bias’s scale.
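The scale arithmetic described above can be sketched in plain Python. This is an illustrative helper, not quark.onnx code; the scale values below are made-up stand-ins for what calibration would normally produce:

```python
# Sketch of the bias quantization formula documented above:
# bias_zero_point = 0 and bias_scale = input_scale * weight_scale * beta.

def quantize_bias(bias, input_scale, weight_scale, beta=1.0):
    """Quantize a float bias to integers with zero_point = 0."""
    bias_scale = input_scale * weight_scale * beta
    # Round-to-nearest; the int32 bias range is wide enough that
    # clipping is rarely a concern in practice.
    q_bias = [round(b / bias_scale) for b in bias]
    return q_bias, bias_scale

# Hypothetical calibration results for one conv layer:
q, s = quantize_bias([0.05, -0.125], input_scale=0.02, weight_scale=0.005)
# q ≈ [500, -1250], s ≈ 1e-4
```

Note that beta only rescales the bias grid; with beta = 2.0 the scale doubles and the quantized integers halve, while the dequantized values stay (approximately) the same.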
- class quark.onnx.qdq_quantizer.QDQNPUTransformerQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], extra_options: Dict[str, Any] | None = None)[source]#
A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes optimized for NPU (Neural Processing Unit) Transformers.
- Parameters:
model (onnx.ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
extra_options (Any) – Additional options for quantization. Defaults to None.
- Inherits from:
onnxruntime.quantization.qdq_quantizer.QDQQuantizer: Base class for ONNX QDQ quantization.
- quantize_bias_tensor(bias_name: str, input_name: str, weight_name: str, beta: float = 1.0) → None[source]#
Adds a bias tensor to the list of bias tensors to quantize. Called by op quantizers that want to quantize a bias with bias_zero_point = 0 and bias_scale = input_scale * weight_scale * beta. As with QDQQuantizer, this keeps the quantized bias in the same scale as the int32 accumulator of the quantized matmul or convolution, so it can be added directly; beta is an optional multiplier on that scale.
- Args:
bias_name – name of the bias tensor to quantize.
input_name – name of the input tensor whose scale is used to compute the bias’s scale.
weight_name – name of the weight tensor whose scale is used to compute the bias’s scale.
beta – multiplier used to compute the bias’s scale.
- class quark.onnx.qdq_quantizer.VitisQDQQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Any = None)[source]#
A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization on an ONNX model.
- Parameters:
model (onnx.ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
calibrate_method (Any) – The method used for calibration.
quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types. Defaults to {}.
extra_options (Any) – Additional options for quantization. Defaults to None.
- Inherits from:
VitisONNXQuantizer: Base class for Vitis-specific ONNX quantization.
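All of the quantizer classes on this page accept a per_channel flag. The difference it makes can be sketched for symmetric int8 weight quantization; the helper below is illustrative only (it is not part of quark.onnx, which derives ranges via calibration):

```python
# Hedged sketch: how per_channel changes scale computation for symmetric
# int8 quantization. "weights" is a list of output channels, each a list
# of floats (a flattened view of one conv filter per channel).

def int8_symmetric_scales(weights, per_channel=False):
    qmax = 127  # symmetric int8: zero_point = 0, scale = max|x| / 127
    if per_channel:
        # One scale per output channel: finer resolution for small filters.
        return [max(abs(v) for v in ch) / qmax for ch in weights]
    # One scale for the whole tensor: coarser, but simpler for hardware.
    flat = [v for ch in weights for v in ch]
    return [max(abs(v) for v in flat) / qmax]

w = [[0.1, -0.2], [1.27, -0.5]]
int8_symmetric_scales(w)                    # one tensor-wide scale, ~0.01
int8_symmetric_scales(w, per_channel=True)  # ~0.00157 and ~0.01 per channel
```

With a single tensor-wide scale, the small first channel is forced onto the coarse grid set by the large second channel, which is why per-channel quantization usually preserves accuracy better when the target hardware supports it.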
- class quark.onnx.qdq_quantizer.VitisQDQNPUCNNQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Dict[str, Any] | None = None)[source]#
A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization for NPU (Neural Processing Unit) on CNN models.
- Parameters:
model (onnx.ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization (must be False for NPU).
reduce_range (bool) – Whether to reduce the quantization range (must be False for NPU).
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights (must be QuantType.QInt8 for NPU).
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
calibrate_method (Any) – The method used for calibration.
quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types. Defaults to {}.
extra_options (Any) – Additional options for quantization. Defaults to None.
- Inherits from:
VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
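The parameter list above states three hard NPU constraints (per_channel and reduce_range must be False, weights must be QuantType.QInt8). A caller can pre-check a configuration against them before constructing the quantizer; the validator and the minimal QuantType stand-in below are hypothetical helpers, not quark.onnx APIs:

```python
from enum import Enum

class QuantType(Enum):
    # Minimal stand-in for onnxruntime.quantization.QuantType,
    # just enough for this illustration.
    QInt8 = 0
    QUInt8 = 1

def check_npu_cnn_config(per_channel, reduce_range, weight_qType):
    """Return a list of violations of the NPU constraints stated above."""
    errors = []
    if per_channel:
        errors.append("per_channel must be False for NPU")
    if reduce_range:
        errors.append("reduce_range must be False for NPU")
    if weight_qType is not QuantType.QInt8:
        errors.append("weight_qType must be QuantType.QInt8 for NPU")
    return errors

check_npu_cnn_config(False, False, QuantType.QInt8)   # [] — valid config
check_npu_cnn_config(True, False, QuantType.QUInt8)   # two violations
```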
- class quark.onnx.qdq_quantizer.VitisExtendedQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any], extra_options: Dict[str, Any] | None = None)[source]#
A class to perform extended Vitis-specific Quantize-Dequantize (QDQ) quantization.
- Parameters:
model (onnx.ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
calibrate_method (Any) – The method used for calibration.
quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types.
extra_options (Any) – Additional options for quantization. Defaults to None.
- Inherits from:
VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
- class quark.onnx.qdq_quantizer.VitisBFPQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Dict[str, Any] | None = None)[source]#
A class to perform Vitis-specific Block Floating Point (BFP) Quantization-Dequantization (QDQ) quantization.
- Parameters:
model (onnx.ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
calibrate_method (Any) – The method used for calibration.
quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types. Defaults to {}.
extra_options (Any) – Additional options for quantization. Defaults to None.
- Inherits from:
VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
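Block Floating Point groups values into blocks that share a single exponent, with each value keeping only a small integer mantissa. A minimal sketch of that idea follows; the block size, mantissa width, and rounding choices are illustrative, and the actual BFP format implemented by this quantizer may differ:

```python
import math

# Minimal illustration of block floating point (BFP): one shared exponent
# per block, one small signed integer mantissa per value.

def bfp_quantize(block, mantissa_bits=8):
    """Quantize one block to (shared_exponent, integer mantissas)."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # Pick the shared exponent so the largest magnitude fits the
    # signed mantissa range [-(2^(b-1)-1), 2^(b-1)-1].
    shared_exp = math.floor(math.log2(max_abs)) - (mantissa_bits - 2)
    scale = 2.0 ** shared_exp
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in block]
    return shared_exp, mantissas

def bfp_dequantize(shared_exp, mantissas):
    scale = 2.0 ** shared_exp
    return [m * scale for m in mantissas]

exp, m = bfp_quantize([0.5, -0.25, 0.125])
# Powers of two round-trip exactly; values far below the block maximum
# lose precision, which is the characteristic trade-off of BFP.
```

Because one large outlier raises the shared exponent for its whole block, BFP preserves dynamic range like floating point while keeping per-value storage and arithmetic close to fixed point, which suits hardware accelerators.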