ONNX quantizer

class quark.onnx.onnx_quantizer.ONNXQuantizer(model: onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: typing.Any, activation_qType: typing.Any, tensors_range: typing.Any, nodes_to_quantize: list[str], nodes_to_exclude: list[str], op_types_to_quantize: list[str], extra_options: dict[str, typing.Any] | None = None)

A class to perform quantization on an ONNX model.

Parameters:

model (onnx.ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – The range of tensors for quantization.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
extra_options (Optional[Dict[str, Any]]) – Additional options for quantization.

Inherits from: onnxruntime.quantization.onnx_quantizer.ONNXQuantizer, the base class for ONNX quantization.

class quark.onnx.onnx_quantizer.VitisONNXQuantizer(model: onnx.onnx_ml_pb2.ModelProto, per_channel: bool, reduce_range: bool, mode: <QuantizationMode.QLinearOps: 1>, static: bool, weight_qType: typing.Any, activation_qType: typing.Any, tensors_range: typing.Any, nodes_to_quantize: list[str], nodes_to_exclude: list[str], op_types_to_quantize: list[str], calibrate_method: typing.Any, quantized_tensor_type: dict[typing.Any, typing.Any] = {}, extra_options: dict[str, typing.Any] | None = None)

A class to perform quantization on an ONNX model specifically optimized for Vitis AI.

Parameters:

model (onnx.ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
calibrate_method (Any) – The calibration method to be used.
quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying the types for quantized tensors. Defaults to {}.
extra_options (Optional[Dict[str, Any]]) – Additional options for quantization. Defaults to None.

Inherits from: onnxruntime.quantization.onnx_quantizer.ONNXQuantizer, the base class for ONNX quantization.

find_quant_scale_zp(input_name: str) → Any

Finds the quantization scale and zero-point for a given input.

This method looks up the quantization scale and zero-point values for the specified input name. It first checks the current instance’s used_scale_zp_map. If not found, it recursively checks the parent instance if one exists.

Parameters: input_name (str) – The name of the input for which to find the quantization scale and zero-point.
Returns: A tuple containing the quantization scale and zero-point if found, otherwise (None, None).
Return type: Any
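The lookup order described above (current instance first, then the parent chain) can be sketched as follows. This is a minimal illustration, not the library's implementation: the ScopedQuantizer class, its parent attribute, and the layout of used_scale_zp_map (tensor name mapped to a pair of names) are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


class ScopedQuantizer:
    """Hypothetical stand-in for a quantizer that may have a parent instance."""

    def __init__(self, parent: Optional["ScopedQuantizer"] = None) -> None:
        self.parent = parent
        # Assumed layout: tensor name -> (scale name, zero-point name).
        self.used_scale_zp_map: Dict[str, Tuple[str, str]] = {}

    def find_quant_scale_zp(self, input_name: str) -> Tuple[Optional[str], Optional[str]]:
        # 1. Check the current instance's own map.
        if input_name in self.used_scale_zp_map:
            return self.used_scale_zp_map[input_name]
        # 2. Otherwise defer to the parent, if there is one.
        if self.parent is not None:
            return self.parent.find_quant_scale_zp(input_name)
        # 3. Nothing found anywhere in the chain.
        return (None, None)


outer = ScopedQuantizer()
outer.used_scale_zp_map["x"] = ("x_scale", "x_zero_point")
inner = ScopedQuantizer(parent=outer)

found = inner.find_quant_scale_zp("x")    # resolved through the parent
missing = inner.find_quant_scale_zp("y")  # not registered anywhere
```

Here the inner instance has an empty map, so "x" is resolved through its parent, while "y" falls through to (None, None).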

find_quantized_value(input_name: str) → Any

Finds the quantized value for a given input.

This method looks up the quantized value for the specified input name. It first checks the current instance’s quantized_value_map. If not found, it recursively checks the parent instance if one exists.

Parameters: input_name (str) – The name of the input for which to find the quantized value.
Returns: The quantized value if found, otherwise None.
Return type: Any

quantize_bias_static(bias_name: str, input_name: str, weight_name: str, beta: float = 1.0) → Any

Quantizes the bias using static quantization. Zero Point == 0 and Scale == Input_Scale * Weight_Scale.

This method performs the following steps:

1. Validates the weight quantization type.
2. Retrieves the scale for the weight.
3. Retrieves the bias data and its scale.
4. Retrieves the scale for the input.
5. Calculates the scale for the bias.
6. Quantizes the bias data.
7. Updates the bias, scale, and zero-point initializers in the model.
8. Updates the quantized value map with the new quantized bias information.

Parameters:

bias_name (str) – The name of the bias to be quantized.
input_name (str) – The name of the input associated with the bias.
weight_name (str) – The name of the weight associated with the bias.
beta (float) – A scaling factor applied during quantization. Default is 1.0.

Returns:

The name of the quantized bias.

Return type:

Any

Raises:

ValueError – If the weight quantization type is not supported or if the input name is not found in the quantized value map.
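The arithmetic behind the bias scale and the bias quantization can be illustrated with a small standalone sketch. It is not the method's actual code: the quantize_bias helper is hypothetical, scalar (per-tensor) scales are assumed, and beta is assumed to multiply the bias scale, which the description above does not spell out.

```python
def quantize_bias(bias, input_scale, weight_scale, beta=1.0):
    """Quantize float bias values to int32 with a zero point of 0.

    Assumption: beta multiplies the bias scale; with the default of 1.0 the
    scale is exactly Input_Scale * Weight_Scale.
    """
    bias_scale = input_scale * weight_scale * beta
    int32_min, int32_max = -(2 ** 31), 2 ** 31 - 1
    # Divide by the scale, round to the nearest integer, clamp to int32.
    quantized = [
        max(int32_min, min(int32_max, round(value / bias_scale)))
        for value in bias
    ]
    zero_point = 0
    return quantized, bias_scale, zero_point


q_bias, bias_scale, bias_zero_point = quantize_bias(
    [0.5, -0.25, 0.0], input_scale=0.02, weight_scale=0.005
)
```

With an input scale of 0.02 and a weight scale of 0.005, the bias scale is 0.0001, so a bias of 0.5 maps to the integer 5000.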

quantize_weight(node: NodeProto, indices: Any, reduce_range: bool = False, op_level_per_channel: bool = False, axis: int = -1, from_subgraph: bool = False) → Any

Quantizes the weights of a given node.

In some circumstances, a weight is not an initializer. For example, in MatMul, if both A and B are not initializers, B can still be considered as a weight.

This method calls __quantize_inputs to perform the weight quantization.

Parameters:

node (NodeProto) – The node containing the weights to be quantized.
indices (Any) – The indices of the inputs to be quantized.
reduce_range (bool, optional) – Flag to indicate whether to reduce the quantization range. Default is False.
op_level_per_channel (bool, optional) – Flag to indicate whether to use per-channel quantization at the operator level. Default is False.
axis (int, optional) – The axis for per-channel quantization. Default is -1.
from_subgraph (bool, optional) – Flag to indicate whether the node is from a subgraph. Default is False.

Returns:

The result of the weight quantization process.

Return type:

Any

quantize_initializer(weight: Any, qType: Any, method: Any, reduce_range: bool = False, keep_float_weight: bool = False) → tuple[str, str, str]

Parameters:

weight – The TensorProto initializer.
qType – The type to quantize to. Note that it may differ from weight_qType because of mixed precision.
keep_float_weight – Whether to keep the weight in floating point. In some cases, only the scale and zero point need to be quantized. If keep_float_weight is False, the weight is quantized; otherwise it is left unquantized.

Returns:

The quantized weight name, zero-point name, and scale name.

quantize_weight_per_channel(weight_name: str, weight_qType: Any, channel_axis: Any, method: Any, reduce_range: bool = True, keep_float_weight: bool = False) → tuple[str, str, str]

Quantizes the given weight tensor per channel.

This method quantizes the weights per channel, creating separate quantization parameters (scale and zero-point) for each channel.

Parameters:

weight_name (str) – The name of the weight tensor to be quantized.
weight_qType (Any) – The data type to use for quantization.
channel_axis (Any) – The axis representing the channel dimension in the weight tensor.
method (Any) – The quantization method to use.
reduce_range (bool, optional) – Whether to reduce the quantization range. Default is True.
keep_float_weight (bool, optional) – Whether to keep the original floating-point weights. Default is False.

Returns:

A tuple containing the names of the quantized weight tensor, zero-point tensor, and scale tensor.

Return type:

Tuple[str, str, str]

Raises:

ValueError – If the specified weight is not an initializer.
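To illustrate what "separate quantization parameters for each channel" means, here is a minimal pure-Python sketch. It assumes symmetric signed 8-bit quantization along axis 0 and a hypothetical helper named quantize_per_channel; the real method works on model initializers and supports other types and methods.

```python
def quantize_per_channel(weight, qmin=-127, qmax=127):
    """Quantize each channel (each row of `weight`) with its own scale.

    Assumptions: the channel axis is 0, quantization is symmetric, so every
    zero point is 0 and the scale is max(|w|) / qmax for that channel.
    """
    quantized, zero_points, scales = [], [], []
    for channel in weight:
        max_abs = max(abs(value) for value in channel)
        # Guard against an all-zero channel to avoid division by zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        zero_points.append(0)
        quantized.append(
            [max(qmin, min(qmax, round(value / scale))) for value in channel]
        )
    return quantized, zero_points, scales


# Two channels whose magnitudes differ by a factor of 50.
weight = [[1.27, -0.64, 0.0], [0.0254, 0.01, -0.0254]]
q_weight, zero_points, scales = quantize_per_channel(weight)
```

Because each channel gets its own scale, the second channel uses the full integer range even though its values are much smaller than those of the first channel; a single per-tensor scale would round most of them to zero.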

calculate_quantization_params() → Any

Calculates the quantization parameters for each tensor in the model.

This method computes the quantization parameters (scale and zero-point) for each tensor in the model based on its range (rmin and rmax). It adjusts the tensor ranges for the inputs of Clip and Relu nodes and ensures the correct quantization parameters are used for each tensor type.

Returns: A dictionary containing the quantization parameters for each tensor.
Return type: Any
Raises: ValueError – If a weight is not an initializer.

Notes:

If self.tensors_range is None, the method returns immediately.
Adjusts tensor ranges for Clip and Relu nodes.
For versions of ONNX Runtime below 1.16.0, specific quantization parameters are computed.
For versions of ONNX Runtime 1.16.0 and above, the QuantizationParams class is used.
Forces asymmetric quantization for ReLU-like output tensors if self.use_unsigned_relu is True.
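As a rough illustration of how a scale and zero-point can be derived from a tensor range (rmin, rmax), the sketch below uses the common asymmetric formula for an unsigned 8-bit type. The compute_scale_zp helper is hypothetical and the formula is an assumption for the example; the method itself delegates to ONNX Runtime utilities whose details depend on the installed version.

```python
def compute_scale_zp(rmin, rmax, qmin=0, qmax=255):
    """Derive (scale, zero_point) for asymmetric quantization of [rmin, rmax]."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    if rmax == rmin:
        # Degenerate range (all zeros): fall back to a unit scale.
        return 1.0, qmin
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    # Keep the zero point inside the representable integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


scale, zero_point = compute_scale_zp(-1.0, 3.0)
# A non-negative range, as produced by a ReLU-like output.
relu_scale, relu_zero_point = compute_scale_zp(0.0, 6.0)
```

For a non-negative range such as (0.0, 6.0), the zero point lands on qmin, so the whole unsigned integer range is spent on positive values.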
