quark.onnx.qdq_quantizer#

Module Contents#

Classes#

class quark.onnx.qdq_quantizer.QDQQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], extra_options: Any = None)#

A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes.

Parameters:
  • model (ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • extra_options (Any, optional) – Additional options for quantization.

Inherits from:

OrtQDQQuantizer: Base class for ONNX QDQ quantization.
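
The signature above can be exercised directly, although most workflows drive it through Quark's higher-level quantization API. The following is a minimal, illustrative sketch only: the file paths and range values are placeholders, `QuantType` and `QuantizationMode` come from onnxruntime, and it assumes that `quantize_model()`, inherited from the onnxruntime base class, returns the quantized `ModelProto`. Whether `weight_qType`/`activation_qType` are passed as `QuantType` values or as ONNX tensor element types may depend on the onnxruntime version backing the base class.

```python
# Minimal sketch of constructing a QDQQuantizer directly (illustrative only).
# Assumes `tensors_range` was produced by a prior calibration step and that
# quantize_model(), inherited from the onnxruntime QDQ quantizer, returns the
# quantized ModelProto; both points may vary with the onnxruntime version.
import onnx
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType
from quark.onnx.qdq_quantizer import QDQQuantizer

model = onnx.load("model_fp32.onnx")           # placeholder path
tensors_range = {"input": (0.0, 1.0)}          # placeholder calibration ranges

quantizer = QDQQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    static=True,
    weight_qType=QuantType.QInt8,
    activation_qType=QuantType.QUInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],                      # empty: no explicit allow-list
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "MatMul"],
    extra_options=None,
)

quantized_model = quantizer.quantize_model()
onnx.save(quantized_model, "model_int8_qdq.onnx")
```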

class quark.onnx.qdq_quantizer.QDQNPUTransformerQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], extra_options: Optional[Dict[str, Any]] = None)#

A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes, optimized for Transformer models running on an NPU (Neural Processing Unit).

Parameters:
  • model (ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • extra_options (Optional[Dict[str, Any]], optional) – Additional options for quantization.

Inherits from:

QDQQuantizer: Base class for ONNX QDQ quantization.
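
Because the constructor mirrors QDQQuantizer, this class can be used as a drop-in replacement when the target is a Transformer model on an NPU. The sketch below only highlights that swap; the keyword arguments are the same illustrative placeholders as in the previous example, and the `"ActivationSymmetric"` key is the onnxruntime extra option of that name, shown purely as an example of forwarding settings through `extra_options`.

```python
# Drop-in swap relative to QDQQuantizer: same constructor arguments, different class.
# Values are placeholders; "ActivationSymmetric" is an onnxruntime extra option shown
# only as an example of how options can be forwarded through extra_options.
import onnx
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType
from quark.onnx.qdq_quantizer import QDQNPUTransformerQuantizer

model = onnx.load("transformer_fp32.onnx")     # placeholder path
tensors_range = {"input": (0.0, 1.0)}          # placeholder calibration ranges

quantizer = QDQNPUTransformerQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    static=True,
    weight_qType=QuantType.QInt8,
    activation_qType=QuantType.QInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["MatMul", "Gemm"],
    extra_options={"ActivationSymmetric": True},
)
```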

class quark.onnx.qdq_quantizer.VitisQDQQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Any = None)#

A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization on an ONNX model.

Parameters:
  • model (ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any], optional) – Dictionary specifying quantized tensor types.

  • extra_options (Any, optional) – Additional options for quantization.

Inherits from:

VitisONNXQuantizer: Base class for Vitis-specific ONNX quantization.
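
Relative to QDQQuantizer, this constructor adds `calibrate_method` and `quantized_tensor_type`. The sketch below assumes that `CalibrationMethod.MinMax` from onnxruntime is an acceptable `calibrate_method` value; Quark also provides its own calibration enums (including power-of-two methods), documented with its configuration options, so treat the value here as illustrative.

```python
# Sketch of the Vitis QDQ quantizer: same core arguments as QDQQuantizer plus
# calibrate_method and quantized_tensor_type. CalibrationMethod.MinMax is used
# only as an illustrative value; Quark's own calibration enums may be required
# in practice.
import onnx
from onnxruntime.quantization.calibrate import CalibrationMethod
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType
from quark.onnx.qdq_quantizer import VitisQDQQuantizer

model = onnx.load("model_fp32.onnx")           # placeholder path
tensors_range = {"input": (0.0, 1.0)}          # placeholder calibration ranges

quantizer = VitisQDQQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    static=True,
    weight_qType=QuantType.QInt8,
    activation_qType=QuantType.QInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "Gemm"],
    calibrate_method=CalibrationMethod.MinMax,  # illustrative value
    quantized_tensor_type={},                   # default: no per-tensor type overrides
    extra_options=None,
)
```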

class quark.onnx.qdq_quantizer.VitisQDQNPUCNNQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Optional[Dict[str, Any]] = None)#

A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization of CNN models targeting an NPU (Neural Processing Unit).

Parameters:
  • model (ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization (must be False for NPU).

  • reduce_range (bool) – Whether to reduce the quantization range (must be False for NPU).

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights (must be QuantType.QInt8 for NPU).

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any], optional) – Dictionary specifying quantized tensor types.

  • extra_options (Optional[Dict[str, Any]], optional) – Additional options for quantization.

Inherits from:

VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
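
The parameter notes above impose NPU constraints (per_channel and reduce_range must be False, and weights must use QuantType.QInt8); the sketch below simply spells those constraints out in code. Other values are illustrative placeholders, and the calibration enum is again assumed rather than prescribed.

```python
# Sketch emphasizing the NPU constraints documented above: per_channel=False,
# reduce_range=False, and weight_qType=QuantType.QInt8. Remaining values are
# placeholders.
import onnx
from onnxruntime.quantization.calibrate import CalibrationMethod
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType
from quark.onnx.qdq_quantizer import VitisQDQNPUCNNQuantizer

model = onnx.load("cnn_fp32.onnx")             # placeholder path
tensors_range = {"input": (0.0, 1.0)}          # placeholder calibration ranges

quantizer = VitisQDQNPUCNNQuantizer(
    model=model,
    per_channel=False,                          # required for NPU
    reduce_range=False,                         # required for NPU
    mode=QuantizationMode.QLinearOps,
    static=True,
    weight_qType=QuantType.QInt8,               # required for NPU
    activation_qType=QuantType.QInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "Gemm"],
    calibrate_method=CalibrationMethod.MinMax,  # illustrative; Quark's own enums may apply
    quantized_tensor_type={},
    extra_options=None,
)
```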

class quark.onnx.qdq_quantizer.VitisExtendedQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, quant_format: Any, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any], extra_options: Optional[Dict[str, Any]] = None)#

A class to perform extended Vitis-specific Quantize-Dequantize (QDQ) quantization.

Parameters:
  • model (ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • quant_format (Any) – The format for quantization.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types.

  • extra_options (Optional[Dict[str, Any]], optional) – Additional options for quantization.

Inherits from:

VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
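
This constructor adds a `quant_format` argument and makes `quantized_tensor_type` required (it has no default). The sketch below passes `QuantFormat.QDQ` from onnxruntime as an illustrative `quant_format`; Quark may also accept its own extended format enums, which are documented with its configuration options.

```python
# Sketch of the extended quantizer: note the additional quant_format argument
# and that quantized_tensor_type has no default here. QuantFormat.QDQ is only
# an illustrative value.
import onnx
from onnxruntime.quantization.calibrate import CalibrationMethod
from onnxruntime.quantization.quant_utils import QuantFormat, QuantizationMode, QuantType
from quark.onnx.qdq_quantizer import VitisExtendedQuantizer

model = onnx.load("model_fp32.onnx")           # placeholder path
tensors_range = {"input": (0.0, 1.0)}          # placeholder calibration ranges

quantizer = VitisExtendedQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    quant_format=QuantFormat.QDQ,               # illustrative; Quark-specific formats may be used
    static=True,
    weight_qType=QuantType.QInt8,
    activation_qType=QuantType.QInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "Gemm"],
    calibrate_method=CalibrationMethod.MinMax,  # illustrative value
    quantized_tensor_type={},                   # required: no default for this class
    extra_options=None,
)
```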

class quark.onnx.qdq_quantizer.VitisBFPQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Optional[Dict[str, Any]] = None)#

A class to perform Vitis-specific Block Floating Point (BFP) quantization using Quantize-Dequantize (QDQ) nodes.

Parameters:
  • model (ModelProto) – The ONNX model to be quantized.

  • per_channel (bool) – Whether to perform per-channel quantization.

  • reduce_range (bool) – Whether to reduce the quantization range.

  • mode (QuantizationMode.QLinearOps) – The quantization mode to be used.

  • static (bool) – Whether to use static quantization.

  • weight_qType (Any) – The quantization type for weights.

  • activation_qType (Any) – The quantization type for activations.

  • tensors_range (Any) – Dictionary specifying the min and max values for tensors.

  • nodes_to_quantize (List[str]) – List of node names to be quantized.

  • nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • calibrate_method (Any) – The method used for calibration.

  • quantized_tensor_type (Dict[Any, Any], optional) – Dictionary specifying quantized tensor types.

  • extra_options (Optional[Dict[str, Any]], optional) – Additional options for quantization.

Inherits from:

VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
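
Construction follows the same pattern as VitisQDQQuantizer; BFP-specific behavior is typically configured through `extra_options`, whose exact keys are covered by Quark's configuration documentation and are therefore left unset in this sketch. All values shown are illustrative placeholders.

```python
# Sketch of the BFP quantizer. BFP-specific settings (block size, bit widths, etc.)
# are normally supplied via extra_options; the exact keys are documented with
# Quark's configuration options and are intentionally omitted here.
import onnx
from onnxruntime.quantization.calibrate import CalibrationMethod
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType
from quark.onnx.qdq_quantizer import VitisBFPQuantizer

model = onnx.load("model_fp32.onnx")           # placeholder path
tensors_range = {"input": (0.0, 1.0)}          # placeholder calibration ranges

quantizer = VitisBFPQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    static=True,
    weight_qType=QuantType.QInt8,               # placeholder; BFP deployments may override this
    activation_qType=QuantType.QInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "Gemm"],
    calibrate_method=CalibrationMethod.MinMax,  # illustrative value
    quantized_tensor_type={},
    extra_options=None,                         # BFP-specific keys would go here
)
```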