quark.onnx.qdq_quantizer#
Module Contents#
Classes#
- class quark.onnx.qdq_quantizer.QDQQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], extra_options: Any = None)#
A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes.
- Parameters:
model (ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
extra_options (Any, optional) – Additional options for quantization.
- Inherits from:
OrtQDQQuantizer: Base class for ONNX QDQ quantization.
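A minimal construction sketch, assuming the signature documented above. The model path, calibration ranges, and op-type list are placeholders, and the exact value format expected in tensors_range (plain (min, max) tuples here) may differ between onnxruntime versions.

```python
import onnx
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType

from quark.onnx.qdq_quantizer import QDQQuantizer

model = onnx.load("model_fp32.onnx")          # placeholder model path
tensors_range = {"input": (0.0, 1.0)}         # placeholder calibration ranges (name -> (min, max))

quantizer = QDQQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    static=True,                              # QDQ insertion uses static (calibration-based) quantization
    weight_qType=QuantType.QInt8,
    activation_qType=QuantType.QUInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],                     # empty list: no explicit node whitelist
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "MatMul"],  # illustrative op types
    extra_options=None,
)
```

After construction, the graph rewrite is typically driven through the quantize_model() method inherited from the onnxruntime base class, assuming that interface is preserved by this subclass.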
- class quark.onnx.qdq_quantizer.QDQNPUTransformerQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], extra_options: Optional[Dict[str, Any]] = None)#
A class to perform quantization on an ONNX model using Quantize-Dequantize (QDQ) nodes optimized for NPU (Neural Processing Unit) Transformers.
- Parameters:
model (ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
extra_options (Optional[Dict[str, Any]], optional) – Additional options for quantization.
- Inherits from:
QDQQuantizer: Base class for ONNX QDQ quantization.
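QDQNPUTransformerQuantizer takes the same arguments as QDQQuantizer, so a sketch only needs to swap the class. The op-type list below is an illustrative choice for a transformer graph, not something the API mandates, and the other values are placeholders.

```python
import onnx
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType

from quark.onnx.qdq_quantizer import QDQNPUTransformerQuantizer

model = onnx.load("transformer_fp32.onnx")    # placeholder model path
tensors_range = {"attn_out": (-4.0, 4.0)}     # placeholder calibration ranges

quantizer = QDQNPUTransformerQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    static=True,
    weight_qType=QuantType.QInt8,
    activation_qType=QuantType.QUInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["MatMul", "Gemm", "Add"],  # illustrative transformer ops
    extra_options={},                          # Optional[Dict[str, Any]]
)
```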
- class quark.onnx.qdq_quantizer.VitisQDQQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Any = None)#
A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization on an ONNX model.
- Parameters:
model (ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
calibrate_method (Any) – The method used for calibration.
quantized_tensor_type (Dict[Any, Any], optional) – Dictionary specifying quantized tensor types.
extra_options (Any, optional) – Additional options for quantization.
- Inherits from:
VitisONNXQuantizer: Base class for Vitis-specific ONNX quantization.
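A sketch of the Vitis-flavoured constructor, which additionally takes calibrate_method and quantized_tensor_type. CalibrationMethod.MinMax from onnxruntime is used purely as a placeholder, since the calibration enum this flow actually expects is not documented in this section; the remaining values are placeholders as above.

```python
import onnx
from onnxruntime.quantization.calibrate import CalibrationMethod
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType

from quark.onnx.qdq_quantizer import VitisQDQQuantizer

model = onnx.load("model_fp32.onnx")          # placeholder model path
tensors_range = {"conv1_out": (-3.0, 3.0)}    # placeholder calibration ranges

quantizer = VitisQDQQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    static=True,
    weight_qType=QuantType.QInt8,
    activation_qType=QuantType.QInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "Gemm"],
    calibrate_method=CalibrationMethod.MinMax,  # placeholder; pass the method used by your calibration step
    quantized_tensor_type={},                   # optional per-tensor type overrides
    extra_options=None,
)
```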
- class quark.onnx.qdq_quantizer.VitisQDQNPUCNNQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Optional[Dict[str, Any]] = None)#
A class to perform Vitis-specific Quantize-Dequantize (QDQ) quantization of CNN models for NPU (Neural Processing Unit) targets.
- Parameters:
model (ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization (must be False for NPU).
reduce_range (bool) – Whether to reduce the quantization range (must be False for NPU).
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights (must be QuantType.QInt8 for NPU).
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
calibrate_method (Any) – The method used for calibration.
quantized_tensor_type (Dict[Any, Any], optional) – Dictionary specifying quantized tensor types.
extra_options (Optional[Dict[str, Any]], optional) – Additional options for quantization.
- Inherits from:
VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
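For the NPU CNN flow the parameter list above fixes several arguments: per_channel and reduce_range must be False and weight_qType must be QuantType.QInt8. The sketch below reflects those constraints; everything else (paths, ranges, op types, calibration method) is a placeholder.

```python
import onnx
from onnxruntime.quantization.calibrate import CalibrationMethod
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType

from quark.onnx.qdq_quantizer import VitisQDQNPUCNNQuantizer

model = onnx.load("cnn_fp32.onnx")            # placeholder model path
tensors_range = {"relu1_out": (0.0, 6.0)}     # placeholder calibration ranges

quantizer = VitisQDQNPUCNNQuantizer(
    model=model,
    per_channel=False,                        # must be False for NPU
    reduce_range=False,                       # must be False for NPU
    mode=QuantizationMode.QLinearOps,
    static=True,
    weight_qType=QuantType.QInt8,             # must be QInt8 for NPU
    activation_qType=QuantType.QInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "MaxPool", "Gemm"],  # illustrative CNN ops
    calibrate_method=CalibrationMethod.MinMax,  # placeholder calibration method
    quantized_tensor_type={},
    extra_options=None,
)
```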
- class quark.onnx.qdq_quantizer.VitisExtendedQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, quant_format: Any, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any], extra_options: Optional[Dict[str, Any]] = None)#
A class to perform extended Vitis-specific Quantize-Dequantize (QDQ) quantization.
- Parameters:
model (ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
quant_format (Any) – The format for quantization.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
calibrate_method (Any) – The method used for calibration.
quantized_tensor_type (Dict[Any, Any]) – Dictionary specifying quantized tensor types.
extra_options (Optional[Dict[str, Any]], optional) – Additional options for quantization.
- Inherits from:
VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
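VitisExtendedQuantizer adds a quant_format argument and makes quantized_tensor_type a required parameter. QuantFormat.QDQ from onnxruntime is shown only as a placeholder, since the formats this class accepts are not listed in this section; the other values are placeholders as in the previous sketches.

```python
import onnx
from onnxruntime.quantization.calibrate import CalibrationMethod
from onnxruntime.quantization.quant_utils import QuantFormat, QuantizationMode, QuantType

from quark.onnx.qdq_quantizer import VitisExtendedQuantizer

model = onnx.load("model_fp32.onnx")          # placeholder model path
tensors_range = {"fc1_out": (-1.5, 1.5)}      # placeholder calibration ranges

quantizer = VitisExtendedQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    quant_format=QuantFormat.QDQ,             # placeholder quantization format
    static=True,
    weight_qType=QuantType.QInt8,
    activation_qType=QuantType.QInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "MatMul"],
    calibrate_method=CalibrationMethod.MinMax,  # placeholder calibration method
    quantized_tensor_type={},                   # required for this class
    extra_options=None,
)
```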
- class quark.onnx.qdq_quantizer.VitisBFPQuantizer(model: onnx.ModelProto, per_channel: bool, reduce_range: bool, mode: onnxruntime.quantization.quant_utils.QuantizationMode.QLinearOps, static: bool, weight_qType: Any, activation_qType: Any, tensors_range: Any, nodes_to_quantize: List[str], nodes_to_exclude: List[str], op_types_to_quantize: List[str], calibrate_method: Any, quantized_tensor_type: Dict[Any, Any] = {}, extra_options: Optional[Dict[str, Any]] = None)#
A class to perform Vitis-specific Block Floating Point (BFP) quantization using Quantize-Dequantize (QDQ) nodes.
- Parameters:
model (ModelProto) – The ONNX model to be quantized.
per_channel (bool) – Whether to perform per-channel quantization.
reduce_range (bool) – Whether to reduce the quantization range.
mode (QuantizationMode.QLinearOps) – The quantization mode to be used.
static (bool) – Whether to use static quantization.
weight_qType (Any) – The quantization type for weights.
activation_qType (Any) – The quantization type for activations.
tensors_range (Any) – Dictionary specifying the min and max values for tensors.
nodes_to_quantize (List[str]) – List of node names to be quantized.
nodes_to_exclude (List[str]) – List of node names to be excluded from quantization.
op_types_to_quantize (List[str]) – List of operation types to be quantized.
calibrate_method (Any) – The method used for calibration.
quantized_tensor_type (Dict[Any, Any], optional) – Dictionary specifying quantized tensor types.
extra_options (Optional[Dict[str, Any]], optional) – Additional options for quantization.
- Inherits from:
VitisQDQQuantizer: Base class for Vitis-specific QDQ quantization.
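A sketch for the BFP variant. Any BFP-specific settings would be passed through extra_options, but the recognized keys are not documented in this section, so the dictionary is left empty here; the remaining values are placeholders as in the previous sketches.

```python
import onnx
from onnxruntime.quantization.calibrate import CalibrationMethod
from onnxruntime.quantization.quant_utils import QuantizationMode, QuantType

from quark.onnx.qdq_quantizer import VitisBFPQuantizer

model = onnx.load("model_fp32.onnx")          # placeholder model path
tensors_range = {"conv2_out": (-2.0, 2.0)}    # placeholder calibration ranges

quantizer = VitisBFPQuantizer(
    model=model,
    per_channel=False,
    reduce_range=False,
    mode=QuantizationMode.QLinearOps,
    static=True,
    weight_qType=QuantType.QInt8,             # placeholder; BFP flows may interpret this differently
    activation_qType=QuantType.QInt8,
    tensors_range=tensors_range,
    nodes_to_quantize=[],
    nodes_to_exclude=[],
    op_types_to_quantize=["Conv", "MatMul"],
    calibrate_method=CalibrationMethod.MinMax,  # placeholder calibration method
    quantized_tensor_type={},
    extra_options={},                           # BFP-specific keys, if any, go here (not documented above)
)
```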