ONNX quantization utilities


quark.onnx.quant_utils.is_ort_version_below(target_version: str) → bool

This function checks whether the current version of ONNX Runtime (ORT) is below a specified version.

Parameters:: target_version (str) – the version to compare against the current ORT version.
Returns:: True if the current ORT version is lower than the target version, False otherwise.

class quark.onnx.quant_utils.Int16Method(*values)

class quark.onnx.quant_utils.PowerOfTwoMethod(*values)

class quark.onnx.quant_utils.ExtendedQuantType(*values)

class quark.onnx.quant_utils.VitisQuantType(*values)

class quark.onnx.quant_utils.ExtendedQuantFormat(*values)

class quark.onnx.quant_utils.VitisQuantFormat(*values)

quark.onnx.quant_utils.get_qmin_qmax_for_qType(qType: int, reduce_range: bool = False, symmetric: bool = False) → Any

Return qmin and qmax, the minimum and maximum values representable by the given qType.

Parameters:: qType – integer or floating-point type
Returns:: qmin, qmax

quark.onnx.quant_utils.get_qrange_for_qType(qType: int, reduce_range: bool = False, symmetric: bool = False) → Any

Helper function to get the quantization range for a type.

Parameters:: qType – quantization type
Returns:: quantization range
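The next sketch shows how qmin, qmax and the quantization range relate for plain integer types. It is limited to integer types, and the helper names are hypothetical. Quark's functions take ONNX type codes and also handle `reduce_range` and floating-point types, which this sketch leaves out. The symmetric convention used here (dropping the most negative signed value, so int8 becomes [-127, 127]) is an assumption. It matches the convention used by ONNX Runtime's quantization tools.

```python
from typing import Tuple


def int_qmin_qmax(bits: int, signed: bool, symmetric: bool = False) -> Tuple[int, int]:
    """Hypothetical helper: representable integer range of a `bits`-wide type."""
    if signed:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if symmetric:
            # Drop the most negative value so the range is symmetric around zero.
            qmin = -qmax
    else:
        # `symmetric` is ignored for unsigned types in this sketch.
        qmin, qmax = 0, 2 ** bits - 1
    return qmin, qmax


def int_qrange(bits: int, signed: bool, symmetric: bool = False) -> int:
    """Width of the quantization range, i.e. qmax - qmin."""
    qmin, qmax = int_qmin_qmax(bits, signed, symmetric)
    return qmax - qmin


print(int_qmin_qmax(8, signed=True))                  # (-128, 127)
print(int_qmin_qmax(8, signed=True, symmetric=True))  # (-127, 127)
print(int_qrange(8, signed=False))                    # 255
```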

class quark.onnx.quant_utils.CachedDataReader(dr: CalibrationDataReader, data_size: int | None = None, convert_nchw_to_nhwc: bool = False, quantize_fp16: bool = False)

A CalibrationDataReader that caches the input data from the user-provided data reader.

reset_iter() → None

Recreate the iterator so that the cached data can be iterated over again.

get_next() → Dict[str, ndarray[Any, Any]] | None

Get the next feed data.

Returns:: feed dict for the model
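Here is a minimal sketch of the caching behavior described above. It assumes the wrapped reader follows the usual `get_next()` protocol of returning `None` when exhausted. It also assumes that `data_size` caps how many feeds are cached. `ListReader` and `CachingReader` are hypothetical names. The real class also handles `convert_nchw_to_nhwc` and `quantize_fp16`, which are omitted here.

```python
from typing import Any, Dict, Iterable, List, Optional

Feed = Dict[str, Any]


class ListReader:
    """Hypothetical one-shot reader standing in for a user CalibrationDataReader."""

    def __init__(self, feeds: Iterable[Feed]) -> None:
        self._it = iter(feeds)

    def get_next(self) -> Optional[Feed]:
        return next(self._it, None)


class CachingReader:
    """Drain a one-shot reader once, then replay the cached feeds on demand."""

    def __init__(self, dr: ListReader, data_size: Optional[int] = None) -> None:
        self._cache: List[Feed] = []
        while data_size is None or len(self._cache) < data_size:
            feed = dr.get_next()
            if feed is None:
                break
            self._cache.append(feed)
        self.reset_iter()

    def reset_iter(self) -> None:
        # A fresh iterator over the cache allows another full pass.
        self._iter = iter(self._cache)

    def get_next(self) -> Optional[Feed]:
        return next(self._iter, None)


cached = CachingReader(ListReader([{"x": 1}, {"x": 2}, {"x": 3}]), data_size=2)
print(cached.get_next(), cached.get_next(), cached.get_next())  # {'x': 1} {'x': 2} None
cached.reset_iter()
print(cached.get_next())  # {'x': 1}
```

Caching matters because a typical user reader is a one-shot iterator, while calibration may need several passes over the same data.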

class quark.onnx.quant_utils.RandomDataReader(model_input: str | Path | ModelProto, input_shape: Dict[str, List[int]] = {}, input_data_range: Dict[str, List[int]] | None = None)

A CalibrationDataReader that uses random data for rapid quantization.

get_next() → Dict[str, ndarray[Any, Any]] | None

Get the next feed data.

Returns:: feed dict for the model
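The sketch below shows the idea behind a random calibration reader. The real class reads input names and shapes from `model_input`. This sketch instead takes the shapes directly. The following are all assumptions, not Quark's documented behavior:

* the class name `RandomFeedReader`;
* the default range [0, 1);
* the float32 dtype;
* the single batch.

```python
from typing import Dict, List, Optional

import numpy as np


class RandomFeedReader:
    """Hypothetical reader yielding random float32 feeds for known input shapes."""

    def __init__(self, input_shapes: Dict[str, List[int]],
                 input_data_range: Optional[Dict[str, List[float]]] = None,
                 num_batches: int = 1, seed: int = 0) -> None:
        self._shapes = input_shapes
        self._ranges = input_data_range or {}
        self._remaining = num_batches
        self._rng = np.random.default_rng(seed)

    def get_next(self) -> Optional[Dict[str, np.ndarray]]:
        if self._remaining == 0:
            return None  # signal the end of calibration data
        self._remaining -= 1
        feed = {}
        for name, shape in self._shapes.items():
            low, high = self._ranges.get(name, [0.0, 1.0])  # assumed default range
            feed[name] = self._rng.uniform(low, high, size=shape).astype(np.float32)
        return feed


reader = RandomFeedReader({"input": [1, 3, 4, 4]}, {"input": [-1.0, 1.0]})
print(reader.get_next()["input"].shape)  # (1, 3, 4, 4)
print(reader.get_next())                 # None
```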

class quark.onnx.quant_utils.PathDataReader(model_input: str | Path | ModelProto, data_path: str, input_shape: List[Any] = [])

A CalibrationDataReader that loads data from the specified data path for model calibration.

get_next() → Dict[str, ndarray[Any, Any]] | None

Get the next feed data.

Returns:: feed dict for the model

quark.onnx.quant_utils.infer_shape(model: ModelProto) → ModelProto

Parameters:: model – the source model
Returns:: the target model with inferred shapes

quark.onnx.quant_utils.get_datatype_shape(tensor: TensorProto) → Tuple[str, List[Any]]

Parameters:: tensor – the input tensor
Returns:: datatype and shape of the tensor

quark.onnx.quant_utils.dump_model(model_input: str | Path | ModelProto, dump_data_reader: object | None = None, random_data_reader_input_shape: Dict[str, List[int]] = {}, dump_float: bool = False, output_dir: str = './dump_results') → None

This function dumps the simulation results of the quantized model, including weights and activation results.

Parameters:

model_input (Union[str, Path, onnx.ModelProto]) – path or ModelProto of the input model
dump_data_reader (Optional[object]) – data reader for dumping. Defaults to None.
random_data_reader_input_shape (Dict[str, List[int]]) – if the internal random data reader is used, this configures the shapes of the input nodes. Defaults to {}.
dump_float (bool) – whether to dump the results of the float model. Defaults to False.
output_dir (str) – output directory for results. Defaults to './dump_results'.

quark.onnx.quant_utils.is_approximately_equal(a: float, b: float, epsilon: float = 1e-06) → bool

Parameters:

a – scalar input
b – scalar input
epsilon – difference tolerance

Returns:

True if a and b are approximately equal within the tolerance epsilon, False otherwise
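A tolerance is needed because values that are mathematically equal often differ slightly in binary floating point. This sketch uses a plain absolute-difference test. Whether Quark compares with `<` or `<=`, and whether it uses an absolute or relative tolerance, is an assumption. The name `approx_equal` is hypothetical.

```python
def approx_equal(a: float, b: float, epsilon: float = 1e-6) -> bool:
    """Absolute-tolerance comparison (the exact rule Quark uses is assumed)."""
    return abs(a - b) < epsilon


print(0.1 + 0.2 == 0.3)              # False: binary rounding error
print(approx_equal(0.1 + 0.2, 0.3))  # True
```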

quark.onnx.quant_utils.check_reduce_mean_condition(model: ModelProto, node: NodeProto) → bool

Check the conditions for a ReduceMean operation in an ONNX graph node.

Parameters:

model – ONNX model
node – ONNX node

Returns:

True if conditions for Reduce Mean are satisfied, False otherwise

quark.onnx.quant_utils.check_hard_sigmoid_condition(node: NodeProto) → bool

Parameters:: node – node object
Returns:: True if the node satisfies the HardSigmoid conditions, False otherwise

quark.onnx.quant_utils.is_leaky_relu_with_alpha(node: NodeProto, alpha_value: float = 0.1) → bool

Parameters:

node – node object
alpha_value – DPU supported alpha value

Returns:

whether the Leaky ReLU node's alpha is approximately equal to alpha_value

quark.onnx.quant_utils.is_clip_with_min_max(model: ModelProto, node: NodeProto, min_value: float = 0.0, max_value: float = 6.0) → bool

Parameters:

model – model object
node – node object
min_value – supported minimum value of Clip
max_value – supported maximum value of Clip

Returns:

whether the Clip node has the supported min and max values

quark.onnx.quant_utils.is_node_needs_annotated(model: ModelProto, node: NodeProto) → bool

Parameters:

model – model object
node – node object

Returns:

whether the node needs to be annotated

quark.onnx.quant_utils.get_annotate_tensors(model: ModelProto) → List[str]

Find patterns in the model where QDQ nodes need to be removed, and return the corresponding tensor names. The annotate tensors are the tensors associated with the inputs of the QDQ nodes to be removed.

Parameters:: model – model object
Returns:: the annotate tensors

quark.onnx.quant_utils.get_qdq_to_remove(model: ModelProto, annotate_tensors: List[str]) → Tuple[List[NodeProto], List[NodeProto], Dict[str, str]]

Return the nodes to be removed and a dictionary for remapping input tensors.

Parameters:

model – model object
annotate_tensors – the annotate tensors

Returns:

the dequantize and quantize nodes to remove, and the node mapping dict

quark.onnx.quant_utils.customqdq_to_contribqdq(model_input: str | Path | ModelProto, use_external_data_format: bool) → Any

Convert the custom QDQ nodes in the model to contrib QDQ nodes.

Parameters:: model_input – the model path or model proto
Returns:: None or the model proto

quark.onnx.quant_utils.remove_nodes(model: ModelProto, nodes_list: List[Any]) → ModelProto

Delete the nodes in the given list from the model.

Parameters:

model – model object
nodes_list – list of nodes to remove

Returns:

the model with the nodes removed

quark.onnx.quant_utils.remove_initializers(model: ModelProto, init_list: List[str]) → ModelProto

Delete the initializers in the given list from the model.

Parameters:

model – model object
init_list – list of initializer names to remove

Returns:

the model with the initializers removed

quark.onnx.quant_utils.modified_annotate_input(model: ModelProto, input_node_mapping: Dict[str, str]) → ModelProto

Rewire the input of ReLU to the output of the annotate op, and delete the QDQ nodes.

Parameters:

model – model object
input_node_mapping – input node mapping dict

Returns:

the modified model

quark.onnx.quant_utils.scale2pos(scale: float) → int

Obtain the fixed-point position corresponding to the scale. To avoid generating infinity during computation, the range of the scale is limited.

Parameters:: scale – the scale
Returns:: the fixed-point position

quark.onnx.quant_utils.pos2scale(pos: int) → float

Obtain the scale corresponding to the fixed-point position.

Parameters:: pos – the fixed-point position
Returns:: the scale
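The two conversions can be sketched as a round trip between a power-of-two scale and a fixed-point position. The sketch assumes the convention scale = 2**-pos, so that pos counts fractional bits. The clamping bounds 2**-127 and 2**127, used to keep `log2` finite, are also assumptions. `scale_to_pos` and `pos_to_scale` are hypothetical names.

```python
import math


def scale_to_pos(scale: float) -> int:
    """Fixed-point position for a scale, assuming scale = 2**-pos."""
    # Clamp so that log2 never sees 0 or an overflowing value (bounds are assumed).
    scale = min(max(scale, 2.0 ** -127), 2.0 ** 127)
    return int(round(-math.log2(scale)))


def pos_to_scale(pos: int) -> float:
    """Inverse mapping: a position of `pos` fractional bits gives scale 2**-pos."""
    return 2.0 ** -pos


print(scale_to_pos(0.125))  # 3
print(pos_to_scale(3))      # 0.125
print(scale_to_pos(0.1))    # 3, the nearest power of two in log2 space
```

A scale that is not already a power of two, such as 0.1, is rounded to the nearest position, so `pos_to_scale(scale_to_pos(s))` equals `s` only for powers of two.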

quark.onnx.quant_utils.compute_scale_zp(rmin: ndarray[Any, Any], rmax: ndarray[Any, Any], qmin: ndarray[Any, Any], qmax: ndarray[Any, Any], element_type: int, method: PowerOfTwoMethod, symmetric: bool = False, use_pof2s: bool = True) → Any

Calculate the scale s and zero point z for the quantization relation r = s(q-z), where r are the original values and q are the corresponding quantized values.

s and z are calculated such that every value within [rmin, rmax] has an approximate representation within [qmin, qmax]. In addition, qmin <= z <= qmax is enforced. If the symmetric flag is set to True, the interval [rmin, rmax] is symmetrized to [-absmax, +absmax], where absmax = max(abs(rmin), abs(rmax)).

Parameters:

rmin – minimum value of r
rmax – maximum value of r
qmin – minimum value representable by the target quantization data type
qmax – maximum value representable by the target quantization data type

Returns:

zero and scale [z, s]
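The following scalar sketch computes [z, s] for the relation r = s(q - z) as described above. It has some limitations:

* It works on Python floats instead of arrays.
* It omits the power-of-two rounding controlled by `use_pof2s` and `method`. Quark's own `scale2pos`/`pos2scale` relate to that step.
* Widening the range to include 0.0 is an assumed step, which ONNX Runtime also performs.
* The fallback for an all-zero range is an assumption.

`scale_zero_point` is a hypothetical name.

```python
def scale_zero_point(rmin: float, rmax: float, qmin: int, qmax: int,
                     symmetric: bool = False) -> list:
    """Return [z, s] such that r is approximately s * (q - z) for r in [rmin, rmax]."""
    # Assumed step: widen the range to include 0.0 so zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if symmetric:
        absmax = max(abs(rmin), abs(rmax))
        rmin, rmax = -absmax, absmax
    if rmax == rmin:
        return [0, 1.0]  # all-zero input; this fallback scale is an assumption
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    zero_point = min(max(zero_point, qmin), qmax)  # enforce qmin <= z <= qmax
    return [zero_point, scale]


z, s = scale_zero_point(-1.0, 3.0, 0, 255)
print(z)  # 64, and s is 4/255
```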

quark.onnx.quant_utils.compute_scale_zp_fp(rmin: ndarray[Any, Any], rmax: ndarray[Any, Any], qmin: ndarray[Any, Any], qmax: ndarray[Any, Any], element_type: int, method: CalibrationMethod, symmetric: bool = True, use_scaling: bool = False) → List[Any]

Calculate the scale and zero point for a float type.

Parameters:

rmin – minimum value of r
rmax – maximum value of r
element_type – the element data type of the tensor to quantize

Returns:

zero and scale [z, s]

quark.onnx.quant_utils.dequantize_data(data: ndarray[Any, Any], scale: ndarray[Any, Any], zero_point: ndarray[Any, Any]) → Any

Parameters:

data – the input data
scale – the scale for quantization
zero_point – the zero point for quantization

Returns:

the de-quantized data
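Dequantization applies r = S(q - z) element-wise. This sketch uses Python lists with a scalar scale and zero point, whereas the Quark function takes NumPy arrays. `dequantize` is a hypothetical name.

```python
from typing import List


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Element-wise r = S * (q - z)."""
    return [scale * (v - zero_point) for v in q]


print(dequantize([0, 10, 20], 0.5, 10))  # [-5.0, 0.0, 5.0]
```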

quark.onnx.quant_utils.quantize_data_pof2s(data: ndarray[Any, Any], qType: int, symmetric: bool, reduce_range: bool = False, rmin_real_range: float | None = None, rmin_override: ndarray[Any, Any] | None = None, rmax_override: ndarray[Any, Any] | None = None, method: PowerOfTwoMethod = PowerOfTwoMethod.NonOverflow, pos_range: int = 5, use_pof2s: bool = True, use_scaling: bool = False) → Any

Parameters:

data – data to quantize
qType – data type to quantize to. Supported types: UINT8/16 and INT8/16.
symmetric – whether symmetric quantization is used. This applies to INT8/16.

Returns:

minimum, maximum, zero point, scale, and quantized weights

To pack weights, a linear transformation is computed:

when the data type is uint8, from [rmin, rmax] -> \([0, 2^{b}-1]\), and
when the data type is int8, from [-m, m] -> \([-(2^{b-1}-1), 2^{b-1}-1]\), where m = max(abs(rmin), abs(rmax)) and b is the bit width of the data type.

The necessary intermediate nodes are then added to transform the quantized weight back to the full-precision weight using the equation

\(r = S(q-z)\), where

r: real original value
q: quantized value
S: scale
z: zero point
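The next sketch applies the int8 mapping above with a power-of-two scale. It picks the smallest scale 2**-pos for which no value in [-m, m] overflows [-127, 127]. This is assumed to be the intent of the default `PowerOfTwoMethod.NonOverflow`, and it is not a copy of Quark's code. It ignores `reduce_range`, the overrides, `pos_range` and `use_scaling`. `quantize_int8_pof2` is a hypothetical name.

```python
import math
from typing import List, Tuple


def quantize_int8_pof2(data: List[float]) -> Tuple[List[int], float, int]:
    """Symmetric int8 quantization with a power-of-two scale (hypothetical helper)."""
    qmax = 2 ** (8 - 1) - 1              # 127, so the target range is [-127, 127]
    m = max(abs(x) for x in data)        # m = max(abs(rmin), abs(rmax))
    if m == 0.0:
        return [0] * len(data), 1.0, 0
    # Largest pos (smallest scale 2**-pos) such that m / scale <= qmax, i.e. no overflow.
    pos = math.floor(math.log2(qmax / m))
    scale = 2.0 ** -pos
    zero_point = 0                       # symmetric quantization
    q = [max(-qmax, min(qmax, round(x / scale) + zero_point)) for x in data]
    return q, scale, zero_point


data = [-1.0, 0.5, 0.25, 0.9]
q, scale, z = quantize_int8_pof2(data)
print(q, scale)                      # [-64, 32, 16, 58] 0.015625
print([scale * (v - z) for v in q])  # reconstruct with r = S(q - z)
```

Restricting the scale to a power of two lets hardware implement the rescaling as a bit shift. The cost is that part of the int8 range can go unused: here, values only reach about half of [-127, 127].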

quark.onnx.quant_utils.get_exclude_nodes(input_model: str | Path | ModelProto, input_nodes: List[str] | None, output_nodes: List[str] | None) → List[str]

Return the nodes to be excluded based on the given input and output nodes.

Parameters:

input_model – the model path or ModelProto
input_nodes – the nodes at which to start quantizing
output_nodes – the nodes at which to stop quantizing

Returns:

the nodes excluded from quantization

quark.onnx.quant_utils.run_onnx_model(model_input: str | Path | ModelProto, data_reader: Any) → None

Check whether the input ONNX model runs successfully.

Parameters:

model_input – the model path or a ModelProto
data_reader – the data reader for feeding data

quark.onnx.quant_utils.check_onnx_model(model_input: str | Path | ModelProto) → None

Check whether an InferenceSession can be created successfully for the input ONNX model.

Parameters:: model_input – the model path or a ModelProto

quark.onnx.quant_utils.check_model_quantizable(model: ModelProto, op_types_to_quantize: List[str] | None, nodes_to_exclude: List[str]) → bool

Check whether the model can be quantized.

quark.onnx.quant_utils.dpu_leaky_relu_alpha(x: float) → float

This function implements a DPU-specific Leaky ReLU activation with alpha value correction.
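One way to picture "alpha value correction" is snapping alpha to a value the hardware can represent. This sketch assumes the DPU stores the Leaky ReLU alpha as an integer multiple of 1/256, which turns 0.1 into 26/256 = 0.1015625. That representation is an assumption, not something this page states. `dpu_alpha` is a hypothetical name.

```python
def dpu_alpha(alpha: float) -> float:
    """Snap alpha to the nearest multiple of 1/256 (assumed DPU representation)."""
    return round(alpha * 256) / 256


print(dpu_alpha(0.1))  # 0.1015625, i.e. 26/256
```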

quark.onnx.quant_utils.get_clip_min_max(model: ModelProto, clip_node: NodeProto) → Tuple[float | None, float | None, int | None]

Get clip min and max value from Clip node.

Parameters:

model – onnx model instance
clip_node – target Clip node

Returns:

the min value, the max value, and the para type, which indicates where the min and max values come from. The para type is one of:

None: unknown.
0: attribute.
1: initializer.
2: other nodes.

quark.onnx.quant_utils.check_relu_like_node(model: ModelProto, node: NodeProto) → bool

Check whether the node is a ReLU-like node.

Parameters:

model – the model instance
node – the node to check

Returns:

True if it is

quark.onnx.quant_utils.print_quantize_info(model_input: str | Path | ModelProto, model_output: str | Path | None, calibration_data_reader: CalibrationDataReader | None, calibration_data_path: str | None, quant_format: Any | ExtendedQuantFormat, input_nodes: List[str] | None, output_nodes: List[str] | None, op_types_to_quantize: List[str] | None, extra_op_types_to_quantize: List[str] | None, per_channel: bool, reduce_range: bool, activation_type: Any | ExtendedQuantType, weight_type: Any | ExtendedQuantType, nodes_to_quantize: List[str], nodes_to_exclude: List[str], subgraphs_to_exclude: List[Tuple[List[str]]], optimize_model: bool, use_external_data_format: bool, calibrate_method: Any | PowerOfTwoMethod | Int16Method, execution_providers: List[str] | None, enable_npu_cnn: bool, enable_npu_transformer: bool, specific_tensor_precision: bool, debug_mode: bool, convert_fp16_to_fp32: bool, convert_nchw_to_nhwc: bool, include_cle: bool, include_sq: bool, include_rotation: bool, include_fast_ft: bool, extra_options: Dict[str, Any]) → None

Print the os_cpu, time, tool_version, and quantized_configuration information.

quark.onnx.quant_utils.print_quantize_dynamic_info(model_input: str | Path | ModelProto, model_output: str | Path | None, op_types_to_quantize: List[str] | None, per_channel: bool, reduce_range: bool, weight_type: Any | ExtendedQuantType, nodes_to_quantize: List[str], nodes_to_exclude: List[str], subgraphs_to_exclude: List[Tuple[List[str]]], use_external_data_format: bool, debug_mode: bool, extra_options: Dict[str, Any]) → None

Print the os_cpu, time, tool_version, and quantized_configuration information.

quark.onnx.quant_utils.find_int16_scale(x: float) → Tuple[float, float, float]

Given a float value, find the closest value corresponding to M and 2**N, where the ranges of M and 2**N are within the representation ranges of int16 and uint16.
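The description leaves the exact form of the approximation open. This sketch assumes one plausible reading: x ≈ M / 2**N, with M in the int16 range and 2**N in the uint16 range, so N ≤ 15. It brute-forces every N and keeps the pair with the smallest absolute error. Returning (M, 2**N, error) is an assumption about what the three floats mean. `approx_int16_scale` is a hypothetical name.

```python
from typing import Tuple


def approx_int16_scale(x: float) -> Tuple[float, float, float]:
    """Hypothetical: approximate x as M / 2**N with M in int16 and 2**N in uint16."""
    best = (0.0, 1.0, abs(x))
    for n in range(16):                  # 2**15 = 32768 is the largest power of two in uint16
        denom = 2 ** n
        m = round(x * denom)
        m = max(-32768, min(32767, m))   # keep M inside the int16 range
        err = abs(x - m / denom)
        if err < best[2]:
            best = (float(m), float(denom), err)
    return best


print(approx_int16_scale(0.375))  # (3.0, 8.0, 0.0)
```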

quark.onnx.quant_utils.encrypt_data(unencrypted_data: bytes, iv: bytes, key: bytes) → Any

Encrypt data using the AES-256 algorithm.

Parameters:

unencrypted_data – the original data to be encrypted
iv – initialization vector, 16 bytes
key – the key, 32 bytes (256 bits)

Returns:

the encrypted data

quark.onnx.quant_utils.decrypt_data(encrypted_data: bytes, iv: bytes, key: bytes) → Any

Decrypt data using the AES-256 algorithm.

Parameters:

encrypted_data – the data to be decrypted
iv – initialization vector, 16 bytes
key – the key, 32 bytes (256 bits)

Returns:

the decrypted data

quark.onnx.quant_utils.onnx_save_model_with_encryption(model: ModelProto, path: str | Path, secret_key: bytes) → None

Encrypt the model before saving it to disk. Only models smaller than 2 GB are supported.

Parameters:

model – the ONNX ModelProto to be encrypted
path – the path for saving
secret_key – 48-byte secret key: 16 bytes for the IV and 32 bytes for the key

quark.onnx.quant_utils.onnx_load_model_with_decryption(path: str | Path, secret_key: bytes) → ModelProto

Decrypt the model before loading it into memory. Only models smaller than 2 GB are supported.

Parameters:

path – the model path
secret_key – 48-byte secret key: 16 bytes for the IV and 32 bytes for the key

Returns:

the loaded and decrypted model

quark.onnx.quant_utils.cache_onnx_model_and_infer_shapes(input_model: str | Path | ModelProto, path: str | Path, save_as_external_data: bool = False, secret_key: bytes | None = None) → ModelProto

Save the model, then load it back with shape inference, applying encryption and decryption if a secret key is provided.

Parameters:

input_model – the ONNX model path or ModelProto to be saved
path – the path for saving
save_as_external_data – save external data for models larger than 2 GB
secret_key – 48-byte secret key: 16 bytes for the IV and 32 bytes for the key

Returns:

the model proto

quark.onnx.quant_utils.save_onnx_model_with_external_data(model: ModelProto, path: str | Path, save_as_external_data: bool = False) → None

Save the model with external data; the .data file has the same name as the .onnx file.

Parameters:

model – the ONNX ModelProto to be saved
path – the path for saving
save_as_external_data – enable this option for ModelProtos larger than 2 GB

quark.onnx.quant_utils.create_infer_session_for_onnx_model(model_input: str | Path | ModelProto, sess_options: SessionOptions | None = None, providers: List[str] | None = ['CPUExecutionProvider'], provider_options: List[Dict[str, str]] | None = None, use_external_data_format: bool = False) → InferenceSession

Create an InferenceSession for an ONNX model.

Parameters:

model_input – the ONNX model, as a path or ModelProto
sess_options – session options
