quark.onnx.quant_utils#

Module Contents#

Classes#

Functions#

quark.onnx.quant_utils.is_ort_version_below(target_version: str) bool#

This function checks whether the current version of ONNX Runtime (ORT) is below a specified version.

Parameters:

target_version (str) – The version to compare against the current ORT version.

Returns:

True if the current ORT version is less than the target version, False otherwise.
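A comparison of this kind can be sketched as follows. This is a minimal illustration, not the library's implementation: `version_below` and `_parse` are hypothetical helpers, and the sketch assumes purely numeric dotted versions such as 1.17.0. Comparing parsed tuples rather than raw strings matters because "1.9.0" sorts after "1.10.0" as text.

```python
def _parse(version: str) -> tuple:
    """Turn '1.17.0' into (1, 17, 0); assumes numeric components only."""
    return tuple(int(part) for part in version.split("."))


def version_below(current: str, target: str) -> bool:
    """Return True if `current` is strictly older than `target`."""
    return _parse(current) < _parse(target)


print(version_below("1.16.3", "1.17.0"))  # True
print(version_below("1.17.0", "1.17.0"))  # False
```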

class quark.onnx.quant_utils.Int16Method#

Generic enumeration.

Derive from this class to define new enumerations.

class quark.onnx.quant_utils.PowerOfTwoMethod#

Generic enumeration.

Derive from this class to define new enumerations.

class quark.onnx.quant_utils.VitisQuantType#

Generic enumeration.

Derive from this class to define new enumerations.

class quark.onnx.quant_utils.VitisQuantFormat#

Generic enumeration.

Derive from this class to define new enumerations.

quark.onnx.quant_utils.get_qmin_qmax_for_qType(qType: int, reduce_range: bool = False, symmetric: bool = False) Any#

Return qmin and qmax, the minimum and maximum values representable by the given qType.

Parameters:

qType – onnx.onnx_pb.TensorProto.UINT8/16 or onnx.onnx_pb.TensorProto.INT8/16

Returns:

qmin, qmax
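For orientation, the usual integer ranges can be computed from the bit width alone. The sketch below is an assumption-laden illustration rather than the library's code: `int_range` is a hypothetical helper, it takes a bit width and signedness instead of a TensorProto type, and it does not model reduce_range.

```python
def int_range(bits: int, signed: bool, symmetric: bool = False) -> tuple:
    """Return (qmin, qmax) for an integer type with the given bit width."""
    if not signed:
        # Unsigned types cover [0, 2^b - 1].
        return 0, 2**bits - 1
    qmax = 2 ** (bits - 1) - 1
    # A symmetric signed range drops the most negative value.
    qmin = -qmax if symmetric else -qmax - 1
    return qmin, qmax


print(int_range(8, signed=False))                  # (0, 255)
print(int_range(8, signed=True))                   # (-128, 127)
print(int_range(16, signed=True, symmetric=True))  # (-32767, 32767)
```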

quark.onnx.quant_utils.get_qrange_for_qType(qType: int, reduce_range: bool = False, symmetric: bool = False) Any#
Helper function to get the quantization range for a type.

Parameters:

qType – quantization type

Returns:

quantization range

class quark.onnx.quant_utils.CachedDataReader(dr: onnxruntime.quantization.calibrate.CalibrationDataReader, data_size: Optional[int] = None, convert_nchw_to_nhwc: bool = False)#

A CalibrationDataReader that caches the input data from the user-provided data reader.

reset_iter() None#

Recreate the iterator so that the cached data can be iterated again.

get_next() Optional[Dict[str, numpy.ndarray[Any, Any]]]#

Get the next feed data.

Returns:

feed dict for the model
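The caching pattern described for this class can be sketched in plain Python. Everything here is illustrative: `ListReader` and `CachingReader` are hypothetical stand-ins, not the library's classes, and treating data_size as an upper bound on the number of cached feeds is an assumption.

```python
from typing import Any, Dict, List, Optional


class ListReader:
    """Hypothetical stand-in for a user data reader exposing get_next()."""

    def __init__(self, feeds: List[Dict[str, Any]]) -> None:
        self._it = iter(feeds)

    def get_next(self) -> Optional[Dict[str, Any]]:
        return next(self._it, None)


class CachingReader:
    """Drain another reader once, then replay the cached feeds on demand."""

    def __init__(self, reader: Any, data_size: Optional[int] = None) -> None:
        self._cache: List[Dict[str, Any]] = []
        # Assumption: data_size limits how many feeds are cached.
        while data_size is None or len(self._cache) < data_size:
            feed = reader.get_next()
            if feed is None:
                break
            self._cache.append(feed)
        self._it = iter(self._cache)

    def reset_iter(self) -> None:
        # Restart iteration over the cache without touching the source reader.
        self._it = iter(self._cache)

    def get_next(self) -> Optional[Dict[str, Any]]:
        return next(self._it, None)


reader = CachingReader(ListReader([{"x": 1}, {"x": 2}, {"x": 3}]), data_size=2)
first_pass = [reader.get_next(), reader.get_next(), reader.get_next()]
reader.reset_iter()
second_pass_first = reader.get_next()
```

After the first pass returns None, calling reset_iter() lets the same feeds be served again, which is useful when calibration data is needed more than once.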

class quark.onnx.quant_utils.RandomDataReader(model_path: str, input_shape: Union[List[int], Tuple[int], List[List[int]], Dict[str, List[int]], List[Any], Any] = [], input_data_range: Optional[str] = None)#

A CalibrationDataReader using random data for rapid quantization.

get_next() Optional[Dict[str, numpy.ndarray[Any, Any]]]#

Get the next feed data.

Returns:

feed dict for the model

class quark.onnx.quant_utils.PathDataReader(model_path: str, data_path: str, input_shape: List[Any] = [])#

A CalibrationDataReader loading data from specified paths for model calibration.

get_next() Optional[Dict[str, numpy.ndarray[Any, Any]]]#

Get the next feed data.

Returns:

feed dict for the model

quark.onnx.quant_utils.infer_shape(model: onnx.onnx_ml_pb2.ModelProto) onnx.onnx_ml_pb2.ModelProto#
Parameters:

model – the source model

Returns:

the target model, which contains the inferred shapes

quark.onnx.quant_utils.get_datatype_shape(tensor: onnx.onnx_ml_pb2.TensorProto) Tuple[str, List[Any]]#
Parameters:

tensor – the input tensor

Returns:

datatype and shape of the tensor

quark.onnx.quant_utils.dump_model(model: Union[str, onnx.ModelProto], dump_data_reader: Optional[object] = None, random_data_reader_input_shape: List[int] = [], dump_float: bool = False, output_dir: str = './dump_results') None#

This function dumps the simulation results of the quantized model, including weights and activation results.

Parameters:
  • model – the input model

  • dump_data_reader – data reader for dumping

  • random_data_reader_input_shape – if the internal random data reader is used, this configures the input node's shape

  • dump_float – dump results of the float model or not

  • output_dir – output directory for results

quark.onnx.quant_utils.is_approximately_equal(a: float, b: float, epsilon: float = 1e-06) bool#
Parameters:
  • a – scalar input

  • b – scalar input

  • epsilon – difference tolerance

Returns:

whether a and b are equal within the tolerance
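A tolerance check of this kind is typically a comparison of the absolute difference against epsilon. The sketch below assumes a strict `<` comparison; `approximately_equal` is a hypothetical name and the library's exact rule may differ.

```python
def approximately_equal(a: float, b: float, epsilon: float = 1e-06) -> bool:
    """Return True if a and b differ by less than epsilon (assumed rule)."""
    return abs(a - b) < epsilon


print(approximately_equal(0.1 + 0.2, 0.3))   # True
print(approximately_equal(0.1, 0.1015625))   # False
```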

quark.onnx.quant_utils.check_reduce_mean_condition(model: onnx.ModelProto, node: onnx.NodeProto) bool#

Check conditions for Reduce Mean operation in ONNX graph nodes.

Parameters:
  • model – ONNX model

  • node – ONNX node

Returns:

True if conditions for Reduce Mean are satisfied, False otherwise

quark.onnx.quant_utils.check_hard_sigmoid_condition(node: onnx.NodeProto) bool#
Parameters:

node – node object

Returns:

whether the node is a hard sigmoid or not

quark.onnx.quant_utils.is_leaky_relu_with_alpha(node: onnx.NodeProto, alpha_value: float = 0.1) bool#
Parameters:
  • node – node object

  • alpha_value – DPU supported alpha value

Returns:

whether the alpha of the Leaky ReLU node is approximately equal to alpha_value

quark.onnx.quant_utils.is_clip_with_min_max(model: onnx.ModelProto, node: onnx.NodeProto, min_value: float = 0.0, max_value: float = 6.0) bool#
Parameters:
  • model – model object

  • node – node object

  • min_value – supported minimum value of Clip

  • max_value – supported maximum value of Clip

Returns:

whether the Clip node has the supported min and max values

quark.onnx.quant_utils.is_node_needs_annotated(model: onnx.ModelProto, node: onnx.NodeProto) bool#
Parameters:
  • model – model object

  • node – node object

Returns:

whether the node needs to be annotated

quark.onnx.quant_utils.get_annotate_tensors(model: onnx.ModelProto) List[str]#

Find patterns in the model where QDQ needs to be removed, and then return the corresponding tensor names. annotate_tensors refers to the tensors associated with the input of the QDQ that needs to be removed.

Parameters:

model – model object

Returns:

the annotate tensors

quark.onnx.quant_utils.get_qdq_to_remove(model: onnx.ModelProto, relu_input: List[str]) Tuple[List[onnx.NodeProto], List[onnx.NodeProto], Dict[str, str]]#

Return the names of nodes to be removed and a dictionary for converting input tensors.

Parameters:
  • model – model object

  • relu_input – the ReLU node inputs list

Returns:

de-quantize & quantize nodes to remove and node mapping dict

quark.onnx.quant_utils.customqdq_to_contribqdq(model_path: str, use_external_data_format: bool) None#

Convert the custom QDQs to the contrib QDQs in the model.

Parameters:

model_path – the model path

Returns:

None

quark.onnx.quant_utils.remove_nodes(model: onnx.ModelProto, nodes_list: List[Any]) onnx.ModelProto#

Delete nodes according to the nodes in the list.

Parameters:
  • model – model object

  • nodes_list – nodes list to remove

Returns:

the model that has removed some nodes

quark.onnx.quant_utils.remove_initializers(model: onnx.onnx_ml_pb2.ModelProto, init_list: List[str]) onnx.onnx_ml_pb2.ModelProto#

Delete initializers according to the initializers in the list.

Parameters:
  • model – model object

  • init_list – initializer's name list to remove

Returns:

the model that has removed some initializers

quark.onnx.quant_utils.modified_annotate_input(model: onnx.onnx_ml_pb2.ModelProto, input_node_mapping: Dict[str, str]) onnx.onnx_ml_pb2.ModelProto#

Modify the input of ReLU to the output of the annotate op, and delete QDQ.

Parameters:
  • model – model object

  • input_node_mapping – input node mapping dict

Returns:

the modified model

quark.onnx.quant_utils.scale2pos(scale: float) int#

Obtain the fixed-point position corresponding to the scale. To avoid generating infinity during computations, the range of scale is limited.

Parameters:

scale – the scale

Returns:

the fixed-point position

quark.onnx.quant_utils.pos2scale(pos: int) float#

Obtain the scale corresponding to the fixed-point position.

Parameters:

pos – the fixed-point position

Returns:

the scale
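The relationship between a power-of-two scale and its fixed-point position is commonly scale = 2^(-pos). The sketch below illustrates that convention under stated assumptions: `scale_to_pos` and `pos_to_scale` are hypothetical names, and the clamping bounds of 2^-127 and 2^127 are chosen for illustration only; the source states that the scale range is limited but does not give the limits.

```python
import math


def scale_to_pos(scale: float) -> int:
    """Nearest fixed-point position for a scale, assuming scale = 2**(-pos)."""
    # Clamp so that log2 never sees 0 or infinity (bounds are an assumption).
    scale = min(max(scale, 2.0**-127), 2.0**127)
    return round(-math.log2(scale))


def pos_to_scale(pos: int) -> float:
    """Scale for a fixed-point position, assuming scale = 2**(-pos)."""
    return 2.0 ** -pos


print(scale_to_pos(0.125))  # 3
print(pos_to_scale(3))      # 0.125
print(scale_to_pos(0.1))    # 3, the nearest power of two to 0.1 is 2**-3
```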

quark.onnx.quant_utils.compute_scale_zp(rmin: numpy.ndarray[Any, Any], rmax: numpy.ndarray[Any, Any], qmin: numpy.ndarray[Any, Any], qmax: numpy.ndarray[Any, Any], element_type: int, method: PowerOfTwoMethod, symmetric: bool = False, use_pof2s: bool = True) Any#

Calculate the scale s and zero point z for the quantization relation r = s(q-z), where r are the original values and q are the corresponding quantized values.

s and z are calculated such that every value within [rmin,rmax] has an approximate representation within [qmin,qmax]. In addition, qmin <= z <= qmax is enforced. If the symmetric flag is set to True, the interval [rmin,rmax] is symmetrized to [-absmax, +absmax], where absmax = max(abs(rmin), abs(rmax)).

Parameters:
  • rmin – minimum value of r

  • rmax – maximum value of r

  • qmin – minimum value representable by the target quantization data type

  • qmax – maximum value representable by the target quantization data type

Returns:

zero point and scale [z, s] of pof2s
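The relation r = s(q-z) and the symmetrization rule above can be sketched for scalar inputs as follows. This is a generic affine-quantization illustration, not the library's implementation: `affine_scale_zp` is a hypothetical helper, it ignores the method and use_pof2s options, and extending the range to include zero is an assumption borrowed from common practice.

```python
def affine_scale_zp(rmin, rmax, qmin, qmax, symmetric=False):
    """Return (zero_point, scale) such that r is approximately scale * (q - zero_point)."""
    # Assumption: the real range is widened so that 0.0 is representable.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    if symmetric:
        absmax = max(abs(rmin), abs(rmax))
        rmin, rmax = -absmax, absmax
    # A degenerate range gets a neutral scale to avoid division by zero.
    scale = 1.0 if rmax == rmin else (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    # Enforce qmin <= z <= qmax.
    zero_point = max(qmin, min(qmax, zero_point))
    return zero_point, scale


z_sym, s_sym = affine_scale_zp(-0.5, 1.0, -127, 127, symmetric=True)
z_u8, s_u8 = affine_scale_zp(0.0, 2.55, 0, 255)
```

In the symmetric case the interval [-0.5, 1.0] becomes [-1.0, 1.0], so the zero point lands on 0 and the scale is 2/254.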

quark.onnx.quant_utils.compute_scale_zp_fp(rmin: numpy.ndarray[Any, Any], rmax: numpy.ndarray[Any, Any], element_type: int, symmetric: bool = True) List[Any]#

Calculate the scale and zero point for a float type.

Parameters:
  • rmin – minimum value of r

  • rmax – maximum value of r

  • element_type – the element data type of the tensor to quantize

Returns:

zero point and scale [z, s] of pof2s

quark.onnx.quant_utils.dequantize_data(data: numpy.ndarray[Any, Any], scale: numpy.ndarray[Any, Any], zero_point: numpy.ndarray[Any, Any]) Any#
Parameters:
  • data – the input data

  • scale – the scale for quantization

  • zero_point – the zero point for quantization

Returns:

the de-quantized data

quark.onnx.quant_utils.quantize_data_pof2s(data: numpy.ndarray[Any, Any], qType: int, symmetric: bool, reduce_range: bool = False, rmin_real_range: Optional[float] = None, rmin_override: Optional[numpy.ndarray[Any, Any]] = None, rmax_override: Optional[numpy.ndarray[Any, Any]] = None, method: PowerOfTwoMethod = PowerOfTwoMethod.NonOverflow, pos_range: int = 5, use_pof2s: bool = True) Any#
Parameters:
  • data – data to quantize

  • qType – data type to quantize to. Supported types UINT8/16 and INT8/16

  • symmetric – whether symmetric quantization is used or not. This is applied to INT8/16.

Returns:

minimum, maximum, zero point, scale, and quantized weights

To pack weights, we compute a linear transformation

  • when data type == uint8, from [rmin, rmax] -> \([0, 2^{b}-1]\) and

  • when data type == int8, from [-m , m] -> \([-(2^{b-1}-1), 2^{b-1}-1]\) where

    m = max(abs(rmin), abs(rmax)) and b is the bit width of the data type

and add the necessary intermediate nodes to transform the quantized weight to the full weight using the equation

\(r = S(q-z)\), where

  • r: real original value

  • q: quantized value

  • S: scale

  • z: zero point
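The int8 case of the transformation above, together with the recovery equation r = S(q-z), can be sketched on a small list of weights. This is an illustration under assumptions: `quantize` and `dequantize` are hypothetical helpers, the scale is the plain ratio m / qmax rather than a power-of-two scale, and the sample weights are made up.

```python
def quantize(values, scale, zero_point, qmin, qmax):
    """Map real values to integers: q = clamp(round(r / S) + z)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate real values: r = S * (q - z)."""
    return [scale * (q - zero_point) for q in q_values]


weights = [-1.0, -0.26, 0.0, 0.5, 0.99]   # made-up sample data
b = 8                                      # bit width
m = max(abs(w) for w in weights)           # m = max(abs(rmin), abs(rmax))
qmax = 2 ** (b - 1) - 1                    # 127 for int8
scale = m / qmax                           # maps [-m, m] onto [-127, 127]

q = quantize(weights, scale, 0, -qmax, qmax)
restored = dequantize(q, scale, 0)
max_error = max(abs(r - w) for r, w in zip(restored, weights))
```

With round-to-nearest and no clamping in effect, the reconstruction error of each weight is bounded by half a quantization step, that is, scale / 2.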

quark.onnx.quant_utils.get_exclude_nodes(model_path: str, input_nodes: Union[List[str], None], output_nodes: Union[List[str], None]) List[str]#

Return the nodes to be excluded based on the given input and output nodes.

Parameters:
  • model_path – the model path

  • input_nodes – the nodes to start quantizing

  • output_nodes – the nodes to terminate quantizing

Returns:

the nodes excluded from quantization

quark.onnx.quant_utils.run_onnx_model(model_path: str, data_reader: Any) None#

Check if the input ONNX model can run successfully.

Parameters:
  • model_path – the model path

  • data_reader – the data reader for feeding data

quark.onnx.quant_utils.check_onnx_model(model_path: str) None#

Check if an InferenceSession can be created successfully from the input ONNX model.

Parameters:

model_path – the model path

quark.onnx.quant_utils.check_model_quantizable(model: onnx.onnx_ml_pb2.ModelProto, op_types_to_quantize: Optional[List[str]], nodes_to_exclude: List[str]) bool#

Check if the model can be quantized.

quark.onnx.quant_utils.dpu_leaky_relu_alpha(x: float) float#

This function implements a DPU-specific Leaky ReLU activation with alpha value correction.

quark.onnx.quant_utils.get_clip_min_max(model: onnx.onnx_ml_pb2.ModelProto, clip_node: onnx.onnx_ml_pb2.NodeProto) Tuple[Optional[float], Optional[float], Optional[int]]#

Get the min and max values from a Clip node.

Parameters:
  • model – onnx model instance

  • clip_node – target Clip node

Returns:

the min value, the max value, and the para type

The meaning of para type is:

  • None – unknown

  • 0 – attribute

  • 1 – initializer

  • 2 – other nodes

quark.onnx.quant_utils.check_relu_like_node(model: onnx.onnx_ml_pb2.ModelProto, node: onnx.onnx_ml_pb2.NodeProto) bool#

Check if the node is a ReLU-like node.

Parameters:
  • model – the model instance

  • node – the node to check

Returns:

True if it is

quark.onnx.quant_utils.print_quantize_info(model_input: str, model_output: str, calibration_data_reader: str, calibration_data_path: Union[str, None], quant_format: Union[Any, VitisQuantFormat], input_nodes: Union[List[str], None], output_nodes: Union[List[str], None], op_types_to_quantize: Union[List[str], None], random_data_reader_input_shape: Union[List[int], Tuple[int], List[List[int]], Dict[str, List[int]], List[Any], None], per_channel: bool, reduce_range: bool, activation_type: Union[Any, VitisQuantType], weight_type: Union[Any, VitisQuantType], nodes_to_quantize: List[str], nodes_to_exclude: List[str], optimize_model: bool, use_external_data_format: bool, calibrate_method: Union[Any, PowerOfTwoMethod, Int16Method], execution_providers: Union[List[str], None], enable_npu_cnn: bool, enable_npu_transformer: bool, specific_tensor_precision: bool, debug_mode: bool, convert_fp16_to_fp32: bool, convert_nchw_to_nhwc: bool, include_cle: bool, include_sq: bool, include_fast_ft: bool, extra_options: Dict[str, Any]) None#

Print the os_cpu, time, tool_version, and quantized_configuration information.

quark.onnx.quant_utils.find_int16_scale(x: float) Tuple[float, float, float]#

Given a float value, find the closest value corresponding to M and 2**N, where the range of M and 2**N is within the representation range of int16 and uint16.