quark.onnx.quant_utils
Module Contents#
Classes#
Functions#
- quark.onnx.quant_utils.is_ort_version_below(target_version: str) bool #
This function checks whether the current version of ONNX Runtime (ORT) is below a specified version.
- Parameters:
target_version (str) – The version to compare against the current ORT version.
- Returns:
True if the current ORT version is less than the target version, False otherwise.
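Such a check reduces to a numeric comparison of the dotted version components. A minimal self-contained sketch (an illustrative stand-in; the actual implementation presumably reads `onnxruntime.__version__` and may use a full version parser that also handles pre-release suffixes):

```python
def is_version_below(current: str, target: str) -> bool:
    """Illustrative sketch: compare two dotted version strings numerically.

    Pre-release suffixes (e.g. "1.17.0rc1") are not handled correctly here;
    a real implementation would use a proper version parser.
    """
    def parts(v: str):
        # Keep only the numeric characters of each dot-separated field.
        return tuple(int("".join(ch for ch in p if ch.isdigit()) or 0)
                     for p in v.split("."))
    return parts(current) < parts(target)
```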
- class quark.onnx.quant_utils.Int16Method#
Generic enumeration.
Derive from this class to define new enumerations.
- class quark.onnx.quant_utils.PowerOfTwoMethod#
Generic enumeration.
Derive from this class to define new enumerations.
- class quark.onnx.quant_utils.VitisQuantType#
Generic enumeration.
Derive from this class to define new enumerations.
- class quark.onnx.quant_utils.VitisQuantFormat#
Generic enumeration.
Derive from this class to define new enumerations.
- quark.onnx.quant_utils.get_qmin_qmax_for_qType(qType: int, reduce_range: bool = False, symmetric: bool = False) Any #
Return qmin and qmax, the minimum and maximum values representable by the given qType :param qType: onnx.onnx_pb.TensorProto.UINT8/16 or onnx.onnx_pb.TensorProto.INT8/16 :return: qmin, qmax
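For the common integer widths the ranges work out as follows. This is a sketch under the usual conventions for signed/unsigned quantized types; the exact reduce-range and symmetric behaviour of the real function is an assumption here:

```python
def qmin_qmax_sketch(bits: int, signed: bool,
                     reduce_range: bool = False, symmetric: bool = False):
    """Illustrative quantization range for an integer type of the given width."""
    if reduce_range:
        bits -= 1  # e.g. int8 reduced to a 7-bit range for some backends
    if signed:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if symmetric:
            qmin = -qmax  # symmetric signed range drops the extra negative value
    else:
        qmin, qmax = 0, 2 ** bits - 1
    return qmin, qmax
```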
- quark.onnx.quant_utils.get_qrange_for_qType(qType: int, reduce_range: bool = False, symmetric: bool = False) Any #
- Helper function to get the quantization range for a type.
:param qType: quantization type :return: quantization range
- class quark.onnx.quant_utils.CachedDataReader(dr: onnxruntime.quantization.calibrate.CalibrationDataReader, data_size: Optional[int] = None, convert_nchw_to_nhwc: bool = False)#
A CalibrationDataReader that caches input data from the user-provided data reader.
- reset_iter() None #
Recreate the iterator so that it can iterate again
- get_next() Optional[Dict[str, numpy.ndarray[Any, Any]]] #
Get next feed data :return: feed dict for the model
- class quark.onnx.quant_utils.RandomDataReader(model_path: str, input_shape: Union[List[int], Tuple[int], List[List[int]], Dict[str, List[int]], List[Any], Any] = [], input_data_range: Optional[str] = None)#
A CalibrationDataReader using random data for rapid quantization.
- get_next() Optional[Dict[str, numpy.ndarray[Any, Any]]] #
Get next feed data :return: feed dict for the model
- class quark.onnx.quant_utils.PathDataReader(model_path: str, data_path: str, input_shape: List[Any] = [])#
A CalibrationDataReader loading data from specified paths for model calibration.
- get_next() Optional[Dict[str, numpy.ndarray[Any, Any]]] #
Get next feed data :return: feed dict for the model
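All three readers follow the same protocol: get_next returns the next feed dict for the model, or None once the data is exhausted, and reset_iter recreates the iterator. A minimal self-contained sketch of that pattern (independent of onnxruntime; the class name and pre-built feed dicts are illustrative):

```python
class ListDataReader:
    """Illustrative calibration reader: serves pre-built feed dicts one at a time."""

    def __init__(self, feeds):
        self._feeds = list(feeds)
        self._iter = iter(self._feeds)

    def reset_iter(self):
        # Recreate the iterator so the cached data can be iterated again.
        self._iter = iter(self._feeds)

    def get_next(self):
        # Return the next feed dict for the model, or None when exhausted.
        return next(self._iter, None)
```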
- quark.onnx.quant_utils.infer_shape(model: onnx.onnx_ml_pb2.ModelProto) onnx.onnx_ml_pb2.ModelProto #
- Parameters:
model – the source model
- Returns:
the target model contains inferred shape
- quark.onnx.quant_utils.get_datatype_shape(tensor: onnx.onnx_ml_pb2.TensorProto) Tuple[str, List[Any]] #
- Parameters:
tensor – the input tensor
- Returns:
datatype and shape of the tensor
- quark.onnx.quant_utils.dump_model(model: Union[str, onnx.ModelProto], dump_data_reader: Optional[object] = None, random_data_reader_input_shape: List[int] = [], dump_float: bool = False, output_dir: str = './dump_results') None #
This function dumps the simulation results of the quantized model, including weights and activation results. :param model: the input model :param dump_data_reader: data reader for dumping :param random_data_reader_input_shape: if using the internal random data reader,
this is used to configure the input nodes' shapes
- Parameters:
dump_float – dump results of the float model or not
output_dir – output directory for results
- quark.onnx.quant_utils.is_approximately_equal(a: float, b: float, epsilon: float = 1e-06) bool #
- Parameters:
a – scalar input
b – scalar input
epsilon – difference tolerance
- Returns:
equal or not
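The tolerance check reduces to an absolute-difference comparison, sketched here as the obvious implementation:

```python
def approx_equal(a: float, b: float, epsilon: float = 1e-06) -> bool:
    # True when the absolute difference is within the tolerance.
    return abs(a - b) < epsilon
```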
- quark.onnx.quant_utils.check_reduce_mean_condition(model: onnx.ModelProto, node: onnx.NodeProto) bool #
Check conditions for Reduce Mean operation in ONNX graph nodes.
- Parameters:
model – ONNX model
node – ONNX node
- Returns:
True if conditions for Reduce Mean are satisfied, False otherwise
- quark.onnx.quant_utils.check_hard_sigmoid_condition(node: onnx.NodeProto) bool #
- Parameters:
node – node object
- Returns:
hard sigmoid or not
- quark.onnx.quant_utils.is_leaky_relu_with_alpha(node: onnx.NodeProto, alpha_value: float = 0.1) bool #
- Parameters:
node – node object
alpha_value – DPU supported alpha value
- Returns:
whether the Leaky ReLU node has an alpha approximately equal to the supported value
- quark.onnx.quant_utils.is_clip_with_min_max(model: onnx.ModelProto, node: onnx.NodeProto, min_value: float = 0.0, max_value: float = 6.0) bool #
- Parameters:
model – model object
node – node object
min_value – supported minimum value of Clip
max_value – supported maximum value of Clip
- Returns:
whether the Clip node has the supported min and max values
- quark.onnx.quant_utils.is_node_needs_annotated(model: onnx.ModelProto, node: onnx.NodeProto) bool #
- Parameters:
model – model object
node – node object
- Returns:
whether the node needs to be annotated
- quark.onnx.quant_utils.get_annotate_tensors(model: onnx.ModelProto) List[str] #
Find patterns in the model where QDQ pairs need to be removed, and return the corresponding tensor names. annotate_tensors refers to the tensors associated with the inputs of the QDQ pairs that need to be removed :param model: model object :return: the annotate tensors
- quark.onnx.quant_utils.get_qdq_to_remove(model: onnx.ModelProto, relu_input: List[str]) Tuple[List[onnx.NodeProto], List[onnx.NodeProto], Dict[str, str]] #
Return the names of nodes to be removed and a dictionary for converting input tensors :param model: model object :param relu_input: the ReLU node inputs list :return: de-quantize & quantize nodes to remove and node mapping dict
- quark.onnx.quant_utils.customqdq_to_contribqdq(model_path: str, use_external_data_format: bool) None #
Convert the custom QDQs to the contrib QDQs in the model :param model_path: the model path :param use_external_data_format: whether the model uses the external data format :return: None
- quark.onnx.quant_utils.remove_nodes(model: onnx.ModelProto, nodes_list: List[Any]) onnx.ModelProto #
Delete nodes according to the nodes in the list :param model: model object :param nodes_list: nodes list to remove :return: the model that has removed some nodes
- quark.onnx.quant_utils.remove_initializers(model: onnx.onnx_ml_pb2.ModelProto, init_list: List[str]) onnx.onnx_ml_pb2.ModelProto #
Delete initializers according to the initializer in the list :param model: model object :param init_list: initializer’s name list to remove :return: the model that has removed some initializers
- quark.onnx.quant_utils.modified_annotate_input(model: onnx.onnx_ml_pb2.ModelProto, input_node_mapping: Dict[str, str]) onnx.onnx_ml_pb2.ModelProto #
Modify the input of ReLU to the output of annotate op, and delete QDQ :param model: model object :param input_node_mapping: input node mapping dict :return: the modified model
- quark.onnx.quant_utils.scale2pos(scale: float) int #
Obtain the fixed-point position corresponding to the scale. To avoid generating infinity during computations, the range of scale is limited. :param scale: the scale :return: the fixed-point position
- quark.onnx.quant_utils.pos2scale(pos: int) float #
Obtain the scale corresponding to the fixed-point position. :param pos: the fixed-point position :return: the scale
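For power-of-two scales the two conversions are inverses of each other: pos = round(-log2(scale)) and scale = 2**-pos. An illustrative sketch, omitting the range clamping that the real scale2pos applies to avoid generating infinity:

```python
import math

def scale2pos_sketch(scale: float) -> int:
    # Fixed-point position for a (near) power-of-two scale.
    return int(round(-math.log2(scale)))

def pos2scale_sketch(pos: int) -> float:
    # Scale corresponding to a fixed-point position.
    return float(2 ** -pos)
```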
- quark.onnx.quant_utils.compute_scale_zp(rmin: numpy.ndarray[Any, Any], rmax: numpy.ndarray[Any, Any], qmin: numpy.ndarray[Any, Any], qmax: numpy.ndarray[Any, Any], element_type: int, method: PowerOfTwoMethod, symmetric: bool = False, use_pof2s: bool = True) Any #
Calculate the scale s and zero point z for the quantization relation r = s(q-z), where r are the original values and q are the corresponding quantized values.
s and z are calculated such that every value within [rmin,rmax] has an approximate representation within [qmin,qmax]. In addition, qmin <= z <= qmax is enforced. If the symmetric flag is set to True, the interval [rmin,rmax] is symmetrized to [-absmax, +absmax], where absmax = max(abs(rmin), abs(rmax)).
- Parameters:
rmin – minimum value of r
rmax – maximum value of r
qmin – minimum value representable by the target quantization data type
qmax – maximum value representable by the target quantization data type
- Returns:
zero point and scale [z, s] of pof2s
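The core of the calibration arithmetic can be sketched with pure-Python scalars (for clarity; the real function operates on numpy arrays and can additionally round the scale to a power of two, which this sketch omits):

```python
def compute_scale_zp_sketch(rmin, rmax, qmin, qmax, symmetric=False):
    """Illustrative scale/zero-point for the relation r = s * (q - z)."""
    if symmetric:
        absmax = max(abs(rmin), abs(rmax))
        rmin, rmax = -absmax, absmax
    # Ensure the real range contains zero so that z lands inside [qmin, qmax].
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        return 0, 1.0  # degenerate range: identity scale
    zero_point = round(qmin - rmin / scale)
    return zero_point, scale
```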
- quark.onnx.quant_utils.compute_scale_zp_fp(rmin: numpy.ndarray[Any, Any], rmax: numpy.ndarray[Any, Any], element_type: int, symmetric: bool = True) List[Any] #
Calculate the scale and zero point for a float type.
- Parameters:
rmin – minimum value of r
rmax – maximum value of r
element_type – the element data type of the tensor to quantize
- Returns:
zero point and scale [z, s]
- quark.onnx.quant_utils.dequantize_data(data: numpy.ndarray[Any, Any], scale: numpy.ndarray[Any, Any], zero_point: numpy.ndarray[Any, Any]) Any #
- Parameters:
data – the input data
scale – the scale for quantization
zero_point – the zero point for quantization
- Returns:
the de-quantized data
- quark.onnx.quant_utils.quantize_data_pof2s(data: numpy.ndarray[Any, Any], qType: int, symmetric: bool, reduce_range: bool = False, rmin_real_range: Optional[float] = None, rmin_override: Optional[numpy.ndarray[Any, Any]] = None, rmax_override: Optional[numpy.ndarray[Any, Any]] = None, method: PowerOfTwoMethod = PowerOfTwoMethod.NonOverflow, pos_range: int = 5, use_pof2s: bool = True) Any #
- Parameters:
data – data to quantize
qType – data type to quantize to. Supported types UINT8/16 and INT8/16
symmetric – whether symmetric quantization is used or not. This is applied to INT8/16.
- Returns:
minimum, maximum, zero point, scale, and quantized weights
To pack weights, we compute a linear transformation
- when data type == uint8, from [rmin, rmax] -> \([0, 2^{b}-1]\) and
- when data type == int8, from [-m, m] -> \([-(2^{b-1}-1), 2^{b-1}-1]\) where
m = max(abs(rmin), abs(rmax))
and add necessary intermediate nodes to transform the quantized weight to the full-precision weight using the equation
\(r = S(q-z)\), where
r: real original value
q: quantized value
S: scale
z: zero point
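The packing transformation and its inverse r = S(q - z) can be exercised end to end with a small sketch (scalar symmetric int8 case; the helper names are hypothetical):

```python
def quantize_sketch(r: float, scale: float, zero_point: int,
                    qmin: int, qmax: int) -> int:
    # q = clip(round(r / S) + z, qmin, qmax)
    q = round(r / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize_sketch(q: int, scale: float, zero_point: int) -> float:
    # r = S * (q - z)
    return scale * (q - zero_point)
```

Round-tripping a value through quantize/dequantize reproduces it to within one quantization step (the scale).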
- quark.onnx.quant_utils.get_exclude_nodes(model_path: str, input_nodes: Union[List[str], None], output_nodes: Union[List[str], None]) List[str] #
Return the nodes to be excluded based on the given input and output nodes. :param model_path: the model path :param input_nodes: the nodes to start quantizing :param output_nodes: the nodes to terminate quantizing :return: the nodes excluded from quantization
- quark.onnx.quant_utils.run_onnx_model(model_path: str, data_reader: Any) None #
Check if the input ONNX model can run successfully :param model_path: the model path :param data_reader: the data reader for feeding data
- quark.onnx.quant_utils.check_onnx_model(model_path: str) None #
Check if the input ONNX model can create an InferenceSession successfully :param model_path: the model path
- quark.onnx.quant_utils.check_model_quantizable(model: onnx.onnx_ml_pb2.ModelProto, op_types_to_quantize: Optional[List[str]], nodes_to_exclude: List[str]) bool #
Check if the model can be quantized.
- quark.onnx.quant_utils.dpu_leaky_relu_alpha(x: float) float #
This function implements a DPU-specific Leaky ReLU activation with alpha value correction.
- quark.onnx.quant_utils.get_clip_min_max(model: onnx.onnx_ml_pb2.ModelProto, clip_node: onnx.onnx_ml_pb2.NodeProto) Tuple[Optional[float], Optional[float], Optional[int]] #
Get clip min and max value from Clip node. :param model: onnx model instance :param clip_node: target Clip node :return: the min, max value and para type
The meaning of para type is: None - unknown, 0 - attribute, 1 - initializer, 2 - other nodes.
- quark.onnx.quant_utils.check_relu_like_node(model: onnx.onnx_ml_pb2.ModelProto, node: onnx.onnx_ml_pb2.NodeProto) bool #
Check if the node is a ReLU-like node :param model: the model instance :param node: the node to check :return: True if it is a ReLU-like node, False otherwise
- quark.onnx.quant_utils.print_quantize_info(model_input: str, model_output: str, calibration_data_reader: str, calibration_data_path: Union[str, None], quant_format: Union[Any, VitisQuantFormat], input_nodes: Union[List[str], None], output_nodes: Union[List[str], None], op_types_to_quantize: Union[List[str], None], random_data_reader_input_shape: Union[List[int], Tuple[int], List[List[int]], Dict[str, List[int]], List[Any], None], per_channel: bool, reduce_range: bool, activation_type: Union[Any, VitisQuantType], weight_type: Union[Any, VitisQuantType], nodes_to_quantize: List[str], nodes_to_exclude: List[str], optimize_model: bool, use_external_data_format: bool, calibrate_method: Union[Any, PowerOfTwoMethod, Int16Method], execution_providers: Union[List[str], None], enable_npu_cnn: bool, enable_npu_transformer: bool, specific_tensor_precision: bool, debug_mode: bool, convert_fp16_to_fp32: bool, convert_nchw_to_nhwc: bool, include_cle: bool, include_sq: bool, include_fast_ft: bool, extra_options: Dict[str, Any]) None #
Print OS/CPU, time, tool version, and quantized configuration information.
- quark.onnx.quant_utils.find_int16_scale(x: float) Tuple[float, float, float] #
Given a float value, find the closest value corresponding to M and 2**N, where the range of M and 2**N is within the representation range of int16 and uint16.
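The search can be sketched as scanning candidate exponents N and, for each, choosing the integer mantissa M (within the int16 range) so that M / 2**N best approximates x. This brute-force version is illustrative only; the real search strategy and return convention are assumptions:

```python
def find_int16_scale_sketch(x: float):
    """Best approximation of x as M / 2**N with M in the int16 range."""
    best = (x, 0.0, 1.0)  # (approx_value, M, 2**N)
    best_err = float("inf")
    for n in range(0, 17):
        m = round(x * (2 ** n))
        if not -32768 <= m <= 32767:
            continue  # mantissa outside the int16 range
        approx = m / (2 ** n)
        err = abs(approx - x)
        if err < best_err:
            best_err = err
            best = (approx, float(m), float(2 ** n))
    return best
```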