ONNX quantization utilities#
- quark.onnx.quant_utils.is_ort_version_below(target_version: str) bool [source]#
This function checks whether the current version of ONNX Runtime (ORT) is below a specified version.
- Args:
target_version (str): The version to compare against the current ORT version.
- Returns:
True if the current ORT version is less than the target version, False otherwise.
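The comparison described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names `parse_version` and `version_below` are hypothetical, and the current version is passed in explicitly instead of being read from ONNX Runtime.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.17.3' into (1, 17, 3); a non-numeric suffix such as 'rc1' is ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def version_below(current: str, target: str) -> bool:
    """Hypothetical helper: True if `current` is strictly lower than `target`."""
    a, b = parse_version(current), parse_version(target)
    # Pad the shorter tuple with zeros so that '1.17' compares equal to '1.17.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b
```

Comparing integer tuples avoids the pitfall of string comparison, where '1.9.0' would sort after '1.10.0'.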
- quark.onnx.quant_utils.get_qmin_qmax_for_qType(qType: int, reduce_range: bool = False, symmetric: bool = False) Any [source]#
Return qmin and qmax, the minimum and maximum values representable by the given qType.
- Parameters:
qType – integer or floating-point type
- Returns:
qmin, qmax
- quark.onnx.quant_utils.get_qrange_for_qType(qType: int, reduce_range: bool = False, symmetric: bool = False) Any [source]#
Helper function to get the quantization range for a type.
- Parameters:
qType – quantization type
- Returns:
quantization range
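For integer types, the values returned by the two functions above follow from the bit width. The sketch below uses a hypothetical helper, `int_qrange`, that takes a bit width and signedness instead of an ONNX type enum; it assumes the common convention that symmetric signed ranges drop the most negative value, and it does not model `reduce_range` or floating-point types.

```python
from typing import Tuple


def int_qrange(bits: int, signed: bool, symmetric: bool = False) -> Tuple[int, int]:
    """Hypothetical helper: (qmin, qmax) for a plain integer type of `bits` width."""
    if signed:
        qmax = 2 ** (bits - 1) - 1
        # Assumed convention: a symmetric range is [-qmax, qmax], otherwise
        # the full two's-complement range [-qmax - 1, qmax] is used.
        qmin = -qmax if symmetric else -qmax - 1
    else:
        qmin, qmax = 0, 2 ** bits - 1
    return qmin, qmax


qmin, qmax = int_qrange(8, signed=True)
qrange = qmax - qmin  # the quantization range is the span between the two limits
```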
- class quark.onnx.quant_utils.CachedDataReader(dr: CalibrationDataReader, data_size: int | None = None, convert_nchw_to_nhwc: bool = False, quantize_fp16: bool = False)[source]#
A CalibrationDataReader that caches input data from the user-provided data reader.
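The caching idea can be illustrated without ONNX Runtime. The sketch below assumes only that a data reader exposes `get_next()`, returning a dict of inputs or `None` when exhausted. The classes `ListReader` and `CachingReader` and the `rewind` method are hypothetical stand-ins, not the library's classes.

```python
from typing import Any, Dict, Iterator, List, Optional


class ListReader:
    """Hypothetical user data reader: yields dicts of inputs, then None."""

    def __init__(self, batches: List[Dict[str, Any]]) -> None:
        self._iter: Iterator[Dict[str, Any]] = iter(batches)

    def get_next(self) -> Optional[Dict[str, Any]]:
        return next(self._iter, None)


class CachingReader:
    """Hypothetical sketch: drain a reader once, then replay the cached items."""

    def __init__(self, reader: Any, data_size: Optional[int] = None) -> None:
        self._cache: List[Dict[str, Any]] = []
        # Read until the source is exhausted or data_size items are cached.
        while data_size is None or len(self._cache) < data_size:
            item = reader.get_next()
            if item is None:
                break
            self._cache.append(item)
        self._index = 0

    def __len__(self) -> int:
        return len(self._cache)

    def get_next(self) -> Optional[Dict[str, Any]]:
        if self._index >= len(self._cache):
            return None
        item = self._cache[self._index]
        self._index += 1
        return item

    def rewind(self) -> None:
        """Restart iteration from the first cached item."""
        self._index = 0
```

Caching lets several calibration passes reuse the same inputs even when the original reader can only be consumed once.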
- class quark.onnx.quant_utils.RandomDataReader(model_input: str | Path | ModelProto, input_shape: Dict[str, List[int]] = {}, input_data_range: Dict[str, List[int]] | None = None)[source]#
A CalibrationDataReader that uses random data for rapid quantization.
- class quark.onnx.quant_utils.PathDataReader(model_input: str | Path | ModelProto, data_path: str, input_shape: List[Any] = [])[source]#
A CalibrationDataReader loading data from specified paths for model calibration.
- quark.onnx.quant_utils.infer_shape(model: ModelProto) ModelProto [source]#
- Parameters:
model – the source model
- Returns:
the target model, which contains the inferred shapes
- quark.onnx.quant_utils.get_datatype_shape(tensor: TensorProto) Tuple[str, List[Any]] [source]#
- Parameters:
tensor – the input tensor
- Returns:
datatype and shape of the tensor
- quark.onnx.quant_utils.dump_model(model_input: str | Path | ModelProto, dump_data_reader: object | None = None, random_data_reader_input_shape: Dict[str, List[int]] = {}, dump_float: bool = False, output_dir: str = './dump_results') None [source]#
This function dumps the simulation results of the quantized model, including weights and activation results.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – path or ModelProto of the input model
dump_data_reader (Optional[object]) – data reader for dumping. Defaults to None.
random_data_reader_input_shape (Dict[str, List[int]]) – if the internal random data reader is used, this configures the shape of the input nodes. Defaults to {}.
dump_float (bool) – whether to dump the results of the float model. Defaults to False.
output_dir (str) – output directory for the results. Defaults to './dump_results'.
- quark.onnx.quant_utils.is_approximately_equal(a: float, b: float, epsilon: float = 1e-06) bool [source]#
- Parameters:
a – scalar input
b – scalar input
epsilon – difference tolerance
- Returns:
whether a and b are equal within the tolerance epsilon
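A tolerance comparison of this kind is typically an absolute-difference test. The sketch below uses a hypothetical helper name, `approx_equal`, and assumes a strict `<` comparison; the library may treat the boundary case differently.

```python
def approx_equal(a: float, b: float, epsilon: float = 1e-6) -> bool:
    """Hypothetical helper: True if a and b differ by less than epsilon."""
    # Assumption: strict inequality; a difference of exactly epsilon is not equal.
    return abs(a - b) < epsilon
```

Exact `==` on floats is fragile because values such as 0.1 + 0.2 carry rounding error, which is why node attributes like alpha are better compared with a tolerance.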
- quark.onnx.quant_utils.check_reduce_mean_condition(model: ModelProto, node: NodeProto) bool [source]#
Check conditions for Reduce Mean operation in ONNX graph nodes.
- Parameters:
model – ONNX model
node – ONNX node
- Returns:
True if conditions for Reduce Mean are satisfied, False otherwise
- quark.onnx.quant_utils.check_hard_sigmoid_condition(node: NodeProto) bool [source]#
- Parameters:
node – node object
- Returns:
whether the node satisfies the hard sigmoid condition
- quark.onnx.quant_utils.is_leaky_relu_with_alpha(node: NodeProto, alpha_value: float = 0.1) bool [source]#
- Parameters:
node – node object
alpha_value – DPU supported alpha value
- Returns:
whether the alpha of the Leaky ReLU node is approximately equal to alpha_value
- quark.onnx.quant_utils.is_clip_with_min_max(model: ModelProto, node: NodeProto, min_value: float = 0.0, max_value: float = 6.0) bool [source]#
- Parameters:
model – model object
node – node object
min_value – supported minimum value of Clip
max_value – supported maximum value of Clip
- Returns:
whether the Clip node has the supported min and max values
- quark.onnx.quant_utils.is_node_needs_annotated(model: ModelProto, node: NodeProto) bool [source]#
- Parameters:
model – model object
node – node object
- Returns:
whether the node needs to be annotated
- quark.onnx.quant_utils.get_annotate_tensors(model: ModelProto) List[str] [source]#
Find patterns in the model where QDQ needs to be removed, and return the corresponding tensor names. Here, annotate_tensors refers to the tensors associated with the input of the QDQ that needs to be removed.
- Parameters:
model – model object
- Returns:
the annotate tensors
- quark.onnx.quant_utils.get_qdq_to_remove(model: ModelProto, annotate_tensors: List[str]) Tuple[List[NodeProto], List[NodeProto], Dict[str, str]] [source]#
Return the nodes to be removed and a dictionary for converting input tensors.
- Parameters:
model – model object
annotate_tensors – the annotate tensors
- Returns:
the de-quantize and quantize nodes to remove, and the node mapping dict
- quark.onnx.quant_utils.customqdq_to_contribqdq(model_input: str | Path | ModelProto, use_external_data_format: bool) Any [source]#
Convert the custom QDQs to the contrib QDQs in the model.
- Parameters:
model_input – the model path or model proto
- Returns:
None or model proto
- quark.onnx.quant_utils.remove_nodes(model: ModelProto, nodes_list: List[Any]) ModelProto [source]#
Delete the nodes that appear in the given list.
- Parameters:
model – model object
nodes_list – list of nodes to remove
- Returns:
the model with the listed nodes removed
- quark.onnx.quant_utils.remove_initializers(model: ModelProto, init_list: List[str]) ModelProto [source]#
Delete the initializers whose names appear in the given list.
- Parameters:
model – model object
init_list – list of initializer names to remove
- Returns:
the model with the listed initializers removed
- quark.onnx.quant_utils.modified_annotate_input(model: ModelProto, input_node_mapping: Dict[str, str]) ModelProto [source]#
Modify the input of ReLU to the output of the annotate op, and delete the QDQ.
- Parameters:
model – model object
input_node_mapping – input node mapping dict
- Returns:
the modified model
- quark.onnx.quant_utils.scale2pos(scale: float) int [source]#
Obtain the fixed-point position corresponding to the scale. To avoid generating infinity during computations, the range of the scale is limited.
- Parameters:
scale – the scale
- Returns:
the fixed-point position
- quark.onnx.quant_utils.pos2scale(pos: int) float [source]#
Obtain the scale corresponding to the fixed-point position.
- Parameters:
pos – the fixed-point position
- Returns:
the scale
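The relationship between a power-of-two scale and its fixed-point position can be sketched as follows. This assumes the convention scale = 2^(-pos); the helper names `pos_to_scale` and `scale_to_pos` and the clamp bounds are hypothetical, chosen only to show how limiting the scale keeps the logarithm finite.

```python
import math


def pos_to_scale(pos: int) -> float:
    """Assumed convention: a fixed-point position p corresponds to scale 2**-p."""
    return 2.0 ** (-pos)


def scale_to_pos(scale: float,
                 min_scale: float = 2.0 ** -126,
                 max_scale: float = 2.0 ** 127) -> int:
    """Nearest fixed-point position for a scale.

    The clamp bounds are illustrative assumptions, not the library's values.
    Clamping keeps log2 finite for scale == 0 or scale == inf.
    """
    clamped = min(max(scale, min_scale), max_scale)
    return int(round(-math.log2(clamped)))
```

Under this convention, a scale of 0.125 corresponds to position 3, and a scale that is not an exact power of two is snapped to the nearest position in the log domain.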
- quark.onnx.quant_utils.compute_scale_zp(rmin: ndarray[Any, Any], rmax: ndarray[Any, Any], qmin: ndarray[Any, Any], qmax: ndarray[Any, Any], element_type: int, method: PowerOfTwoMethod, symmetric: bool = False, use_pof2s: bool = True) Any [source]#
Calculate the scale s and zero point z for the quantization relation r = s(q-z), where r are the original values and q are the corresponding quantized values.
s and z are calculated such that every value within [rmin,rmax] has an approximate representation within [qmin,qmax]. In addition, qmin <= z <= qmax is enforced. If the symmetric flag is set to True, the interval [rmin,rmax] is symmetrized to [-absmax, +absmax], where absmax = max(abs(rmin), abs(rmax)).
- Parameters:
rmin – minimum value of r
rmax – maximum value of r
qmin – minimum value representable by the target quantization data type
qmax – maximum value representable by the target quantization data type
- Returns:
zero point and scale [z, s]
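The calculation described above can be sketched for scalar inputs. This is a simplified illustration under stated assumptions, not the library's implementation: the helper name `scale_and_zero_point` is hypothetical, the range is extended to include 0.0 (a common practice, assumed here), and the power-of-two options (`method`, `use_pof2s`) are not modeled.

```python
from typing import Tuple


def scale_and_zero_point(rmin: float, rmax: float, qmin: int, qmax: int,
                         symmetric: bool = False) -> Tuple[int, float]:
    """Hypothetical helper: return (z, s) for the relation r = s * (q - z)."""
    # Assumption: extend the range so that 0.0 is exactly representable.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    if symmetric:
        absmax = max(abs(rmin), abs(rmax))
        rmin, rmax = -absmax, absmax
    # A degenerate range (all zeros) falls back to a scale of 1.0.
    scale = (rmax - rmin) / (qmax - qmin) if rmax > rmin else 1.0
    # Solve rmin = s * (qmin - z) for z, then enforce qmin <= z <= qmax.
    zero_point = round(qmin - rmin / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return zero_point, scale
```

With a uint8 target and the range [-64, 191], the scale is 1.0 and the zero point is 64, so the real value 0.0 maps to the quantized value 64.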
- quark.onnx.quant_utils.compute_scale_zp_fp(rmin: ndarray[Any, Any], rmax: ndarray[Any, Any], qmin: ndarray[Any, Any], qmax: ndarray[Any, Any], element_type: int, method: CalibrationMethod, symmetric: bool = True, use_scaling: bool = False) List[Any] [source]#
Calculate the scale and zero point for a float type.
- Parameters:
rmin – minimum value of r
rmax – maximum value of r
element_type – the element data type of the tensor to quantize
- Returns:
zero point and scale [z, s]
- quark.onnx.quant_utils.dequantize_data(data: ndarray[Any, Any], scale: ndarray[Any, Any], zero_point: ndarray[Any, Any]) Any [source]#
- Parameters:
data – the input data
scale – the scale for quantization
zero_point – the zero point for quantization
- Returns:
the de-quantized data
- quark.onnx.quant_utils.quantize_data_pof2s(data: ndarray[Any, Any], qType: int, symmetric: bool, reduce_range: bool = False, rmin_real_range: float | None = None, rmin_override: ndarray[Any, Any] | None = None, rmax_override: ndarray[Any, Any] | None = None, method: PowerOfTwoMethod = PowerOfTwoMethod.NonOverflow, pos_range: int = 5, use_pof2s: bool = True, use_scaling: bool = False) Any [source]#
- Parameters:
data – data to quantize
qType – data type to quantize to. Supported types are UINT8/16 and INT8/16.
symmetric – whether symmetric quantization is used or not. This is applied to INT8/16.
- Returns:
minimum, maximum, zero point, scale, and quantized weights
To pack weights, we compute a linear transformation:
- when data type == uint8, from [rmin, rmax] -> \([0, 2^{b}-1]\), and
- when data type == int8, from [-m, m] -> \([-(2^{b-1}-1), 2^{b-1}-1]\), where m = max(abs(rmin), abs(rmax)),
and add the necessary intermediate nodes to transform the quantized weight to the full weight using the equation \(r = S(q-z)\), where
r: real original value
q: quantized value
S: scale
z: zero point
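The int8 case of the transformation above can be worked through on a small list of weights. This is a sketch under stated assumptions: `quantize` and `dequantize` are hypothetical helpers operating on Python lists, the zero point is 0 because the range is symmetric, and the power-of-two scale rounding that the function also supports is not modeled.

```python
from typing import List


def quantize(values: List[float], scale: float, zero_point: int,
             qmin: int, qmax: int) -> List[int]:
    """q = clamp(round(r / S) + z); Python's round() rounds halves to even."""
    return [max(qmin, min(qmax, round(r / scale) + zero_point)) for r in values]


def dequantize(qvalues: List[int], scale: float, zero_point: int) -> List[float]:
    """r = S * (q - z)"""
    return [scale * (q - zero_point) for q in qvalues]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
b = 8                                             # bit width
m = max(abs(min(weights)), abs(max(weights)))     # m = max(|rmin|, |rmax|)
qmax = 2 ** (b - 1) - 1                           # 127 for int8
scale = m / qmax                                  # maps [-m, m] onto [-127, 127]
q = quantize(weights, scale, 0, -qmax, qmax)
restored = dequantize(q, scale, 0)
```

Each restored value differs from the original by at most half a quantization step (scale / 2), which is the rounding error of the round-to-nearest operation.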
- quark.onnx.quant_utils.get_exclude_nodes(input_model: str | Path | ModelProto, input_nodes: List[str] | None, output_nodes: List[str] | None) List[str] [source]#
Return the nodes to be excluded based on the given input and output nodes.
- Parameters:
input_model – the model path or ModelProto
input_nodes – the nodes to start quantizing
output_nodes – the nodes to terminate quantizing
- Returns:
the nodes excluded from quantization
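One way to read the description above is as a graph traversal: everything upstream of the start nodes and everything downstream of the terminating nodes is left out of quantization. The toy sketch below works on a plain successor dictionary instead of an ONNX model. The helpers `reachable` and `exclude_nodes` are hypothetical, and treating the start and terminating nodes themselves as quantized is an assumption.

```python
from typing import Dict, Iterable, List, Optional, Set


def reachable(start: Iterable[str], edges: Dict[str, List[str]]) -> Set[str]:
    """All nodes reachable from `start` by following `edges` (start excluded)."""
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def exclude_nodes(successors: Dict[str, List[str]],
                  input_nodes: Optional[List[str]],
                  output_nodes: Optional[List[str]]) -> List[str]:
    """Hypothetical sketch: nodes before input_nodes or after output_nodes."""
    # Invert the successor map to walk the graph backwards.
    predecessors: Dict[str, List[str]] = {}
    for src, dsts in successors.items():
        for dst in dsts:
            predecessors.setdefault(dst, []).append(src)
    excluded: Set[str] = set()
    if input_nodes:
        excluded |= reachable(input_nodes, predecessors)
    if output_nodes:
        excluded |= reachable(output_nodes, successors)
    return sorted(excluded)


graph = {"pre": ["conv1"], "conv1": ["relu"], "relu": ["conv2"],
         "conv2": ["post"], "post": []}
```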
- quark.onnx.quant_utils.run_onnx_model(model_input: str | Path | ModelProto, data_reader: Any) None [source]#
Check if the input ONNX model can run successfully.
- Parameters:
model_input – the model path or a ModelProto
data_reader – the data reader for feeding data
- quark.onnx.quant_utils.check_onnx_model(model_input: str | Path | ModelProto) None [source]#
Check if an InferenceSession can be created successfully for the input ONNX model.
- Parameters:
model_input – the model path or a ModelProto
- quark.onnx.quant_utils.check_model_quantizable(model: ModelProto, op_types_to_quantize: List[str] | None, nodes_to_exclude: List[str]) bool [source]#
Check if the model can be quantized.
- quark.onnx.quant_utils.dpu_leaky_relu_alpha(x: float) float [source]#
This function implements a DPU-specific Leaky ReLU activation with alpha value correction.
- quark.onnx.quant_utils.get_clip_min_max(model: ModelProto, clip_node: NodeProto) Tuple[float | None, float | None, int | None] [source]#
Get clip min and max value from Clip node.
- Parameters:
model – onnx model instance
clip_node – target Clip node
- Returns:
the min value, the max value, and the para type. The meaning of the para type is:
None: unknown.
0: attribute.
1: initializer.
2: other nodes.
- quark.onnx.quant_utils.check_relu_like_node(model: ModelProto, node: NodeProto) bool [source]#
Check if the node is a ReLU-like node.
- Parameters:
model – the model instance
node – the node to check
- Returns:
True if it is
- quark.onnx.quant_utils.print_quantize_info(model_input: str | Path | ModelProto, model_output: str | Path | None, calibration_data_reader: CalibrationDataReader | None, calibration_data_path: str | None, quant_format: Any | ExtendedQuantFormat, input_nodes: List[str] | None, output_nodes: List[str] | None, op_types_to_quantize: List[str] | None, extra_op_types_to_quantize: List[str] | None, per_channel: bool, reduce_range: bool, activation_type: Any | ExtendedQuantType, weight_type: Any | ExtendedQuantType, nodes_to_quantize: List[str], nodes_to_exclude: List[str], subgraphs_to_exclude: List[Tuple[List[str]]], optimize_model: bool, use_external_data_format: bool, calibrate_method: Any | PowerOfTwoMethod | Int16Method, execution_providers: List[str] | None, enable_npu_cnn: bool, enable_npu_transformer: bool, specific_tensor_precision: bool, debug_mode: bool, convert_fp16_to_fp32: bool, convert_nchw_to_nhwc: bool, include_cle: bool, include_sq: bool, include_rotation: bool, include_fast_ft: bool, extra_options: Dict[str, Any]) None [source]#
Print the os_cpu, time, tool_version, and quantized_configuration information.
- quark.onnx.quant_utils.print_quantize_dynamic_info(model_input: str | Path | ModelProto, model_output: str | Path | None, op_types_to_quantize: List[str] | None, per_channel: bool, reduce_range: bool, weight_type: Any | ExtendedQuantType, nodes_to_quantize: List[str], nodes_to_exclude: List[str], subgraphs_to_exclude: List[Tuple[List[str]]], use_external_data_format: bool, debug_mode: bool, extra_options: Dict[str, Any]) None [source]#
Print the os_cpu, time, tool_version, and quantized_configuration information.
- quark.onnx.quant_utils.find_int16_scale(x: float) Tuple[float, float, float] [source]#
Given a float value, find the closest value corresponding to M and 2**N, where the range of M and 2**N is within the representation range of int16 and uint16.
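A brute-force version of this search is sketched below. It assumes the approximation takes the form x ≈ M / 2**N with |M| within the int16 range and 2**N within the uint16 range, and that the three returned values are M, 2**N, and the absolute error. The description above does not confirm either point, so the helper `closest_fraction` is purely illustrative.

```python
from typing import Tuple


def closest_fraction(x: float, max_m: int = 32767,
                     max_n: int = 15) -> Tuple[float, float, float]:
    """Hypothetical sketch: best (M, 2**N, error) with x approximated by M / 2**N."""
    best_m, best_pow, best_err = 0, 1, abs(x)
    for n in range(max_n + 1):
        pow2 = 2 ** n
        m = round(x * pow2)
        if abs(m) > max_m:
            break  # larger N would only push M further out of range
        err = abs(x - m / pow2)
        if err < best_err:  # strict '<' keeps the smallest N among ties
            best_m, best_pow, best_err = m, pow2, err
    return float(best_m), float(best_pow), best_err
```

Larger values of N give finer resolution, so the search trades the magnitude of M against the size of the divisor and stops once M would overflow.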
- quark.onnx.quant_utils.encrypt_data(unencrypted_data: bytes, iv: bytes, key: bytes) Any [source]#
Encrypt data using the AES-256 algorithm.
- Parameters:
unencrypted_data – the original data to be encrypted
iv – initialization vector, 16 bytes
key – the key, 32 bytes (256 bits)
- Returns:
the encrypted data
- quark.onnx.quant_utils.decrypt_data(encrypted_data: bytes, iv: bytes, key: bytes) Any [source]#
Decrypt data using the AES-256 algorithm.
- Parameters:
encrypted_data – the data to be decrypted
iv – initialization vector, 16 bytes
key – the key, 32 bytes (256 bits)
- Returns:
the decrypted data
- quark.onnx.quant_utils.onnx_save_model_with_encryption(model: ModelProto, path: str | Path, secret_key: bytes) None [source]#
Encrypt the model before saving it to disk. Only models smaller than 2GB are supported.
- Parameters:
model – the ONNX ModelProto to be encrypted
path – the path to save to
secret_key – 48-byte secret key: 16 bytes for the iv and 32 bytes for the key
- quark.onnx.quant_utils.onnx_load_model_with_decryption(path: str | Path, secret_key: bytes) ModelProto [source]#
Decrypt the model before loading it into memory. Only models smaller than 2GB are supported.
- Parameters:
path – the model path
secret_key – 48-byte secret key: 16 bytes for the iv and 32 bytes for the key
- Returns:
the loaded and decrypted model
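The 48-byte secret key used by the two functions above combines an initialization vector and an AES-256 key. The sketch below shows one way to generate and split such a key with the standard library only. The helper `split_secret_key` is hypothetical, and the assumption that the iv occupies the first 16 bytes is not confirmed by the description.

```python
import os
from typing import Tuple

IV_SIZE = 16   # bytes, initialization vector
KEY_SIZE = 32  # bytes, AES-256 key


def split_secret_key(secret_key: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: split a 48-byte secret into (iv, key).

    Assumption: the iv comes first, followed by the key.
    """
    if len(secret_key) != IV_SIZE + KEY_SIZE:
        raise ValueError(f"secret_key must be {IV_SIZE + KEY_SIZE} bytes, "
                         f"got {len(secret_key)}")
    return secret_key[:IV_SIZE], secret_key[IV_SIZE:]


secret_key = os.urandom(IV_SIZE + KEY_SIZE)  # cryptographically strong random bytes
iv, key = split_secret_key(secret_key)
```

The same 48 bytes must be supplied when loading the model, so the secret should be stored separately from the encrypted file.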
- quark.onnx.quant_utils.cache_onnx_model_and_infer_shapes(input_model: str | Path | ModelProto, path: str | Path, save_as_external_data: bool = False, secret_key: bytes | None = None) ModelProto [source]#
Save the model and then load it with shape inference, using encryption if a secret key is provided.
- Parameters:
input_model – the ONNX model path or ModelProto to be saved
path – the path to save to
save_as_external_data – save external data, for models larger than 2GB
secret_key – 48-byte secret key: 16 bytes for the iv and 32 bytes for the key
- Returns:
the model proto
- quark.onnx.quant_utils.save_onnx_model_with_external_data(model: ModelProto, path: str | Path, save_as_external_data: bool = False) None [source]#
Save the model with external data; the .data file has the same name as the .onnx file.
- Parameters:
model – the ONNX ModelProto to be saved
path – the path to save to
save_as_external_data – this option is for ModelProto objects larger than 2GB
- quark.onnx.quant_utils.create_infer_session_for_onnx_model(model_input: str | Path | ModelProto, sess_options: SessionOptions | None = None, providers: List[str] | None = ['CPUExecutionProvider'], provider_options: List[Dict[str, str]] | None = None, use_external_data_format: bool = False) InferenceSession [source]#
Create an inference session for an ONNX model.
- Parameters:
model_input – the ONNX model, which can be a path or a ModelProto
sess_options – session options