quark.onnx.optimize

Module Contents

Classes

Functions

class quark.onnx.optimize.Optimize(model: onnx.ModelProto, op_types_to_quantize: List[str], nodes_to_quantize: List[str] | None, nodes_to_exclude: List[str] | None)

A class that applies optimizations to an ONNX model before quantization.

Args:

  • model (onnx.ModelProto) – The ONNX model to be optimized.

  • op_types_to_quantize (List[str]) – A list of operation types to be quantized.

  • nodes_to_quantize (List[str] | None) – A list of node names to be quantized.

  • nodes_to_exclude (List[str] | None) – A list of node names to be excluded from quantization.
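A minimal sketch of driving the class directly, based only on the constructor and the zero-argument methods listed in this reference. The helper names `apply_passes` and `build_optimizer` are hypothetical, the op types `"Conv"` and `"MatMul"` are illustrative, and the order in which Quark itself runs the passes is not stated here.

```python
from typing import Iterable, List

# Method names as listed in this reference for quark.onnx.optimize.Optimize.
DOCUMENTED_PASSES = (
    "convert_bn_to_conv",
    "convert_reduce_mean_to_global_avg_pool",
    "split_large_kernel_pool",
    "convert_split_to_slice",
    "fuse_instance_norm",
    "fuse_l2_norm",
    "fuse_layer_norm",
    "fold_batch_norm",
    "convert_clip_to_relu",
    "fold_batch_norm_after_concat",
)


def apply_passes(optimizer: object, passes: Iterable[str]) -> List[str]:
    """Call the named zero-argument pass methods on `optimizer`, in order.

    Hypothetical helper, not part of Quark. Returns the names it applied.
    """
    applied = []
    for name in passes:
        if name not in DOCUMENTED_PASSES:
            raise ValueError(f"unknown pass: {name}")
        getattr(optimizer, name)()  # each documented pass returns None
        applied.append(name)
    return applied


def build_optimizer(model_path: str):
    """Construct Optimize for a model on disk (requires onnx and quark)."""
    import onnx
    from quark.onnx.optimize import Optimize

    model = onnx.load(model_path)
    # Illustrative op types; no explicit include or exclude lists.
    return Optimize(model, ["Conv", "MatMul"], None, None)
```

For example, `apply_passes(build_optimizer("model.onnx"), ["fold_batch_norm"])` would run a single pass, where `model.onnx` is a placeholder path.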

convert_bn_to_conv() -> None

Convert BatchNormalization to Conv.

convert_reduce_mean_to_global_avg_pool() -> None

Convert ReduceMean to GlobalAveragePool.
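The rewrite relies on a numerical equivalence: averaging an N x C x H x W tensor over its spatial axes while keeping those dimensions gives the same values as GlobalAveragePool. The sketch below checks this on plain nested lists. It is an illustration only, and the conditions under which Quark applies the conversion are not described in this reference.

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]  # N x C x H x W


def reduce_mean_hw(x: Tensor4D) -> Tensor4D:
    """ReduceMean over axes (2, 3) with keepdims=1, written out explicitly."""
    out = []
    for sample in x:
        channels = []
        for plane in sample:
            row_means = [sum(row) / len(row) for row in plane]    # mean over W
            channels.append([[sum(row_means) / len(row_means)]])  # mean over H
        out.append(channels)
    return out


def global_average_pool(x: Tensor4D) -> Tensor4D:
    """GlobalAveragePool: one average per (N, C) plane, shape N x C x 1 x 1."""
    out = []
    for sample in x:
        channels = []
        for plane in sample:
            values = [v for row in plane for v in row]
            channels.append([[sum(values) / len(values)]])
        out.append(channels)
    return out


# A 1 x 2 x 2 x 2 input: both forms produce the same 1 x 2 x 1 x 1 output.
x = [[[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [8.0, 8.0]]]]
reduced = reduce_mean_hw(x)
pooled = global_average_pool(x)
```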

split_large_kernel_pool() -> None

Split each pooling operation in the ONNX model whose kernel size is excessively large into multiple smaller pooling operations.
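As a sketch of why such a split can preserve the result, the example below shows that a non-overlapping 1-D average pool with kernel 6 matches a kernel-2 pool followed by a kernel-3 pool. The kernel-size threshold and the actual splitting strategy used by Quark are not given in this reference, so the factorization here is an assumption chosen for illustration.

```python
from typing import List


def avg_pool_1d(x: List[float], kernel: int) -> List[float]:
    """Non-overlapping 1-D average pooling (stride equals kernel size)."""
    if len(x) % kernel != 0:
        raise ValueError("input length must be a multiple of the kernel size")
    return [sum(x[i:i + kernel]) / kernel for i in range(0, len(x), kernel)]


x = [float(v) for v in range(12)]

# One large pool versus two smaller pools whose kernel sizes multiply to 6.
direct = avg_pool_1d(x, 6)
split = avg_pool_1d(avg_pool_1d(x, 2), 3)
```

The equality holds because every window of the second pool averages equally sized windows of the first.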

convert_split_to_slice() -> None

Convert Split to Slice.

fuse_instance_norm() -> None

Fuse a split (decomposed) instance norm pattern into a single InstanceNorm operation.

fuse_l2_norm() -> None

Convert L2norm ops to LpNormalization.
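To illustrate what this fusion replaces, the sketch below compares a decomposed L2 normalization with the single-operator form (ONNX LpNormalization with p=2). The specific op chain shown in the first function is an assumption. The pattern Quark actually matches is not listed in this reference.

```python
import math
from typing import List


def l2_norm_decomposed(x: List[float]) -> List[float]:
    """Multi-op form (assumed pattern): square, sum, sqrt, divide."""
    squared = [v * v for v in x]
    norm = math.sqrt(sum(squared))
    return [v / norm for v in x]


def lp_normalization(x: List[float], p: int = 2) -> List[float]:
    """Single-op form: divide each element by the p-norm (p is 1 or 2)."""
    if p == 1:
        norm = sum(abs(v) for v in x)
    elif p == 2:
        norm = math.sqrt(sum(v * v for v in x))
    else:
        raise ValueError("p must be 1 or 2")
    return [v / norm for v in x]


decomposed = l2_norm_decomposed([3.0, 4.0])
fused = lp_normalization([3.0, 4.0], p=2)
```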

fuse_layer_norm() -> None

Convert LayerNorm ops to a single LayerNormalization op.

fold_batch_norm() -> None

Fold BatchNormalization into target operations.
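The arithmetic behind batch norm folding can be sketched as follows, using the standard formulas with one scalar weight per output channel for brevity. This is not Quark's implementation, and the function names are hypothetical. A real convolution weight holds a whole filter per output channel, scaled by the same per-channel factor.

```python
import math
from typing import List, Tuple


def fold_bn_into_conv(
    weights: List[float], biases: List[float],
    gamma: List[float], beta: List[float],
    mean: List[float], var: List[float], eps: float = 1e-5,
) -> Tuple[List[float], List[float]]:
    """Fold per-channel BatchNormalization parameters into weight and bias."""
    folded_w, folded_b = [], []
    for w, b, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)         # per-channel BN scale
        folded_w.append(w * scale)             # W' = W * scale
        folded_b.append((b - m) * scale + bt)  # b' = (b - mean) * scale + beta
    return folded_w, folded_b


def conv_then_bn(x, w, b, g, bt, m, v, eps=1e-5):
    """Reference path: scalar 'conv' followed by BatchNormalization."""
    y = w * x + b
    return g * (y - m) / math.sqrt(v + eps) + bt


w, b = fold_bn_into_conv([2.0], [1.0], [0.5], [0.1], [3.0], [4.0])
x = 1.5
reference = conv_then_bn(x, 2.0, 1.0, 0.5, 0.1, 3.0, 4.0)
folded = w[0] * x + b[0]  # single op with folded parameters
```

Up to floating-point rounding, `reference` and `folded` agree, which is why the BatchNormalization node can be removed after folding.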

convert_clip_to_relu() -> None

Convert Clip to Relu.
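A Clip with a lower bound of 0 and no upper bound computes the same function as Relu. The sketch below demonstrates that case. Whether Quark restricts the conversion to exactly this condition is an assumption, since this reference does not say, and `clip_is_relu` is a hypothetical helper.

```python
import math
from typing import List


def clip(x: List[float], lo: float = -math.inf, hi: float = math.inf) -> List[float]:
    """Elementwise Clip: limit every value to the range [lo, hi]."""
    return [min(max(v, lo), hi) for v in x]


def relu(x: List[float]) -> List[float]:
    """Elementwise Relu: max(v, 0)."""
    return [max(v, 0.0) for v in x]


def clip_is_relu(lo: float, hi: float) -> bool:
    """True when Clip(lo, hi) computes the same function as Relu."""
    return lo == 0.0 and hi == math.inf


values = [-2.0, -0.5, 0.0, 3.0]
clipped = clip(values, lo=0.0)
rectified = relu(values)
```

A Clip with a finite upper bound, such as `clip(values, 0.0, 6.0)`, is not equivalent to Relu for inputs above that bound.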

fold_batch_norm_after_concat() -> None

Fold BatchNormalization (after Concat) into target operations.

quark.onnx.optimize.optimize(model: onnx.ModelProto, op_types_to_quantize: List[str], nodes_to_quantize: List[str] | None, nodes_to_exclude: List[str] | None, convert_bn_to_conv: bool = True, convert_reduce_mean_to_global_avg_pool: bool = True, split_large_kernel_pool: bool = True, convert_split_to_slice: bool = True, fuse_instance_norm: bool = True, fuse_l2_norm: bool = True, fuse_layer_norm: bool = True, fold_batch_norm: bool = True, convert_clip_to_relu: bool = True, fold_batch_norm_after_concat: bool = True, dedicate_dq_node: bool = False) -> onnx.ModelProto

Optimize an ONNX model to meet specific constraints and requirements for deployment on a CPU/NPU.

This function applies various optimization techniques to the provided ONNX model based on the specified parameters. The optimizations include fusing operations, converting specific layers, and folding batch normalization layers, among others.

Parameters:
  • model (ModelProto) – The ONNX model to be optimized.

  • op_types_to_quantize (List[str]) – List of operation types to be quantized.

  • nodes_to_quantize (Optional[List[str]]) – List of node names to explicitly quantize. If None, quantization is applied based on the operation types.

  • nodes_to_exclude (Optional[List[str]]) – List of node names to exclude from quantization.

  • convert_bn_to_conv (bool) – Whether to convert BatchNorm layers to Conv layers. Defaults to True.

  • convert_reduce_mean_to_global_avg_pool (bool) – Whether to convert ReduceMean layers to GlobalAveragePool layers. Defaults to True.

  • split_large_kernel_pool (bool) – Whether to split large kernel pooling operations. Defaults to True.

  • convert_split_to_slice (bool) – Whether to convert Split layers to Slice layers. Defaults to True.

  • fuse_instance_norm (bool) – Whether to fuse InstanceNorm layers. Defaults to True.

  • fuse_l2_norm (bool) – Whether to fuse L2Norm layers. Defaults to True.

  • fuse_layer_norm (bool) – Whether to fuse LayerNorm layers. Defaults to True.

  • fold_batch_norm (bool) – Whether to fold BatchNorm layers into preceding Conv layers. Defaults to True.

  • convert_clip_to_relu (bool) – Whether to convert Clip layers to ReLU layers. Defaults to True.

  • fold_batch_norm_after_concat (bool) – Whether to fold BatchNorm layers after concatenation operations. Defaults to True.

Returns:

The optimized ONNX model.

Return type:

ModelProto

Notes:
  • The Optimize class is used to apply the optimizations based on the provided flags.

  • The function returns the optimized model with the applied transformations.
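A minimal sketch of calling the function with only some passes enabled, using the flag names from the signature above. The helpers `select_passes` and `optimize_file`, the op types `"Conv"` and `"MatMul"`, and the file paths are hypothetical. `dedicate_dq_node` is left at its default because its effect is not described here.

```python
from typing import Dict, Iterable, Optional

# Boolean pass flags from the documented signature; each defaults to True.
PASS_FLAGS = (
    "convert_bn_to_conv",
    "convert_reduce_mean_to_global_avg_pool",
    "split_large_kernel_pool",
    "convert_split_to_slice",
    "fuse_instance_norm",
    "fuse_l2_norm",
    "fuse_layer_norm",
    "fold_batch_norm",
    "convert_clip_to_relu",
    "fold_batch_norm_after_concat",
)


def select_passes(enabled: Optional[Iterable[str]] = None) -> Dict[str, bool]:
    """Build keyword flags enabling only `enabled` (every pass when None)."""
    if enabled is None:
        return {name: True for name in PASS_FLAGS}
    wanted = set(enabled)
    unknown = wanted - set(PASS_FLAGS)
    if unknown:
        raise ValueError(f"unknown pass flags: {sorted(unknown)}")
    return {name: name in wanted for name in PASS_FLAGS}


def optimize_file(src: str, dst: str, enabled: Optional[Iterable[str]] = None) -> None:
    """Load a model, run the selected passes, and save the result.

    Requires onnx and quark to be installed.
    """
    import onnx
    from quark.onnx.optimize import optimize

    model = onnx.load(src)
    optimized = optimize(
        model,
        op_types_to_quantize=["Conv", "MatMul"],  # illustrative choice
        nodes_to_quantize=None,
        nodes_to_exclude=None,
        **select_passes(enabled),
    )
    onnx.save(optimized, dst)
```

For example, `optimize_file("model.onnx", "model_opt.onnx", ["fold_batch_norm"])` would disable every pass except batch norm folding.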