Quark for ONNX - Tools

Convert a float16 model to a float32 model

Because the Quark ONNX tool currently supports quantization of float32 models only, a float16 model must be converted to float32 before it can be quantized.

Use the convert_fp16_to_fp32 tool to convert a float16 model to a float32 model:

python -m pip install onnxsim
python -m quark.onnx.tools.convert_fp16_to_fp32 --input $FLOAT_16_ONNX_MODEL_PATH --output $FLOAT_32_ONNX_MODEL_PATH

Note: convert_fp16_to_fp32 relies on onnxsim to simplify the ONNX model, so make sure onnxsim is installed (python -m pip install onnxsim).
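
To sanity-check the conversion, you can inspect the output model with the onnx Python package. The snippet below is a minimal sketch (the model path is a placeholder for the --output path above) that verifies no float16 initializers remain:

import onnx
from onnx import TensorProto

model = onnx.load("float32_model.onnx")  # placeholder for $FLOAT_32_ONNX_MODEL_PATH

# List any initializers (weights) that are still stored as float16.
fp16_tensors = [
    init.name
    for init in model.graph.initializer
    if init.data_type == TensorProto.FLOAT16
]

print("Remaining float16 initializers:", fp16_tensors or "none")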

Convert an NCHW input model to an NHWC model

Some models are designed with an NCHW input shape instead of NHWC, so it is recommended to convert an NCHW input model to NHWC before quantizing a float32 model. Note that the conversion steps are executed even if the model is already NHWC, so make sure the input model is actually in NCHW format.

Use the convert_nchw_to_nhwc tool to convert an NCHW model to an NHWC model:

python -m quark.onnx.tools.convert_nchw_to_nhwc --input $NCHW_ONNX_MODEL_PATH --output $NHWC_ONNX_MODEL_PATH
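
To confirm the conversion took effect, you can inspect the model's input shape with the onnx package. This is a minimal sketch assuming a single 4-D image input; the model path is a placeholder for the --output path above:

import onnx

model = onnx.load("nhwc_model.onnx")  # placeholder for $NHWC_ONNX_MODEL_PATH
dims = model.graph.input[0].type.tensor_type.shape.dim
# Symbolic dimensions (e.g. a dynamic batch) show up as dim_param strings.
shape = [d.dim_value if d.dim_value > 0 else d.dim_param for d in dims]
print("Input shape:", shape)  # expected layout after conversion: [N, H, W, C]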

Quantize an ONNX model using random input

For ONNX models that have no input data available for quantization, random input can be used to drive the quantization process.

Use the random_quantize tool to quantize an ONNX model with random input:

python -m quark.onnx.tools.random_quantize --input_model $FLOAT_ONNX_MODEL_PATH --quant_model $QUANTIZED_ONNX_MODEL_PATH
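
Conceptually, the tool feeds randomly generated tensors to the quantizer in place of real calibration data. The sketch below illustrates this idea using onnxruntime's CalibrationDataReader interface; it is not the tool's internal implementation, and the input-shape handling is an assumption for illustration:

import numpy as np
import onnxruntime
from onnxruntime.quantization import CalibrationDataReader

class RandomDataReader(CalibrationDataReader):
    # Yields a few batches of random data matching the model's first input.

    def __init__(self, model_path, num_batches=8):
        session = onnxruntime.InferenceSession(
            model_path, providers=["CPUExecutionProvider"]
        )
        inp = session.get_inputs()[0]
        # Replace dynamic (symbolic) dimensions with 1 for illustration.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        self._name = inp.name
        self._batches = iter(
            np.random.rand(*shape).astype(np.float32) for _ in range(num_batches)
        )

    def get_next(self):
        # Return None when exhausted, as the calibration API expects.
        batch = next(self._batches, None)
        return None if batch is None else {self._name: batch}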

Convert an A8W8 NPU model to an A8W8 CPU model

Some models are quantized with the A8W8 configuration for NPU; converting them to A8W8 CPU models is convenient and efficient.

Use the convert_a8w8_npu_to_a8w8_cpu tool to convert an A8W8 NPU model to an A8W8 CPU model:

python -m quark.onnx.tools.convert_a8w8_npu_to_a8w8_cpu --input $A8W8_NPU_ONNX_MODEL_PATH --output $A8W8_CPU_ONNX_MODEL_PATH
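
After conversion, the resulting model should load and run with onnxruntime's CPU execution provider. A minimal smoke test (the model path is a placeholder for the output path above, and the random float32 input is an assumption about the model's input type) might look like:

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession(
    "a8w8_cpu_model.onnx",  # placeholder for the converted --output model
    providers=["CPUExecutionProvider"],
)
inp = session.get_inputs()[0]
# Replace dynamic (symbolic) dimensions with 1 for illustration.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)
outputs = session.run(None, {inp.name: dummy})
print("Output shapes:", [o.shape for o in outputs])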