Tools#

Convert a Float32 Model to a BFloat16 Model#

To meet the growing demand for BFloat16 deployment, a conversion tool is provided to convert a Float32 model to a BFloat16 model. Four BFloat16 implementation formats are supported: vitisqdq, with_cast, simulate_bf16, and bf16.

  • vitisqdq: Implements BFloat16 conversion by inserting BFloat16 VitisQDQ nodes.

  • with_cast: Implements BFloat16 conversion by inserting Cast operations to convert from Float32 to BFloat16.

  • simulate_bf16: Implements BFloat16 conversion by storing all BFloat16 weights in float format.

  • bf16: Implements BFloat16 conversion by directly converting the Float32 model to BFloat16, with only the input and output remaining as float.

The default value is with_cast.

Use the convert_fp32_to_bf16 tool to convert a Float32 model to a BFloat16 model:

python -m quark.onnx.tools.convert_fp32_to_bf16 --input $FLOAT_32_ONNX_MODEL_PATH --output $BFLOAT_16_ONNX_MODEL_PATH --format $BFLOAT_FORMAT
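To confirm which format was applied, you can inspect the converted model. The following is a minimal sketch, not part of the tool, that uses the standard onnx Python package; the model path is a placeholder, and the per-format expectations (Cast nodes for with_cast, BFloat16 initializers for bf16) are assumptions based on the format descriptions above.

import onnx
from onnx import TensorProto

# Placeholder path; point this at the converted model.
model = onnx.load("model_bf16.onnx")

# Cast nodes are expected when the with_cast format was used.
cast_nodes = [n for n in model.graph.node if n.op_type == "Cast"]

# Initializers stored directly as BFloat16 are expected with the bf16 format.
bf16_inits = [t for t in model.graph.initializer
              if t.data_type == TensorProto.BFLOAT16]

print(f"Cast nodes: {len(cast_nodes)}")
print(f"BFloat16 initializers: {len(bf16_inits)}")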

Convert a Float16 Model to a BFloat16 Model#

To meet the growing demand for BFloat16 deployment, a conversion tool is provided to convert a Float16 model to a BFloat16 model. Four BFloat16 implementation formats are supported: vitisqdq, with_cast, simulate_bf16, and bf16.

  • vitisqdq: Implements BFloat16 conversion by inserting BFloat16 VitisQDQ nodes.

  • with_cast: Implements BFloat16 conversion by inserting Cast operations to convert from Float16 to BFloat16.

  • simulate_bf16: Implements BFloat16 conversion by storing all BFloat16 weights in float format.

  • bf16: Implements BFloat16 conversion by directly converting the Float16 model to BFloat16, with only the input and output remaining as Float16.

The default value is with_cast.

Use the convert_fp16_to_bf16 tool to convert a Float16 model to a BFloat16 model:

python -m quark.onnx.tools.convert_fp16_to_bf16 --input $FLOAT_16_ONNX_MODEL_PATH --output $BFLOAT_16_ONNX_MODEL_PATH --format $BFLOAT_FORMAT

Convert a Float16 Model to a Float32 Model#

Because the AMD Quark ONNX tool currently only supports quantization of Float32 models, a Float16 model must be converted to Float32 before it can be quantized.

Use the convert_fp16_to_fp32 tool to convert a Float16 model to a Float32 model:

python -m pip install onnxsim
python -m quark.onnx.tools.convert_fp16_to_fp32 --input $FLOAT_16_ONNX_MODEL_PATH --output $FLOAT_32_ONNX_MODEL_PATH

Note

When using the convert_fp16_to_fp32 tool in Quark ONNX, onnxsim is required to simplify the ONNX model. Ensure that onnxsim is installed by running python -m pip install onnxsim.
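To verify the result, you can check that the converted model no longer contains Float16 tensors. The following is a minimal sketch using the standard onnx Python package; the model path is a placeholder.

import onnx
from onnx import TensorProto

# Placeholder path for the converted Float32 model.
model = onnx.load("model_fp32.onnx")

# Any remaining Float16 initializers would indicate an incomplete conversion.
fp16_inits = [t.name for t in model.graph.initializer
              if t.data_type == TensorProto.FLOAT16]
print("Remaining Float16 initializers:", fp16_inits)

# Graph inputs should now report the Float32 (FLOAT) element type.
for inp in model.graph.input:
    elem_type = inp.type.tensor_type.elem_type
    print(inp.name, "is Float32:", elem_type == TensorProto.FLOAT)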

Convert a Float32 Model to a BFP16 Model#

To meet the growing demand for BFP16 deployment, a conversion tool is provided to convert a Float32 model directly to a BFP16 model.

Use the convert_fp32_to_bfp16 tool to convert a Float32 model to a BFP16 model:

python -m quark.onnx.tools.convert_fp32_to_bfp16 --input $FLOAT_32_ONNX_MODEL_PATH --output $BFP_16_ONNX_MODEL_PATH

Convert a Float16 Model to a BFP16 Model#

To meet the growing demand for BFP16 deployment, a conversion tool is provided to convert a Float16 model directly to a BFP16 model.

Use the convert_fp16_to_bfp16 tool to convert a Float16 model to a BFP16 model:

python -m quark.onnx.tools.convert_fp16_to_bfp16 --input $FLOAT_16_ONNX_MODEL_PATH --output $BFP_16_ONNX_MODEL_PATH

Convert an NCHW Input Model to an NHWC Model#

Some models are designed with an NCHW input shape instead of NHWC. It is recommended to convert such a model to NHWC before quantizing a Float32 model. Note that the conversion steps execute even if the model is already NHWC, so ensure the input model really is in NCHW format before running the tool.

Note

The data layout, whether NCHW or NHWC, does not influence the quantization process itself. However, deployment efficiency is affected by the kernel design, which is often optimized for NHWC. Consequently, when input data is in NCHW format, a conversion to NHWC is recommended. This conversion introduces a small computational overhead, though the overall performance benefits from the optimized layout. While a transpose operation is required for the format change, the total number of other operations remains constant.

Use the convert_nchw_to_nhwc tool to convert an NCHW model to an NHWC model:

python -m quark.onnx.tools.convert_nchw_to_nhwc --input $NCHW_ONNX_MODEL_PATH --output $NHWC_ONNX_MODEL_PATH
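Because the tool does not detect the layout itself, it helps to check the model's input shape before converting. The following is a minimal sketch using the standard onnx Python package; the model path is a placeholder, and whether the channel dimension is second (NCHW) or last (NHWC) still has to be judged from the model's design.

import onnx

# Placeholder path; point this at the model you intend to convert.
model = onnx.load("model_nchw.onnx")

# Print each graph input's shape so the layout can be checked manually.
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # e.g. [1, 3, 224, 224] suggests NCHW for an RGB image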

Quantize an ONNX Model Using Random Input#

For ONNX models that do not have input data available for quantization, use random input for the quantization process.

Use the random_quantize tool to quantize an ONNX model:

python -m quark.onnx.tools.random_quantize --input_model $FLOAT_ONNX_MODEL_PATH --quant_model $QUANTIZED_ONNX_MODEL_PATH
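As an illustrative follow-up, not part of the tool, the quantized model can be smoke-tested with randomly generated data using onnxruntime. The sketch below assumes float32 inputs and substitutes 1 for any dynamic dimension; paths are placeholders.

import numpy as np
import onnxruntime as ort

# Placeholder path for the quantized model produced by random_quantize.
session = ort.InferenceSession("model_quantized.onnx",
                               providers=["CPUExecutionProvider"])

# Build random float32 inputs, substituting 1 for any dynamic dimension.
feeds = {}
for inp in session.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    feeds[inp.name] = np.random.rand(*shape).astype(np.float32)

outputs = session.run(None, feeds)
print("Output shapes:", [o.shape for o in outputs])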

Convert an A8W8 NPU Model to an A8W8 CPU Model#

Because some models are quantized with the A8W8 NPU configuration, it is convenient and efficient to convert them to A8W8 CPU models.

Use the convert_a8w8_npu_to_a8w8_cpu tool to convert an A8W8 NPU model to an A8W8 CPU model:

python -m quark.onnx.tools.convert_a8w8_npu_to_a8w8_cpu --input [INPUT_PATH] --output [OUTPUT_PATH]

Convert a U16U8 Quantized Model to a U8U8 Model#

Convert a U16U8 quantized model (activations quantized to UINT16 and weights to UINT8) to a U8U8 model without requiring calibration.

Use the convert_u16u8_to_u8u8 tool to do the conversion:

python -m quark.onnx.tools.convert_u16u8_to_u8u8 --input [INPUT_PATH] --output [OUTPUT_PATH]
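As a rough sanity check after conversion, the data types of the quantization zero points can be inspected, because they reflect the activation and weight quantization types. The sketch below assumes the model uses standard QuantizeLinear/DequantizeLinear nodes whose zero points are stored as initializers; models built with custom quantization operators may need a different check.

import onnx
from onnx import TensorProto

# Placeholder path for the converted model.
model = onnx.load("model_u8u8.onnx")
inits = {t.name: t for t in model.graph.initializer}

zero_point_types = set()
for node in model.graph.node:
    if node.op_type in ("QuantizeLinear", "DequantizeLinear") and len(node.input) > 2:
        zp = inits.get(node.input[2])  # zero-point tensor, if stored as an initializer
        if zp is not None:
            zero_point_types.add(TensorProto.DataType.Name(zp.data_type))

print("Zero-point types found:", zero_point_types)  # only UINT8 is expected after conversion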

Evaluate Accuracy Between Baseline and Quantized Results Folders#

It is often necessary to compare the differences in output images before and after quantization. Four metrics are currently supported: cosine similarity, L2 loss, PSNR, and VMAF, as well as three file formats: JPG, PNG, and NPY.

Use the evaluate tool:

python -m quark.onnx.tools.evaluate --baseline_results_folder [BASELINE_RESULTS_FOLDER_PATH] --quantized_results_folder [QUANTIZED_RESULTS_FOLDER_PATH]
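For reference, the sketch below shows how two of the supported metrics, cosine similarity and PSNR, can be computed for a pair of NPY result files with NumPy. It is only an illustration of what the metrics measure, not the tool's implementation; file paths are placeholders, and the peak value used for PSNR is one common choice.

import numpy as np

# Placeholder paths for one baseline result and the matching quantized result.
baseline = np.load("baseline/output_0.npy").astype(np.float64).ravel()
quantized = np.load("quantized/output_0.npy").astype(np.float64).ravel()

# Cosine similarity: values close to 1.0 indicate closely aligned outputs.
cos_sim = np.dot(baseline, quantized) / (
    np.linalg.norm(baseline) * np.linalg.norm(quantized))

# PSNR in dB, taking the baseline's dynamic range as the peak value.
mse = np.mean((baseline - quantized) ** 2)
peak = baseline.max() - baseline.min()
psnr = 10.0 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")

print(f"Cosine similarity: {cos_sim:.6f}, PSNR: {psnr:.2f} dB")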

Replace inf and -inf Values in ONNX Model Weights#

The replace_inf_weights tool replaces inf or -inf values in ONNX model weights with a specified value.

Use the replace_inf_weights tool to do the replacement:

python -m quark.onnx.tools.replace_inf_weights --input_model [INPUT_MODEL_PATH] --output_model [OUTPUT_MODEL_PATH] --replace_inf_value [REPLACE_INF_VALUE]

Note

The default replacement value is 10000.0. This might lead to precision degradation. Adjust the replacement value based on your model and application needs.
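To check whether a model needs this tool at all, its initializers can be scanned for infinite values first. The following is a minimal sketch using the standard onnx and NumPy packages; the model path is a placeholder.

import numpy as np
import onnx
from onnx import numpy_helper

# Placeholder path for the model to inspect.
model = onnx.load("model.onnx")

# Report every floating-point initializer that contains inf or -inf.
for tensor in model.graph.initializer:
    array = numpy_helper.to_array(tensor)
    if np.issubdtype(array.dtype, np.floating) and np.isinf(array).any():
        print(tensor.name, "contains", int(np.isinf(array).sum()), "inf value(s)")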