Tools#
Convert a Float32 Model to a BFloat16 Model#
Because of the increasing demands for BFloat16 deployment, a conversion tool is necessary to convert a Float32 model to a BFloat16 model. Four BFloat16 implementation formats are provided: vitisqdq, with_cast, simulate_bf16, and bf16.
vitisqdq: Implements BFloat16 conversion by inserting BFloat16 VitisQDQ nodes.
with_cast: Implements BFloat16 conversion by inserting Cast operations to convert from Float32 to BFloat16.
simulate_bf16: Implements BFloat16 conversion by storing all BFloat16 weights in float format.
bf16: Implements BFloat16 conversion by directly converting the Float32 model to BFloat16, with only the input and output remaining as float.
The default value is with_cast.
Use the convert_fp32_to_bf16 tool to convert a Float32 model to a BFloat16 model:
python -m quark.onnx.tools.convert_fp32_to_bf16 --input $FLOAT_32_ONNX_MODEL_PATH --output $BFLOAT_16_ONNX_MODEL_PATH --format $BFLOAT_FORMAT
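For intuition about what BFloat16 conversion does to the numbers themselves, the following sketch truncates Float32 values to BFloat16 precision with numpy. It is an illustration only, not part of the tool, and it truncates rather than rounds.

import numpy as np

def float32_to_bfloat16(x):
    # Truncate float32 to bfloat16 by keeping the sign bit, the 8 exponent bits,
    # and the top 7 mantissa bits (simple truncation; real converters may round).
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

weights = np.array([0.1, 3.14159, 65504.0, 70000.0], dtype=np.float32)
print(float32_to_bfloat16(weights))  # shows the precision loss versus float32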
Convert a Float16 Model to a BFloat16 Model#
Because of increasing demands for BFloat16 deployment, a conversion tool is necessary to convert a Float16 model to a BFloat16 model. Four BFloat16 implementation formats are provided: vitisqdq, with_cast, simulate_bf16, and bf16.
vitisqdq: Implements BFloat16 conversion by inserting BFloat16 VitisQDQ nodes.
with_cast: Implements BFloat16 conversion by inserting Cast operations to convert from Float16 to BFloat16.
simulate_bf16: Implements BFloat16 conversion by storing all BFloat16 weights in float format.
bf16: Implements BFloat16 conversion by directly converting the Float16 model to BFloat16, with only the input and output remaining as Float16.
The default value is with_cast.
Use the convert_fp16_to_bf16 tool to convert a Float16 model to a BFloat16 model:
python -m quark.onnx.tools.convert_fp16_to_bf16 --input $FLOAT_16_ONNX_MODEL_PATH --output $BFLOAT_16_ONNX_MODEL_PATH --format $BFLOAT_FORMAT
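Assuming the default with_cast format, which (as described above) inserts Cast operations targeting BFloat16, the converted model can be sanity-checked with the onnx package. The sketch below is a minimal check; the file name is a placeholder.

import onnx
from onnx import TensorProto

# Placeholder path for the model produced by convert_fp16_to_bf16.
model = onnx.load("model_bf16.onnx")

# With the with_cast format, BFloat16 shows up as Cast nodes whose 'to'
# attribute is TensorProto.BFLOAT16; count them as a quick sanity check.
casts_to_bf16 = [
    node.name
    for node in model.graph.node
    if node.op_type == "Cast"
    and any(attr.name == "to" and attr.i == TensorProto.BFLOAT16 for attr in node.attribute)
]
print(f"Cast-to-BFloat16 nodes: {len(casts_to_bf16)}")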
Convert a Float16 Model to a Float32 Model#
Because the AMD Quark ONNX tool currently supports quantization of Float32 models only, a Float16 model must be converted to Float32 before it can be quantized.
Use the convert_fp16_to_fp32 tool to convert a Float16 model to a Float32 model:
python -m pip install onnxsim
python -m quark.onnx.tools.convert_fp16_to_fp32 --input $FLOAT_16_ONNX_MODEL_PATH --output $FLOAT_32_ONNX_MODEL_PATH
Note
When using the convert_fp16_to_fp32 tool in Quark ONNX, onnxsim is required to simplify the ONNX model. Ensure that onnxsim is installed by running python -m pip install onnxsim.
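Before running the conversion, you can confirm that a model actually contains Float16 tensors. The sketch below is a minimal check using the onnx package; the file name is a placeholder.

import onnx
from onnx import TensorProto

# Placeholder path for the model to be checked.
model = onnx.load("model_fp16.onnx")

fp16_inputs = [
    i.name for i in model.graph.input
    if i.type.tensor_type.elem_type == TensorProto.FLOAT16
]
fp16_inits = [
    init.name for init in model.graph.initializer
    if init.data_type == TensorProto.FLOAT16
]

print(f"Float16 graph inputs: {fp16_inputs}")
print(f"Float16 initializers: {len(fp16_inits)}")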
Convert a Float32 Model to a BFP16 Model#
Because of increasing demands for BFP16 deployment, a conversion tool is necessary to directly convert a Float32 model to a BFP16 model.
Use the convert_fp32_to_bfp16 tool to convert a Float32 model to a BFP16 model:
python -m quark.onnx.tools.convert_fp32_to_bfp16 --input $FLOAT_32_ONNX_MODEL_PATH --output $BFP_16_ONNX_MODEL_PATH
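BFP (block floating point) groups values into small blocks that share a single exponent while each element keeps a short mantissa. The sketch below illustrates this general idea only; the block size and mantissa width are assumptions for illustration and are not necessarily the exact BFP16 parameters used by the tool.

import numpy as np

def block_fp_quantize(x, block_size=8, mantissa_bits=8):
    # Illustrative block floating point: each block shares the exponent of its
    # largest element; per-element mantissas use 'mantissa_bits' signed bits.
    # The block size and bit widths here are assumptions, not the tool's spec.
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    out = np.empty_like(blocks)
    qmax = 2 ** (mantissa_bits - 1) - 1
    for i, block in enumerate(blocks):
        max_abs = float(np.max(np.abs(block)))
        shared_exp = np.floor(np.log2(max_abs)) if max_abs > 0 else 0.0
        scale = 2.0 ** (shared_exp - (mantissa_bits - 2))
        out[i] = np.clip(np.round(block / scale), -qmax - 1, qmax) * scale
    return out.reshape(-1)[: len(x)]

# Small values lose precision when they share an exponent with a large value.
print(block_fp_quantize(np.array([0.002, 0.5, -1.3, 3.7, 100.0], dtype=np.float32)))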
Convert a Float16 Model to a BFP16 Model#
Because of increasing demands for BFP16 deployment, a conversion tool is necessary to directly convert a Float16 model to a BFP16 model.
Use the convert_fp16_to_bfp16 tool to convert a Float16 model to a BFP16 model:
python -m quark.onnx.tools.convert_fp16_to_bfp16 --input $FLOAT_16_ONNX_MODEL_PATH --output $BFP_16_ONNX_MODEL_PATH
Convert an NCHW Input Model to an NHWC Model#
Given that some models are designed with an NCHW input shape instead of NHWC, it is recommended to convert an NCHW input model to NHWC before quantizing a Float32 model. The conversion runs even if the model is already in NHWC layout, so make sure the input model is actually in NCHW format before applying it.
Note
The data layout, whether NCHW or NHWC, does not influence the quantization process itself. However, deployment efficiency is affected by the kernel design, which is often optimized for NHWC. Consequently, when input data is in NCHW format, a conversion to NHWC is recommended. This conversion introduces a small computational overhead, though the overall performance benefits from the optimized layout. While a transpose operation is required for the format change, the total number of other operations remains constant.
Use the convert_nchw_to_nhwc tool to convert an NCHW model to an NHWC model:
python -m quark.onnx.tools.convert_nchw_to_nhwc --input $NCHW_ONNX_MODEL_PATH --output $NHWC_ONNX_MODEL_PATH
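Because the conversion runs even on a model that is already NHWC, it helps to confirm the input layout first. The sketch below simply prints the graph input shapes with the onnx package so the position of the channel dimension can be checked; the file name is a placeholder and a 4D image input is assumed.

import onnx

# Placeholder path for the model whose layout you want to check.
model = onnx.load("model_nchw.onnx")

for graph_input in model.graph.input:
    dims = [
        d.dim_value if d.HasField("dim_value") else d.dim_param
        for d in graph_input.type.tensor_type.shape.dim
    ]
    # For a typical 4D image input, NCHW looks like [N, 3, H, W]
    # and NHWC looks like [N, H, W, 3]; confirm before converting.
    print(graph_input.name, dims)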
Quantize an ONNX Model Using Random Input#
For ONNX models that have no input data available for quantization, random input can be used to drive the quantization process.
Use the random_quantize tool to quantize an ONNX model:
python -m quark.onnx.tools.random_quantize --input_model $FLOAT_ONNX_MODEL_PATH --quant_model $QUANTIZED_ONNX_MODEL_PATH
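After quantization, a quick smoke test is to run the quantized model on random data with onnxruntime. The sketch below is only a minimal example; the file name is a placeholder, float inputs are assumed, and dynamic dimensions are replaced with 1.

import numpy as np
import onnxruntime as ort

# Placeholder path for the quantized model.
session = ort.InferenceSession("quantized_model.onnx", providers=["CPUExecutionProvider"])

feeds = {}
for inp in session.get_inputs():
    # Replace dynamic (symbolic or unknown) dimensions with 1 for the smoke test.
    shape = [d if isinstance(d, int) and d > 0 else 1 for d in inp.shape]
    feeds[inp.name] = np.random.rand(*shape).astype(np.float32)  # assumes float inputs

outputs = session.run(None, feeds)
print([o.shape for o in outputs])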
Convert an A8W8 NPU Model to an A8W8 CPU Model#
For models quantized with the A8W8 NPU configuration, it is convenient and efficient to convert them directly to A8W8 CPU models.
Use the convert_a8w8_npu_to_a8w8_cpu tool to convert an A8W8 NPU model to an A8W8 CPU model:
python -m quark.onnx.tools.convert_a8w8_npu_to_a8w8_cpu --input [INPUT_PATH] --output [OUTPUT_PATH]
Print Names and Quantity of A16W8 and A8W8 Conv for Mixed-Precision Models#
For mixed-precision models, for example mixed A16W8 and A8W8, use the print_a16w8_a8w8_nodes tool to print the names and quantity of A16W8 and A8W8 Conv, ConvTranspose, Gemm, and MatMul nodes. Each MatMul node must have one and only one set of weights.
python -m quark.onnx.tools.print_a16w8_a8w8_nodes --input [INPUT_PATH]
Convert a U16U8 Quantized Model to a U8U8 Model#
Convert a U16U8 model (activations quantized to UINT16 and weights to UINT8) to a U8U8 model without calibration.
Use the convert_u16u8_to_u8u8 tool to do the conversion:
python -m quark.onnx.tools.convert_u16u8_to_u8u8 --input [INPUT_PATH] --output [OUTPUT_PATH]
Evaluate Accuracy Between Baseline and Quantized Results Folders#
It is often necessary to compare the differences in output images before and after quantization. Four metrics are currently supported: cosine similarity, L2 loss, PSNR, and VMAF, along with three file formats: JPG, PNG, and NPY.
Use the evaluate tool:
python -m quark.onnx.tools.evaluate --baseline_results_folder [BASELINE_RESULTS_FOLDER_PATH] --quantized_results_folder [QUANTIZED_RESULTS_FOLDER_PATH]
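For reference, cosine similarity, L2 loss, and PSNR for a single pair of NPY results can be computed directly with numpy, as in the sketch below. This only illustrates the metrics; the evaluate tool's exact definitions (and the VMAF metric) may differ, and the file names are placeholders.

import numpy as np

# Placeholder file names for one baseline/quantized result pair.
baseline = np.load("baseline/output_0.npy").astype(np.float64).ravel()
quantized = np.load("quantized/output_0.npy").astype(np.float64).ravel()

cosine = np.dot(baseline, quantized) / (np.linalg.norm(baseline) * np.linalg.norm(quantized))
l2 = np.mean((baseline - quantized) ** 2)                      # one common L2 (mean squared) loss
psnr = 10.0 * np.log10((np.max(np.abs(baseline)) ** 2) / l2)   # PSNR against the baseline peak

print(f"cosine={cosine:.6f}, l2={l2:.6e}, psnr={psnr:.2f} dB")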
Replace inf and -inf Values in ONNX Model Weights#
The replace_inf_weights tool replaces inf or -inf values in ONNX model weights with a specified value.
Use the replace_inf_weights tool to do the replacement:
python -m quark.onnx.tools.replace_inf_weights --input_model [INPUT_MODEL_PATH] --output_model [OUTPUT_MODEL_PATH] --replace_inf_value [REPLACE_INF_VALUE]
Note
The default replacement value is 10000.0. This might lead to precision degradation. Adjust the replacement value based on your model and application needs.
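To check whether a model contains such values in the first place, the initializers can be scanned with the onnx and numpy packages, as in the sketch below; the file name is a placeholder.

import numpy as np
import onnx
from onnx import numpy_helper

# Placeholder path for the model to be scanned.
model = onnx.load("model_fp32.onnx")

for init in model.graph.initializer:
    weights = numpy_helper.to_array(init)
    if np.issubdtype(weights.dtype, np.floating) and np.isinf(weights).any():
        print(f"{init.name}: {int(np.isinf(weights).sum())} inf value(s)")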