Yolo_nas and Yolox Quantization#

This example introduces how to quantize a yolo_nas/yolox model with Quark ONNX and evaluate its accuracy.

The Quark ONNX example has the following parts:

Pip Requirements#

Install the necessary Python packages (using CPU as default):

bash env.sh

If you want to use GPU, use the following command instead:

bash env.sh
pip uninstall onnxruntime
pip uninstall onnxruntime-gpu
pip install onnxruntime-gpu==1.18.1

Note

The difference between using CPU and GPU lies in whether CUDA acceleration is used during model evaluation.

Prepare Data#

To use this Dataset you need to:

Infer the Float Model (Optional)#

You can use this command to infer the float model with evaluation data.

python infer_float_yolo_model.py --model_name yolo_nas_s --eval_data_path [EVAL_DATA]

If you want to use GPU, use the following command instead:

python infer_float_yolo_model.py --model_name yolo_nas_s --eval_data_path [EVAL_DATA] --gpu

Quantization#

We take yolo_nas_s as an example.

This command first generates a float model in the models folder. Next, it prepares the evaluation data val_data. Then, the model is quantized using the calibration data (prepared in the first inference process) and the specified configuration (you can specify XINT8, A8W8 or A16W8). Finally, the quantized model is evaluated using the evaluation data.

python quantize_yolo.py --model_name yolo_nas_s --calib_data_path [CALIB_DATA_PATH] --eval_data_path [EVAL_DATA] --config XINT8

If you want to use GPU, use the following command instead:

python quantize_yolo.py --model_name yolo_nas_s --calib_data_path [CALIB_DATA_PATH] --eval_data_path [EVAL_DATA] --config XINT8 --gpu

If you customize the calibration data and evaluation data, please use the command below.

python quantize_yolo.py --model_name yolo_nas_s --calib_data_path [CALIB_DATA_PATH] --eval_data_path [CUSTOME_EVAL_DATA] --config XINT8

If you want to use GPU, use the following command instead:

python quantize_yolo.py --model_name yolo_nas_s --calib_data_path [CALIB_DATA_PATH] --eval_data_path [CUSTOME_EVAL_DATA] --config XINT8 --gpu

Note

If using AdaRound or AdaQuant, utilizing a GPU will also accelerate the finetune process.

Results#

As seen in the table, generally, the Top-1 accuracy of A16W8 is higher than that of A8W8, which in turn is higher than XINT8. Among them, A16W8 often achieves accuracy close to that of the float model. You can also use some accuracy enhancement methods to improve accuracy, such as specifying the XINT8_ADAROUND, A8W8_ADAROUND or A16W8_ADAROUND configuration.

Yolox_s Float Model

Quantized with XINT8 config

Quantized with A8W8 config

Quantized with A16W8 config

Model Size

36 MB

9 MB

9 MB

9 MB

mAP0.5:0.95

40.51 %

29.40 %

30.92 %

36.99 %

Yolo_nas_s Float Model

Quantized with XINT8 config

Quantized with A8W8 config

Quantized with A16W8 config

Model Size

46.5 MB

12.0 MB

12.0 MB

12.0 MB

mAP0.5

60.11 %

25.97 %

51.51 %

59.35 %