SDXL Model Quantization using Quark#
This document provides examples of quantizing SDXL models to FP8 and exporting them using Quark.
Third-party Dependencies#
The example relies on torchvision. Users need to install the version of torchvision that is compatible with their version of PyTorch.
```shell
export DIFFUSERS_ROOT=$PWD
git clone https://github.com/mlcommons/inference.git
cd inference
git checkout 87ba8cb8a6a4f6525f26255fa513d902b17ab060
cd ./text_to_image/tools/
sh ./download-coco-2014.sh --num-workers 5
sh ./download-coco-2014-calibration.sh -n 5
cd ${DIFFUSERS_ROOT}
export PYTHONPATH="${DIFFUSERS_ROOT}/inference/text_to_image/:$PYTHONPATH"
```
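After the download scripts finish, it can help to sanity-check that the files landed where the later steps expect them. A minimal Python sketch (the relative paths below match the dataset locations listed in the next section; `DIFFUSERS_ROOT` falls back to the current directory if unset):

```python
import os

# Relative paths (under DIFFUSERS_ROOT) that the calibration and test
# steps below expect; adjust if your download location differs.
expected = [
    "inference/text_to_image/coco2014/calibration/captions.tsv",
    "inference/text_to_image/coco2014/captions/captions_source.tsv",
]

root = os.environ.get("DIFFUSERS_ROOT", ".")
statuses = {rel: os.path.exists(os.path.join(root, rel)) for rel in expected}
for rel, ok in statuses.items():
    print(rel, "found" if ok else "MISSING")
```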
Dataset Files#
The calibration dataset file will be downloaded to `${DIFFUSERS_ROOT}/inference/text_to_image/coco2014/calibration/captions.tsv`. The test dataset file will be downloaded to `${DIFFUSERS_ROOT}/inference/text_to_image/coco2014/captions/captions_source.tsv`.
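Both files are tab-separated caption lists. A small sketch of reading such a TSV with Python's `csv` module; the two-column layout here is illustrative only, so inspect the downloaded file for its actual header:

```python
import csv
import io

# Illustrative TSV content; the real captions.tsv from the MLPerf
# download may use a different header and additional columns.
sample_tsv = (
    "id\tcaption\n"
    "0\tA city at night with people walking around.\n"
    "1\tA bowl of fruit on a wooden table.\n"
)

with io.StringIO(sample_tsv) as f:
    reader = csv.DictReader(f, delimiter="\t")
    rows = list(reader)

print(len(rows))           # 2
print(rows[0]["caption"])  # A city at night with people walking around.
```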
Quantization & Export Scripts#
You can run the following Python scripts in the `examples/torch/diffusers` path.
Run with SDXL Without Quantization#
Run the original SDXL model:

```shell
python quantize_sdxl.py --float
```
Calibration and Export SafeTensor#
Run calibration and export:

```shell
python quantize_sdxl.py --input_scheme {'per-tensor'} --weight_scheme {'per-tensor', 'per-channel'} --calib_data_tsv_file_path {your calibration dataset file path} --export
```
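The `--weight_scheme` flag controls the granularity of the FP8 scaling factors. A rough numpy sketch of the difference, assuming OCP FP8 E4M3 (max finite value 448) and symmetric max-abs calibration; this illustrates the idea, not Quark's internal implementation:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in OCP FP8 E4M3

def per_tensor_scale(w):
    # One scale for the whole tensor: the global max |w| maps to FP8 max.
    return np.max(np.abs(w)) / FP8_E4M3_MAX

def per_channel_scale(w):
    # One scale per output channel (axis 0), so channels with small
    # weights keep finer resolution than under a single global scale.
    return np.max(np.abs(w), axis=tuple(range(1, w.ndim))) / FP8_E4M3_MAX

w = np.array([[0.5, -2.0],
              [0.01, 0.02]])
print(per_tensor_scale(w))   # scalar: 2.0 / 448
print(per_channel_scale(w))  # array: [2.0 / 448, 0.02 / 448]
```

Per-channel scales typically recover accuracy on layers whose channels differ widely in magnitude, at the cost of storing one scale per channel.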
Load SafeTensor and Test#
Load and test:

```shell
python quantize_sdxl.py --input_scheme {'per-tensor'} --weight_scheme {'per-tensor', 'per-channel'} --test_data_tsv_file_path {your test dataset file path} --load --test
```
Load SafeTensor and Run with a prompt#
Load and run:

```shell
python quantize_sdxl.py --input_scheme {'per-tensor'} --weight_scheme {'per-tensor', 'per-channel'} --load --prompt "A city at night with people walking around."
```
SDXL Benchmark#
Benchmarked on an MI210 GPU with `diffusers==0.21.2`:

| Metric | FP16 (Without Quantization) | FP8 + Per-Tensor |
|---|---|---|
| CLIP score | 31.74845 | 31.83954 |
| FID | 23.56758 | 23.614748 |

Higher CLIP score and lower FID indicate better quality; the FP8 per-tensor model stays close to the FP16 baseline on both metrics.