Installation Guide#
Prerequisites#
Python 3.9 or later is required.
Install PyTorch (torch >= 2.2.0) for your compute platform (CUDA, ROCm, CPU…).
Install ONNX >= 1.12.0, ONNX Runtime ~= 1.17.0, and onnxruntime-extensions >= 0.4.2.
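The version requirements above can be checked before installing Quark. The following is a minimal sketch, not part of Quark: the helper names (`parse_version`, `meets_minimum`, `is_compatible_release`, `check_packages`) are hypothetical, and it assumes the packages are installed under the distribution names `torch`, `onnx`, `onnxruntime`, and `onnxruntime-extensions`. Variants such as a GPU build of ONNX Runtime may be registered under a different name and would be reported as missing.

```python
import re
import sys
from importlib import metadata

# Minimum versions taken from the prerequisites above.
MINIMUMS = {
    "torch": "2.2.0",
    "onnx": "1.12.0",
    "onnxruntime-extensions": "0.4.2",
}
# ONNX Runtime uses a compatible-release constraint (~= 1.17.0).
COMPATIBLE = {"onnxruntime": "1.17.0"}


def parse_version(text):
    """Turn a string such as '2.2.0+cu121' into (2, 2, 0).

    Simplified: local suffixes and pre-release tags are ignored.
    """
    core = text.split("+")[0]
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """True if installed >= minimum."""
    return parse_version(installed) >= parse_version(minimum)


def is_compatible_release(installed, base):
    """True if installed satisfies '~= base' for a three-part base version."""
    have, want = parse_version(installed), parse_version(base)
    return have[:2] == want[:2] and have[2] >= want[2]


def check_packages(get_version=metadata.version):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    rules = [(n, v, meets_minimum, ">=") for n, v in MINIMUMS.items()]
    rules += [(n, v, is_compatible_release, "~=") for n, v in COMPATIBLE.items()]
    for name, wanted, rule, symbol in rules:
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not rule(installed, wanted):
            problems.append(f"{name} {installed} does not satisfy {symbol} {wanted}")
    return problems


if __name__ == "__main__":
    if sys.version_info < (3, 9):
        print("Python 3.9 or later is required")
    for problem in check_packages():
        print(problem)
```

The script prints nothing when every requirement is met. The version parsing is deliberately simplified; a full implementation of version comparison is available in the `packaging` library if it is installed.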
Installation#
Install from ZIP#
Download 📥quark.zip and extract it; the archive contains a whl package. Alternatively, download the whl package 📥quark.whl directly.
Install the quark whl package by running
pip install [quark whl package].whl
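The extract-and-install steps above can also be scripted. The sketch below is illustrative only: the function names and the `quark_extracted` output directory are hypothetical, and it assumes the archive contains exactly one `.whl` file, which may sit in any subdirectory of the zip.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def find_wheels(zip_path):
    """Return the names of all .whl members inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if name.endswith(".whl")]


def extract_wheel(zip_path, dest_dir):
    """Extract the single wheel in the archive and return its path on disk."""
    wheels = find_wheels(zip_path)
    if len(wheels) != 1:
        raise ValueError(
            f"expected exactly one .whl in {zip_path}, found {len(wheels)}"
        )
    with zipfile.ZipFile(zip_path) as archive:
        return Path(archive.extract(wheels[0], path=dest_dir))


def pip_install(wheel_path):
    """Install the wheel into the interpreter that is running this script."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", str(wheel_path)],
        check=True,
    )


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python install_from_zip.py <path to the downloaded zip>
    wheel = extract_wheel(sys.argv[1], "quark_extracted")
    pip_install(wheel)
```

Calling pip through `sys.executable -m pip` installs into the same Python environment that runs the script, which avoids picking up a `pip` that belongs to a different interpreter.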
Installation Verification#
(Optional) Verify the installation by running
python -c "import quark"
If this does not report an error, the installation is complete.
(Optional) Compile the fast quantization kernels. When Quark’s quantization APIs are used for the first time, Quark compiles the fast quantization kernels using your installed Torch and, if available, CUDA. This process may take a few minutes, but subsequent quantization calls will be much faster. To trigger the compilation now and check that it succeeds, run the following command:
python -c "import quark.torch.kernel"
(Optional) Compile the custom operators library. When Quark-ONNX’s custom operators are used for the first time, Quark compiles the custom operators library using your local environment. To trigger the compilation now and check that it succeeds, run the following command:
python -c "import quark.onnx.operators.custom_ops"
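The three verification commands above can be combined into one script that reports each result separately instead of stopping at the first failure. This is a minimal sketch: the module names come from the commands above, while `try_import`, `run_checks`, and the PASS/FAIL output format are hypothetical and not part of Quark.

```python
import importlib

# Module names taken from the verification commands above.
CHECKS = [
    ("quark", "base package"),
    ("quark.torch.kernel", "fast quantization kernels"),
    ("quark.onnx.operators.custom_ops", "custom operators library"),
]


def try_import(module_name):
    """Import a module and return (ok, message) instead of raising."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # a failed compilation may not raise ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


def run_checks(checks=CHECKS):
    """Run every check, print one line per module, and return the results."""
    results = {}
    for module_name, label in checks:
        ok, message = try_import(module_name)
        results[module_name] = ok
        status = "PASS" if ok else "FAIL"
        print(f"[{status}] {label} ({module_name}): {message}")
    return results


if __name__ == "__main__":
    run_checks()
```

If the base package fails to import, the two optional checks will fail as well, so the first line of output is the one to look at first. The first run of the kernel and custom operator checks may take a few minutes because of the compilation step.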