Quark for PyTorch


New Features (Version 0.1.0):

  • PyTorch Quantizer Enhancements:

    • Eager mode is supported.

    • Post-Training Quantization (PTQ) is now available.

    • Automatic in-place replacement of nn.Module operations.

    • Quantization of torch.nn.Linear modules is supported.

    • A customizable calibration process is introduced.

  • Quantization Strategy:

    • Symmetric and asymmetric quantization are supported.

    • Weight-only, dynamic, and static quantization modes are available.

  • Quantization Granularity:

    • Per-tensor, per-channel, and per-group granularity are supported.

  • Data Types:

    • Multiple data types are supported, including float16, bfloat16, int4, uint4, int8, and fp8 (e4m3fn).

  • Calibration Methods:

    • MinMax, Percentile, and MSE calibration methods are now supported.

  • Large Language Model Support:

    • FP8 KV-cache quantization is supported for large language models (LLMs).

  • Advanced Quantization Algorithms:

    • SmoothQuant, AWQ (uint4), and GPTQ (uint4) are supported for LLMs. (Note: the AWQ, GPTQ, and SmoothQuant algorithms are currently limited to single-GPU usage.)

  • Export Capabilities:

    • Q/DQ quantized models can now be exported to ONNX and to the JSON-safetensors format adopted by vLLM.

  • Operating System Support:

    • Linux (ROCm and CUDA are supported).

    • Windows (CPU only).
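The quantization strategy, granularity, and MinMax calibration items above can be illustrated with a minimal, framework-free sketch. It is not Quark's API: the helper names (`minmax_qparams`, `fake_quantize`, `fake_quantize_weight`, `mean_abs_error`) are hypothetical, plain Python lists stand in for tensors, and the symmetric branch assumes a narrow signed range (for example [-7, 7] for 4 bits) rather than the full int4 range. Each group of values shares one scale derived from its observed min/max, and values are quantized and then dequantized (Q/DQ).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def minmax_qparams(values: List[float], num_bits: int,
                   symmetric: bool) -> Tuple[float, int, int, int]:
    """MinMax calibration (hypothetical helper): return (scale, zero_point, qmin, qmax)."""
    if symmetric:
        # Signed, narrow range, e.g. 4 bits -> [-7, 7]; zero_point is fixed at 0.
        qmax = 2 ** (num_bits - 1) - 1
        qmin = -qmax
        amax = max(abs(v) for v in values)
        scale = amax / qmax if amax > 0 else 1.0
        return scale, 0, qmin, qmax
    # Asymmetric, unsigned range, e.g. 4 bits -> [0, 15]; zero_point shifts the grid.
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # widen the range so 0.0 stays exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return scale, zero_point, qmin, qmax


def fake_quantize(values: List[float], num_bits: int,
                  symmetric: bool) -> List[float]:
    """Quantize then dequantize (Q/DQ) a group of values that share one scale."""
    scale, zp, qmin, qmax = minmax_qparams(values, num_bits, symmetric)
    # Round to nearest (Python's round() breaks ties to even), clamp, then dequantize.
    return [(max(qmin, min(qmax, round(v / scale) + zp)) - zp) * scale
            for v in values]


def fake_quantize_weight(weight: Matrix, granularity: str, num_bits: int,
                         symmetric: bool, group_size: int = 0) -> Matrix:
    """Apply Q/DQ to a 2-D weight (rows = output channels) at the given granularity."""
    if granularity == "per_tensor":
        # One scale for the whole matrix.
        cols = len(weight[0])
        flat = fake_quantize([v for row in weight for v in row], num_bits, symmetric)
        return [flat[i:i + cols] for i in range(0, len(flat), cols)]
    if granularity == "per_channel":
        # One scale per output channel (row).
        return [fake_quantize(row, num_bits, symmetric) for row in weight]
    if granularity == "per_group":
        # One scale per `group_size` consecutive elements within each row.
        if group_size <= 0:
            raise ValueError("per_group granularity needs a positive group_size")
        return [[q for start in range(0, len(row), group_size)
                 for q in fake_quantize(row[start:start + group_size],
                                        num_bits, symmetric)]
                for row in weight]
    raise ValueError(f"unknown granularity: {granularity}")


def mean_abs_error(a: Matrix, b: Matrix) -> float:
    """Mean absolute difference between two equally shaped matrices."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    w = [[0.1, -0.5, 0.25, 2.0],
         [0.01, 0.02, -0.03, 0.04]]
    for g in ("per_tensor", "per_channel", "per_group"):
        deq = fake_quantize_weight(w, g, num_bits=4, symmetric=True, group_size=2)
        print(g, round(mean_abs_error(w, deq), 4))
```

On the toy matrix in the `__main__` block, the mean absolute Q/DQ error shrinks from per-tensor to per-channel to per-group, because finer granularity lets rows and groups with small magnitudes use smaller scales instead of a single scale sized for the largest value.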