Quark for PyTorch
New Features (Version 0.1.0):
PyTorch Quantizer Enhancements:
Eager mode is supported.
Post Training Quantization (PTQ) is now available.
Automatic in-place replacement of nn.Module operations.
Quantization of the torch.nn.Linear module is supported.
A customizable calibration process is introduced.
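In-place replacement can be pictured with a plain-PyTorch sketch: walk the module tree and swap each nn.Linear for a quantized wrapper. The `FakeQuantLinear` class and `replace_linears` helper below are hypothetical illustrations, not Quark's actual API:

```python
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    """Hypothetical stand-in for a quantized linear layer (illustration only)."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.inner = linear

    def forward(self, x):
        # A real quantizer would quantize weights/activations here.
        return self.inner(x)

def replace_linears(model: nn.Module) -> nn.Module:
    # Recursively swap every nn.Linear child in place via setattr.
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, FakeQuantLinear(child))
        else:
            replace_linears(child)
    return model

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
replace_linears(model)
```

After the call, `model[0]` and `model[2]` are `FakeQuantLinear` instances while the ReLU is untouched.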
Quantization Strategy:
Symmetric and asymmetric quantization are supported.
Weight-only, dynamic, and static quantization modes are available.
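The symmetric/asymmetric distinction comes down to how the scale and zero-point are derived. A minimal sketch of both schemes for int8/uint8 (generic quantization math, not Quark-specific code):

```python
import torch

def quantize_symmetric(x, num_bits=8):
    # Symmetric: zero-point fixed at 0, scale taken from the max magnitude.
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = x.abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q, scale

def quantize_asymmetric(x, num_bits=8):
    # Asymmetric: scale and zero-point map [min, max] onto [0, 2**b - 1].
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return q, scale, zero_point

x = torch.tensor([-1.0, -0.5, 0.0, 0.5, 2.0])
q_sym, s = quantize_symmetric(x)
q_asym, s2, zp = quantize_asymmetric(x)
# Dequantization recovers x to within one quantization step.
assert torch.allclose(q_sym * s, x, atol=s.item())
assert torch.allclose((q_asym - zp) * s2, x, atol=s2.item())
```

Asymmetric quantization spends the full integer range on the observed [min, max] interval, which helps for one-sided distributions such as post-ReLU activations.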
Quantization Granularity:
Support for per-tensor, per-channel, and per-group granularity.
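The three granularities differ only in how many scales are computed and over which slice of the tensor. A sketch for a Linear weight of shape (out_features, in_features), assuming a group size of 8 for illustration:

```python
import torch

w = torch.randn(4, 16)  # e.g. a Linear weight: (out_features, in_features)

# Per-tensor: one scale for the whole tensor.
scale_tensor = w.abs().max() / 127

# Per-channel: one scale per output channel (reduce over dim 1).
scale_channel = w.abs().amax(dim=1, keepdim=True) / 127

# Per-group: one scale per contiguous group of 8 input elements.
scale_group = w.reshape(4, 2, 8).abs().amax(dim=-1, keepdim=True) / 127
```

Finer granularity (per-channel, then per-group) tracks local dynamic ranges more closely at the cost of storing more scales, which is why low-bit formats such as uint4 typically pair with per-group scaling.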
Data Types:
Multiple data types are supported, including float16, bfloat16, int4, uint4, int8, and fp8 (e4m3fn).
Calibration Methods:
MinMax, Percentile, and MSE calibration methods are now supported.
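MinMax is the simplest of the three: track the running minimum and maximum of each observed tensor over the calibration batches, then derive the scale and zero-point from that range. A conceptual observer (the `MinMaxObserver` class below is an illustrative sketch, not Quark's implementation; Percentile and MSE calibration differ only in how the range is chosen):

```python
import torch

class MinMaxObserver:
    """Minimal MinMax calibration sketch: running min/max over batches."""
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, x: torch.Tensor):
        # Widen the tracked range with each calibration batch.
        self.min_val = min(self.min_val, x.min().item())
        self.max_val = max(self.max_val, x.max().item())

    def qparams(self, num_bits=8):
        # Derive an asymmetric scale/zero-point from the observed range.
        scale = (self.max_val - self.min_val) / (2 ** num_bits - 1)
        zero_point = round(-self.min_val / scale)
        return scale, zero_point

obs = MinMaxObserver()
for _ in range(4):                      # feed calibration batches
    obs.observe(torch.randn(8, 16))
scale, zero_point = obs.qparams()
```

Percentile calibration would clip the range to, e.g., the 99.99th percentile of observed values; MSE calibration searches for the range that minimizes quantization error.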
Large Language Model Support:
FP8 KV-cache quantization for large language models (LLMs).

Advanced Quantization Algorithms:
Support for SmoothQuant, AWQ (uint4), and GPTQ (uint4) for LLMs. (Note: the AWQ, GPTQ, and SmoothQuant algorithms are currently limited to single-GPU usage.)
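The core idea of SmoothQuant can be shown in a few lines: per input channel, migrate activation outliers into the weights via a smoothing scale s_j = max|X_j|^α / max|W_j|^(1−α), which leaves the layer's output mathematically unchanged. A generic sketch of that transformation (not Quark's implementation):

```python
import torch

def smoothquant_scales(act_absmax, weight, alpha=0.5):
    """Per-input-channel smoothing scales, SmoothQuant-style (sketch).

    act_absmax: (in_features,) calibrated per-channel |activation| maxima.
    weight:     (out_features, in_features) linear weight.
    """
    w_absmax = weight.abs().amax(dim=0)                 # per input channel
    return act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)

torch.manual_seed(0)
weight = torch.randn(8, 4)              # (out_features, in_features)
act_absmax = torch.tensor([10.0, 1.0, 0.5, 3.0])
s = smoothquant_scales(act_absmax, weight)

x = torch.randn(2, 4)
# Equivalence: (x / s) @ (weight * s).T == x @ weight.T,
# but the scaled activations x / s are far easier to quantize.
assert torch.allclose(x @ weight.T, (x / s) @ (weight * s).T, atol=1e-5)
```

AWQ and GPTQ instead adjust the weights directly (activation-aware scaling and error-compensating rounding, respectively) to reach uint4 precision.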
Export Capabilities:
Export of Q/DQ quantized models to the ONNX format and to the vLLM-adopted JSON-safetensors format is now supported.
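In a Q/DQ representation, each quantized tensor is carried as a quantize/dequantize pair, which is numerically the same as PyTorch's fake quantization. A one-line sketch of what a Q/DQ pair computes (int8, per-tensor; the scale value is illustrative):

```python
import torch

x = torch.tensor([0.1, -0.8, 1.3])
# Quantize then immediately dequantize: round to the int8 grid, clamp,
# and map back to float. This is what a Q->DQ node pair represents.
xq = torch.fake_quantize_per_tensor_affine(
    x, scale=0.01, zero_point=0, quant_min=-128, quant_max=127
)
```

Values inside the representable range round to the nearest multiple of the scale (0.1 and -0.8 survive exactly here), while 1.3 saturates to 127 × 0.01 = 1.27.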
Operating System Support:
Linux (supports ROCm and CUDA)
Windows (supports CPU only)