Quark 0.7 documentation

Release Notes

  • Release Information

Getting Started with Quark

  • Introduction to Quantization
  • Installation
  • Basic Usage
    • Quark for PyTorch
    • Quark for ONNX
  • Accessing PyTorch Examples
    • Diffusion Model Quantization
    • Quark Extension for Brevitas Integration
    • Integration with AMD Pytorch-light (APL)
    • Language Model Pruning
    • Language Model PTQ
    • Language Model QAT
    • Language Model Evaluation
      • Perplexity Evaluations
      • Rouge & Meteor Evaluations
      • LM-Eval Harness Evaluations
    • Vision Model Quantization using FX Graph Mode
  • Accessing ONNX Examples
    • Block Floating Point (BFP)
    • MX Formats
    • Fast Finetune AdaRound
    • Fast Finetune AdaQuant
    • Cross-Layer Equalization (CLE)
    • GPTQ
    • Mixed Precision
    • SmoothQuant
    • Quantizing a Llama-2-7b Model
    • Quantizing an OPT-125M Model
    • Quantizing a ResNet50-v1-12 Model
    • Quantizing a Llama-2-7b Model Using ONNX MatMulNBits

Advanced Quark Features for PyTorch

  • Configuring PyTorch Quantization
    • Calibration Methods
    • Calibration Datasets
    • Quantization Strategies
    • Quantization Schemes
    • Quantization Symmetry
  • Save & Load Quantized Models
  • Exporting Quantized Models
    • ONNX Format
    • HuggingFace Format
    • GGUF Format
      • Bridge from Quark to llama.cpp
    • Quark Format
    • ONNX Runtime Gen AI Model Builder
  • Best Practices for Post-Training Quantization (PTQ)
  • Debugging Quantization Degradation
  • Language Model Optimization
    • Pruning
    • Language Model Post Training Quantization (PTQ) Using Quark
    • Language Model QAT Using Quark
    • Language Model Evaluations in Quark
      • Perplexity Evaluations
      • Rouge & Meteor Evaluations
      • LM-Eval Harness Evaluations
    • Quantizing with Rotation and SmoothQuant
  • Activation/Weight Smoothing (SmoothQuant)
  • Block Floating Point 16
  • Extensions
    • Integration with AMD Pytorch-light (APL)
    • Brevitas Integration
  • Using MX (Microscaling)
  • Two Level Quantization Formats

Advanced Quark Features for ONNX

  • Configuring ONNX Quantization
    • Full List of Quantization Config Features
    • Calibration Methods
    • Calibration Datasets
    • Quantization Strategies
    • Quantization Schemes
    • Quantization Symmetry
  • Data and OP Types
  • Accelerate with GPUs
  • Mixed Precision
  • Block Floating Point 16 (BFP16)
  • BF16 Quantization
  • Microscaling (MX)
  • Accuracy Improvement Algorithms
    • Quantizing Using Cross-Layer Equalization (CLE)
    • Quantization Using AdaQuant and AdaRound
    • SmoothQuant (SQ)
    • Quantizing a Model with GPTQ
  • Optional Utilities
  • Tools

APIs

  • PyTorch APIs
    • Pruning
    • Quantization
    • Export
    • Pruner Configuration
    • Quantizer Configuration
    • Exporter Configuration
  • ONNX APIs
    • Quantization
    • Optimization
    • Calibration
    • ONNX Quantizer
    • QDQ Quantizer
    • Configuration
    • Quantization Utilities

Troubleshooting and Support

  • PyTorch FAQ
  • ONNX FAQ

Language Model Optimization

  • Pruning
  • Language Model Post Training Quantization (PTQ) Using Quark
  • Language Model QAT Using Quark
  • Language Model Evaluations in Quark
  • Quantizing with Rotation and SmoothQuant
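
For orientation before diving into the pages above, the sketch below shows the general shape of a post-training quantization run with Quark's PyTorch API. It follows the pattern of the Basic Usage examples; the import paths and spec fields shown (dtype, observer, scheme) are assumptions for illustration and may vary across Quark releases, so treat the PTQ page listed above as the authoritative walkthrough.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Entry points as shown in Quark's "Basic Usage" for PyTorch;
    # exact module paths may differ in your Quark release.
    from quark.torch import ModelQuantizer
    from quark.torch.quantization.config.config import (
        Config, QuantizationConfig, QuantizationSpec)
    from quark.torch.quantization.config.type import (
        Dtype, QSchemeType, ScaleType, RoundType)
    from quark.torch.quantization.observer.observer import PerTensorMinMaxObserver

    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

    # A tiny calibration set is enough for a smoke test; real PTQ uses the
    # calibration datasets described under "Configuring PyTorch Quantization".
    calib_data = [tokenizer(t, return_tensors="pt").input_ids
                  for t in ("Quark quantizes language models.",
                            "Post-training quantization needs calibration data.")]

    # Illustrative int8 per-tensor weight-only spec (field names assumed).
    int8_spec = QuantizationSpec(dtype=Dtype.int8,
                                 qscheme=QSchemeType.per_tensor,
                                 observer_cls=PerTensorMinMaxObserver,
                                 symmetric=True,
                                 scale_type=ScaleType.float,
                                 round_method=RoundType.half_even,
                                 is_dynamic=False)
    quant_config = Config(global_quant_config=QuantizationConfig(weight=int8_spec))

    # Calibrate and quantize in one call.
    quantizer = ModelQuantizer(quant_config)
    quantized_model = quantizer.quantize_model(model, calib_data)

QAT, pruning, and evaluation follow the same configure-then-run pattern; see the pages listed above for each workflow.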

Last updated on Feb 11, 2025.
