
AMD Quark 0.11 documentation

Release Notes

  • Release Information

Getting Started with AMD Quark

  • Introduction to Quantization
  • Installation
  • Getting started: Introduction
  • Getting started: Quark for ONNX
  • Getting started: Quark for PyTorch
  • PyTorch Examples
    • Diffusion Model Quantization
    • AMD Quark Extension for Brevitas Integration
    • Integration with AMD Pytorch-light (APL)
    • Language Model Pruning
    • Language Model PTQ
      • FP4 Post Training Quantization (PTQ) for LLM models
      • FP8 Post Training Quantization (PTQ) for LLM models
    • Language Model QAT
    • Language Model Evaluation
      • Perplexity Evaluations
    • ROUGE & METEOR Evaluations
      • LM-Evaluation-Harness Evaluations
      • LM-Evaluation-Harness (Offline)
    • Vision Model Quantization using FX Graph Mode
      • Image Classification Models FX Graph Quantization
      • YOLO-NAS FX Graph Quantization
  • ONNX Examples
    • Block Floating Point (BFP)
    • MX Formats
    • Fast Finetune AdaRound
    • Fast Finetune AdaQuant
    • Cross-Layer Equalization (CLE)
    • Layer-wise Percentile
    • GPTQ
    • Mixed Precision
    • SmoothQuant
    • QuaRot
    • Auto-Search for Ryzen AI Yolov8 ONNX Model Quantization
    • Auto-Search for Ryzen AI MobileNetv2-50 ONNX Quantization with Custom Evaluator
    • Auto-Search for Ryzen AI Resnet50 ONNX Model Quantization
    • Quantizing a Llama-2-7b Model
    • Quantizing an OPT-125M Model
    • Quantizing a ResNet50-v1-12 Model
    • Quantizing a Huggingface TIMM Model
    • Quantizing a Llama-2-7b Model Using the ONNX MatMulNBits
    • Quantizing a Llama-2-7b Model Using MatMulNBits
    • Quantizing a ResNet50 Model in Crypto Mode
    • Best Practice for Quantizing an Image Classification Model
    • Best Practice for Quantizing an Object Detection Model

Supported accelerators

  • AMD Ryzen AI
    • Quick Start for Ryzen AI
    • Best Practice for Ryzen AI in AMD Quark ONNX
    • Auto-Search for Ryzen AI ONNX Model Quantization
    • Quantizing LLMs for ONNX Runtime GenAI
    • FP32/FP16 to BF16 Model Conversion
    • Power-of-Two Scales (XINT8) Quantization
    • Float Scales (A8W8 and A16W8) Quantization
  • AMD Instinct
    • Language Model Post Training Quantization (PTQ) Using Quark
      • FP4 Post Training Quantization (PTQ) for LLM models
      • FP8 Post Training Quantization (PTQ) for LLM models
    • Evaluation of Quantized Models

Advanced AMD Quark Features for PyTorch

  • Configuring PyTorch Quantization for Large Language Models
  • Configuring PyTorch Quantization from Scratch
    • Calibration Methods
    • Calibration Datasets
    • Quantization Strategies
    • Quantization Schemes
    • Quantization Symmetry
  • Save and Load Quantized Models
  • Exporting Quantized Models
    • ONNX format
    • Hugging Face format (safetensors)
    • GGUF format
      • Bridge from Quark to llama.cpp
  • Best Practices for Post-Training Quantization (PTQ)
  • Debugging Quantization Degradation
  • Language Model Optimization
    • LLM Pruning
    • Language Model Post Training Quantization (PTQ) Using Quark
      • FP4 Post Training Quantization (PTQ) for LLM models
      • FP8 Post Training Quantization (PTQ) for LLM models
    • Language Model QAT Using Quark and Trainer
    • Language Model Evaluations in Quark
      • Perplexity Evaluations
      • ROUGE & METEOR Evaluations
      • LM-Evaluation-Harness Evaluations
      • LM-Evaluation-Harness (Offline)
    • Rotation pre-processing optimization
  • Activation/Weight Smoothing (SmoothQuant)
  • Auto SmoothQuant
  • Activation-aware Weight Quantization (AWQ)
    • AWQ end-to-end demo
  • Block Floating Point 16
  • Extensions
    • Integration with AMD Pytorch-light (APL)
    • Brevitas Integration
  • Using MX (Microscaling)
  • Two Level Quantization Formats

Advanced Quark Features for ONNX

  • Configuring ONNX Quantization
    • Full List of Quantization Config Features
    • Calibration Datasets
    • Quantization Strategies
    • Quantization Schemes
    • Calibration Methods
  • Data and OP Types
    • ExtendedQuantizeLinear
    • ExtendedDequantizeLinear
    • ExtendedInstanceNormalization
    • ExtendedLSTM
    • BFPQuantizeDequantize
    • MXQuantizeDequantize
  • Accelerate with GPUs
  • Mixed Precision
  • Block Floating Point 16 (BFP16)
  • BF16 Quantization
  • Microscaling (MX)
  • Microexponents (MX)
  • Accuracy Improvement Algorithms
    • Quantizing Using CrossLayerEqualization (CLE)
    • Quantization Using AdaQuant and AdaRound
    • SmoothQuant (SQ)
    • Quark ONNX Quantization Tutorial For Block Floating Point (BFP)
    • Quark ONNX Quantization Tutorial For GPTQ
    • QuaRot
  • Automatic Search for Model Quantization
  • Automatic Search Pro for Model Quantization
  • Latency and Memory Profiling
  • Using ONNX Model Inference and Saving Input Data in NPY Format
  • Optional Utilities
  • Tools

Tutorials

  • AMD Quark Tutorial: PyTorch Quickstart
  • Quantizing a Diffusion Model using Quark
  • LLM Model Depth-Wise Pruning (beta)
  • Quantizing a Large Language Model with Quark
  • FP8 Quantization with Per-Channel Static Weights and Per-Token Dynamic Activations
  • YOLO-X Tiny Quant example
  • Quark ONNX Quantization Tutorial For AdaQuant
  • Quark ONNX Quantization Tutorial For AdaRound
  • Quark ONNX Quantization Tutorial For Block Floating Point (BFP)
  • Quark ONNX Quantization Tutorial For Cross Layer Equalization (CLE)
  • Quark ONNX Quantization Tutorial For GPTQ
  • Quark ONNX Quantization Tutorial For Layerwise Percentile
  • Quark ONNX Quantization Tutorial For Mixed Precision
  • Quark ONNX Quantization Tutorial For Image Classification
  • Quark ONNX Quantization Tutorial For SmoothQuant
  • Quark ONNX Quantization Tutorial For Auto Search
  • Quark ONNX Quantization Tutorial For Resnet50
  • Quark ONNX Quantization Tutorial For YOLOv8
  • Quantizing ONNX Models with Custom Operators Using Quark

Third-party contributions

  • Introduction and guidelines

Experimental Features

  • Quark CLI
    • ONNX Adapter

APIs

  • PyTorch APIs
    • Pruning
    • Quantization
    • Export
    • Pruner Configuration
    • Quantizer Configuration
    • Quantizer Template
    • Exporter Configuration
  • ONNX APIs
    • Quantization
    • Quantizer Configuration
      • Quantization Strategies
      • Data Types
      • Algorithm Classes

Troubleshooting and Support

  • PyTorch Troubleshooting
  • ONNX Troubleshooting
  • AMD Quark release history
  • Quark license
  • Overview: module code

All modules for which code is available

  • quark.torch.export.api
  • quark.torch.export.config.config
  • quark.torch.pruning.api
  • quark.torch.pruning.config
  • quark.torch.quantization.api
  • quark.torch.quantization.config.config
  • quark.torch.quantization.config.template

Last updated on Jan 16, 2026.

© 2025 Advanced Micro Devices, Inc.