AMD Quark 0.12 documentation

Pruning

Last updated on Jul 03, 2026.

© 2025 Advanced Micro Devices, Inc