AMD Quark 0.9 documentation


Extensions for PyTorch

  • Integration with AMD Pytorch-light (APL)
  • Brevitas Integration


Last updated on Jul 18, 2025.

© 2025 Advanced Micro Devices, Inc.