
Quark 0.8.2 documentation

Release Notes

  • Release Information

Getting Started with AMD Quark

  • Introduction to Quantization
  • Installation
  • Getting started: Introduction
  • Getting started: Quark for ONNX
  • Getting started: Quark for PyTorch
  • Accessing PyTorch Examples
    • Diffusion Model Quantization
    • AMD Quark Extension for Brevitas Integration
    • Integration with AMD Pytorch-light (APL)
    • Language Model Pruning
    • Language Model PTQ
    • Language Model QAT
    • Language Model Evaluation
      • Perplexity Evaluations
      • Rouge & Meteor Evaluations
      • LM-Evaluation Harness Evaluations
      • LM-Evaluation Harness (Offline)
    • Vision Model Quantization using FX Graph Mode
  • Accessing ONNX Examples
    • Block Floating Point (BFP)
    • MX Formats
    • Fast Finetune AdaRound
    • Fast Finetune AdaQuant
    • Cross-Layer Equalization (CLE)
    • GPTQ
    • Mixed Precision
    • Smooth Quant
    • QuaRot
    • Auto-Search for General Yolov3 ONNX Model Quantization
    • Auto-Search for Ryzen AI Yolo-nas ONNX Model Quantization
    • Auto-Search for Ryzen AI Resnet50 ONNX Model Quantization
    • Auto-Search for Ryzen AI Yolov3 ONNX Quantization with Custom Evaluator
    • Quantizing a Llama-2-7b Model
    • Quantizing an OPT-125M Model
    • Quantizing a ResNet50-v1-12 Model
    • Quantizing an OPT-125M Model
    • Quantizing a Llama-2-7b Model Using the ONNX MatMulNBits
    • Quantizing a Llama-2-7b Model Using MatMulNBits
    • Best Practice for Quantizing an Image Classification Model
    • Best Practice for Quantizing an Object Detection Model

Supported accelerators

  • AMD Ryzen AI
    • Quick Start for Ryzen AI
    • Best Practice for Ryzen AI in AMD Quark ONNX
    • Auto-Search for Ryzen AI ONNX Model Quantization
    • Quantizing LLMs for ONNX Runtime GenAI
    • FP32/FP16 to BF16 Model Conversion
    • Power-of-Two Scales (Xint8) Quantization
    • Float Scales (A8W8 and A16W8) Quantization
  • AMD Instinct
    • FP8 (OCP fp8_e4m3) Quantization & Json_SafeTensors_Export with KV Cache
    • Evaluation of Quantized Models

Advanced AMD Quark Features for PyTorch

  • Configuring PyTorch Quantization
    • Calibration Methods
    • Calibration Datasets
    • Quantization Strategies
    • Quantization Schemes
    • Quantization Symmetry
  • Save and Load Quantized Models
  • Exporting Quantized Models
    • ONNX Format
    • HuggingFace Format
    • GGUF Format
      • Bridge from Quark to llama.cpp
    • Quark Format
  • Best Practices for Post-Training Quantization (PTQ)
  • Debugging Quantization Degradation
  • Language Model Optimization
    • Pruning
    • Language Model Post Training Quantization (PTQ) Using Quark
    • Language Model QAT Using Quark
    • Language Model Evaluations in Quark
      • Perplexity Evaluations
      • Rouge & Meteor Evaluations
      • LM-Evaluation Harness Evaluations
      • LM-Evaluation Harness (Offline)
    • Quantizing with Rotation and SmoothQuant
    • Rotation-based quantization with QuaRot
  • Activation/Weight Smoothing (SmoothQuant)
  • Block Floating Point 16
  • Extensions
    • Integration with AMD Pytorch-light (APL)
    • Brevitas Integration
  • Using MX (Microscaling)
  • Two Level Quantization Formats

Advanced Quark Features for ONNX

  • Configuring ONNX Quantization
    • Full List of Quantization Config Features
    • Calibration methods
    • Calibration datasets
    • Quantization Strategies
    • Quantization Schemes
    • Quantization Symmetry
  • Data and OP Types
  • Accelerate with GPUs
  • Mixed Precision
  • Block Floating Point 16 (BFP16)
  • BF16 Quantization
  • Microscaling (MX)
  • Microexponents (MX)
  • Accuracy Improvement Algorithms
    • Quantizing Using CrossLayerEqualization (CLE)
    • Quantization Using AdaQuant and AdaRound
    • SmoothQuant (SQ)
    • QuaRot
    • Quantizing a Model with GPTQ
  • Automatic Search for Model Quantization
  • Using ONNX Model Inference and Saving Input Data in NPY Format
  • Optional Utilities
  • Tools

APIs

  • PyTorch APIs
    • Pruning
    • Quantization
    • Export
    • Pruner Configuration
    • Quantizer Configuration
    • Exporter Configuration
  • ONNX APIs
    • Quantization
    • Optimization
    • Calibration
    • ONNX Quantizer
    • QDQ Quantizer
    • Configuration
    • Quantization Utilities

Troubleshooting and Support

  • PyTorch FAQ
  • ONNX FAQ
  • AMD Quark release history
  • Quark license

quark.onnx.auto_search

Contents

  • Module Contents
    • Classes
    • Functions
      • l2_metric()
      • l1_metric()
      • cos_metric()
      • psnr_metric()
      • ssim_metric()
      • buildin_eval_func()
      • AssembleIdxs
        • AssembleIdxs.search_forward()
        • AssembleIdxs.run()
      • SearchSpace
        • SearchSpace.three_level_spaces()

quark.onnx.auto_search#

Module Contents#

Classes#

AssembleIdxs

Enumerate all combinations across the input sub-lists.

SearchSpace

Build all possible search spaces from the input.

Functions#

l2_metric(→ Any)

Calculate the L2 metric between baseline and reference inputs.

l1_metric(→ Any)

Calculate the L1 metric between baseline and reference inputs.

cos_metric(→ Any)

Calculate the cosine metric between baseline and reference inputs.

psnr_metric(→ Any)

Calculate the PSNR metric between baseline and reference inputs.

ssim_metric(→ Any)

Calculate the SSIM metric between baseline and reference inputs.

buildin_eval_func(→ str)

Built-in evaluation function using a data reader.

quark.onnx.auto_search.l2_metric(base_input: Any, ref_input: Any) → Any#

Calculate the L2 metric between baseline and reference inputs.

Args:

base_input: Baseline input as a numpy array of float32.
ref_input: Reference input as a numpy array of float32.

Returns:

The L2 metric as a float32 value.

Note:

Only np.ndarray datatype is accepted as input.
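The L2 metric above can be sketched with NumPy. This is a minimal illustration of a Euclidean distance between two float32 arrays, not Quark's implementation; the function name `l2_metric_sketch` is hypothetical.

```python
import numpy as np

def l2_metric_sketch(base_input: np.ndarray, ref_input: np.ndarray) -> np.float32:
    # Hypothetical sketch: Euclidean (L2) distance between two float32 arrays.
    assert isinstance(base_input, np.ndarray) and isinstance(ref_input, np.ndarray)
    return np.float32(np.sqrt(np.sum((base_input - ref_input) ** 2)))

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([1.0, 2.0, 5.0], dtype=np.float32)
print(l2_metric_sketch(a, b))  # → 2.0
```

A smaller L2 distance means the quantized model's outputs stay closer to the float baseline.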

quark.onnx.auto_search.l1_metric(base_input: Any, ref_input: Any) → Any#

Calculate the L1 metric between baseline and reference inputs.

Args:

base_input: Baseline input as a numpy array of float32.
ref_input: Reference input as a numpy array of float32.

Returns:

The L1 metric as a float32 value.

Note:

Only np.ndarray datatype is accepted as input.

quark.onnx.auto_search.cos_metric(base_input: Any, ref_input: Any) → Any#

Calculate the cosine metric between baseline and reference inputs.

Args:

base_input: Baseline input as a numpy array of float32.
ref_input: Reference input as a numpy array of float32.

Returns:

The cosine metric as a float32 value. Value range: [0.0, 1.0]

Note:

Only np.ndarray datatype is accepted as input.
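A cosine similarity between flattened outputs can be sketched as follows. This is an assumption about the metric's shape, not Quark's code; note that for arbitrary signed inputs cosine similarity lies in [-1, 1], while the documented range [0.0, 1.0] holds when the outputs are non-negatively correlated.

```python
import numpy as np

def cos_metric_sketch(base_input: np.ndarray, ref_input: np.ndarray) -> float:
    # Hypothetical sketch: cosine similarity of the flattened arrays.
    a = base_input.ravel().astype(np.float64)
    b = ref_input.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

identical = cos_metric_sketch(np.ones(3, dtype=np.float32), np.ones(3, dtype=np.float32))
print(identical)  # → 1.0
```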

quark.onnx.auto_search.psnr_metric(base_input: Any, ref_input: Any) → Any#

Calculate the PSNR metric between baseline and reference inputs.

Args:

base_input: Baseline input as a numpy array of float32.
ref_input: Reference input as a numpy array of float32.

Returns:

The PSNR metric as a float32 value.

Note:

Only np.ndarray datatype is accepted as input.

quark.onnx.auto_search.ssim_metric(base_input: Any, ref_input: Any) → Any#

Calculate the SSIM metric between baseline and reference inputs.

Args:

base_input: Baseline input as a numpy array of float32.
ref_input: Reference input as a numpy array of float32.

Returns:

The SSIM metric as a float32 value.

Note:

Only np.ndarray datatype is accepted as input.

quark.onnx.auto_search.buildin_eval_func(onnx_path: str, data_reader: Any, save_path: str = '', save_prefix: str = 'iter_x_') → str#

Built-in evaluation function using a data reader.

Args:

onnx_path: Path to the ONNX model to evaluate; it can be either a floating-point or a quantized ONNX model.
data_reader: User-defined data reader.
save_path: Path used to save the output result.
save_prefix: Prefix string used to name the saved output.

Note:

The data reader here should be defined as a dataloader. Because the raw data reader is a one-shot iterator, it is not convenient for evaluation.

class quark.onnx.auto_search.AssembleIdxs(values_idxs: Any)#

Enumerate all combinations across the input sub-lists. Example:

input_idxs: [[1, 2], [3, 4]]
output: [[1, 3], [1, 4], [2, 3], [2, 4]]

Args:

values_idxs: List of sub-lists whose items are combined.

Note:

Each combination takes exactly one item from each input[i] sub-list.

search_forward(item_forward: list[Any]) → None#

Recursively append the next item until the last sub-list is reached.

Args:

item_forward: Collection of items gathered so far.

run() → list[int | list[int]]#

Execute the assembly process and return the result.

class quark.onnx.auto_search.SearchSpace(conf: Dict[str, Any])#

Build all possible search spaces from the input.

TODO: remove invalid configs generated by the search space; give configs a priority; validate that the space dict is correct.

Args:

config: Config that defines the search space as lists of candidate values.

Note:

Because the search space spans different levels, the levels of the search space must be split.

three_level_spaces() → list[Any]#

According to the user-defined search space, list all possible configs. There are several situations to tell apart (split spaces): level1 + level2, level1 + level3, level1 + level2 + level3, and level1 alone.


Last updated on Jul 18, 2025.
