quark.torch.quantization.config.config#

Module Contents#

Classes#

Functions#

class quark.torch.quantization.config.config.ConfigBase#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Config#

A class that encapsulates comprehensive quantization configurations for a machine learning model, allowing for detailed and hierarchical control over quantization parameters across different model components.

Parameters:
  • global_quant_config (QuantizationConfig) – Global quantization configuration applied to the entire model unless overridden at the layer level.

  • layer_type_quant_config (Dict[str, QuantizationConfig]) – A dictionary mapping from layer types (e.g., nn.Conv2d, nn.Linear) to their quantization configurations.

  • layer_quant_config (Dict[str, QuantizationConfig]) – A dictionary mapping from layer names to their quantization configurations, allowing for per-layer customization. Default is an empty dictionary.

  • exclude (List[str]) – A list of layer names to be excluded from quantization, enabling selective quantization of the model. Default is an empty list.

  • algo_config (Optional[AlgoConfig]) – Optional configuration for a quantization algorithm, such as GPTQ or AWQ. After the algorithm runs, the datatype/fake_datatype of the weights is changed according to the quantization scales. Default is None.

  • quant_mode (QuantizationMode) – The quantization mode to be used (eager_mode or fx_graph_mode). Default is eager_mode.

  • pre_quant_opt_config (List[PreQuantOptConfig]) – Optional pre-processing optimizations, such as Equalization and SmoothQuant. These change the values of the weights, but their dtype/fake_dtype stays the same. Default is an empty list.

  • log_severity_level (Optional[int]) – 0: DEBUG, 1: INFO, 2: WARNING, 3: ERROR, 4: CRITICAL/FATAL. Default is 1.
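The parameters above resolve per layer: an entry in layer_quant_config wins over layer_type_quant_config, which wins over global_quant_config, and names in exclude are skipped entirely. A minimal sketch of that precedence, using simplified stand-in dataclasses rather than quark's actual classes (which carry many more fields):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class QuantizationConfig:  # stand-in for quark's QuantizationConfig
    dtype: str = "int8"

@dataclass
class Config:  # stand-in sketching the documented resolution order
    global_quant_config: QuantizationConfig
    layer_type_quant_config: Dict[str, QuantizationConfig] = field(default_factory=dict)
    layer_quant_config: Dict[str, QuantizationConfig] = field(default_factory=dict)
    exclude: List[str] = field(default_factory=list)

    def resolve(self, layer_name: str, layer_type: str) -> Optional[QuantizationConfig]:
        if layer_name in self.exclude:
            return None                                  # excluded: not quantized
        if layer_name in self.layer_quant_config:        # most specific wins
            return self.layer_quant_config[layer_name]
        if layer_type in self.layer_type_quant_config:   # then per-type
            return self.layer_type_quant_config[layer_type]
        return self.global_quant_config                  # finally the global default

cfg = Config(
    global_quant_config=QuantizationConfig("int8"),
    layer_type_quant_config={"nn.Linear": QuantizationConfig("int4")},
    exclude=["lm_head"],
)
print(cfg.resolve("model.layers.0.mlp", "nn.Linear").dtype)  # int4
print(cfg.resolve("lm_head", "nn.Linear"))                   # None (excluded)
```

The layer names and dtypes here are illustrative; quark's real QuantizationConfig holds per-tensor QuantizationSpec objects rather than a bare dtype string.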

set_algo_config(algo_config: AlgoConfig | None) → None#

Sets the algorithm configuration for quantization.

Parameters:

algo_config (Optional[AlgoConfig]) – The quantization algorithm configuration to be set.

add_pre_optimization_config(pre_quant_opt_config: PreQuantOptConfig) → None#

Adds a pre-processing optimization configuration to the list of existing pre-quant optimization configs.

Parameters:

pre_quant_opt_config (PreQuantOptConfig) – The pre-quantization optimization configuration to add.

class quark.torch.quantization.config.config.QuantizationConfig#

A data class that specifies quantization configurations for different components of a module, allowing hierarchical control over how each tensor type is quantized.

Parameters:
  • input_tensors (Optional[QuantizationSpec]) – Input tensors quantization specification. If None, the hierarchical quantization setup is followed: for example, if input_tensors in a layer_type_quant_config entry is None, the configuration from global_quant_config is used instead. If it is also None in global_quant_config, input tensors are not quantized. Defaults to None.

  • output_tensors (Optional[QuantizationSpec]) – Output tensors quantization specification. Defaults to None. If None, the same as above.

  • weight (Optional[QuantizationSpec]) – The weight tensors quantization specification. Defaults to None. If None, the same as above.

  • bias (Optional[QuantizationSpec]) – The bias tensors quantization specification. Defaults to None. If None, the same as above.

  • target_device (Optional[DeviceType]) – Configuration specifying the target device (e.g., CPU, GPU, IPU) for the quantized model.
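The per-field fallback described above can be sketched as follows: a field left as None at the layer level inherits the corresponding field from the global config, and None at both levels means that tensor is simply not quantized. The names below are simplified stand-ins, not quark's implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuantizationSpec:  # stand-in; the real spec carries dtype, qscheme, etc.
    dtype: str

@dataclass
class QuantizationConfig:  # stand-in with the four documented tensor slots
    input_tensors: Optional[QuantizationSpec] = None
    output_tensors: Optional[QuantizationSpec] = None
    weight: Optional[QuantizationSpec] = None
    bias: Optional[QuantizationSpec] = None

def effective(layer_cfg, global_cfg, field_name):
    """None at the layer level falls back to the global config;
    None there too means the tensor is not quantized."""
    layer_val = getattr(layer_cfg, field_name)
    return layer_val if layer_val is not None else getattr(global_cfg, field_name)

global_cfg = QuantizationConfig(input_tensors=QuantizationSpec("int8"),
                                weight=QuantizationSpec("int8"))
layer_cfg = QuantizationConfig(weight=QuantizationSpec("int4"))  # weight override only

print(effective(layer_cfg, global_cfg, "weight").dtype)         # int4 (layer wins)
print(effective(layer_cfg, global_cfg, "input_tensors").dtype)  # int8 (global fallback)
print(effective(layer_cfg, global_cfg, "bias"))                 # None (not quantized)
```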

class quark.torch.quantization.config.config.DataTypeSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Uint4PerTensorSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Uint4PerChannelSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Uint4PerGroupSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Int4PerTensorSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Int4PerChannelSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Int4PerGroupSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Uint8PerTensorSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Uint8PerChannelSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Uint8PerGroupSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Int8PerTensorSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Int8PerChannelSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Int8PerGroupSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.FP8E4M3PerTensorSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.FP8E4M3PerChannelSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.FP8E4M3PerGroupSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.FP8E5M2PerTensorSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.FP8E5M2PerChannelSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.FP8E5M2PerGroupSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Float16Spec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.Bfloat16Spec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.MXSpec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.MX6Spec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.MX9Spec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.BFP16Spec#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.QuantizationSpec#

A data class that defines the specifications for quantizing tensors within a model.

Parameters:
  • dtype (Dtype) – The data type for quantization (e.g., int8, int4).

  • is_dynamic (Optional[bool]) – Specifies whether dynamic or static quantization should be used. Default is None, which indicates no specification.

  • observer_cls (Optional[Type[ObserverBase]]) – The class of observer to be used for determining quantization parameters like min/max values. Default is None.

  • qscheme (Optional[QSchemeType]) – The quantization scheme to use, such as per_tensor, per_channel or per_group. Default is None.

  • ch_axis (Optional[int]) – The channel axis for per-channel quantization. Default is None.

  • group_size (Optional[int]) – The size of the group for per-group quantization, also the block size for MX datatypes. Default is None.

  • symmetric (Optional[bool]) – Indicates if the quantization should be symmetric around zero. If True, quantization is symmetric. If None, it defers to a higher-level or global setting. Default is None.

  • round_method (Optional[RoundType]) – The rounding method during quantization, such as half_even. If None, it defers to a higher-level or default method. Default is None.

  • scale_type (Optional[ScaleType]) – Defines the scale type to be used for quantization, like power of two or float. If None, it defers to a higher-level setting or uses a default method. Default is None.

  • mx_element_dtype (Optional[Dtype]) – Defines the data type used for the elements when using MX datatypes; the shared scale effectively uses FP8 E8M0.
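To make a few of these fields concrete, here is an illustrative symmetric per-tensor fake quantization, the behavior a spec with dtype=int8, qscheme=per_tensor, symmetric=True, round_method=half_even would request. This is a simplified stand-in, not quark's observer/quantizer code:

```python
def fake_quantize_per_tensor(values, num_bits=8):
    """Symmetric per-tensor fake quantization: map each value onto the
    signed integer grid, then back to float. Python's round() is itself
    half-even, matching the half_even round_method mentioned above."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    amax = max(abs(v) for v in values) or 1.0   # observed absolute range
    scale = amax / qmax                         # one scale for the whole tensor
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return [qi * scale for qi in q]

x = [0.5, -1.0, 0.25, 2.0]
xq = fake_quantize_per_tensor(x)   # each result within scale/2 of its input
```

A per_channel or per_group scheme computes the same thing with a separate scale per channel (ch_axis) or per contiguous group of group_size elements.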

class quark.torch.quantization.config.config.TQTSpec#

Helper class that provides a standard way to create an ABC using inheritance.

quark.torch.quantization.config.config.load_pre_optimization_config_from_file(file_path: str) → PreQuantOptConfig#

Load pre-optimization configuration from a JSON file.

Parameters:

file_path (str) – The path to the JSON file containing the pre-optimization configuration.

Returns:

The pre-optimization configuration.

Return type:

PreQuantOptConfig

quark.torch.quantization.config.config.load_quant_algo_config_from_file(file_path: str) → AlgoConfig#

Load quantization algorithm configuration from a JSON file.

Parameters:

file_path (str) – The path to the JSON file containing the quantization algorithm configuration.

Returns:

The quantization algorithm configuration.

Return type:

AlgoConfig
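Both loaders follow the same shape: read a JSON file and build the corresponding config object from its keys. A hedged sketch of the idea with a stand-in dataclass and a hypothetical helper name (quark's real loaders may also key on a name field to select the concrete config subclass):

```python
import json
import os
import tempfile
from dataclasses import dataclass

@dataclass
class AlgoConfig:  # stand-in; quark would build a concrete subclass such as GPTQConfig
    name: str = "gptq"
    damp_percent: float = 0.01

def load_algo_config(file_path: str) -> AlgoConfig:
    """Illustrative loader: JSON keys must match the dataclass fields."""
    with open(file_path) as f:
        data = json.load(f)
    return AlgoConfig(**data)

# Round-trip through a temporary JSON file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"name": "gptq", "damp_percent": 0.05}, f)
    path = f.name
cfg = load_algo_config(path)
os.unlink(path)
print(cfg.damp_percent)  # 0.05
```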

class quark.torch.quantization.config.config.AlgoConfigBase#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.PreQuantOptConfig#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.AlgoConfig#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.quantization.config.config.SmoothQuantConfig#

A data class that defines the specifications for Smooth Quantization.

Parameters:
  • name (str) – The name of the configuration, typically used to identify different quantization settings. Default is “smoothquant”.

  • alpha (int) – The smoothing factor in the scaling formula, controlling how much quantization difficulty is migrated from activations onto the weights. Default is 1.

  • scale_clamp_min (float) – The minimum scaling factor to be used during quantization, preventing the scale from becoming too small. Default is 1e-3.

  • scaling_layers (List[Dict[str, str]]) – Specific settings for scaling layers, allowing customization of quantization parameters for different layers within the model. Default is None.

  • model_decoder_layers (str) – Specifies any particular decoder layers in the model that might have unique quantization requirements. Default is None.
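The roles of alpha and scale_clamp_min can be illustrated with the standard SmoothQuant per-channel scale, s_j = max|X_j|^alpha / max|W_j|^(1-alpha), clamped from below. A numeric sketch, not quark's implementation:

```python
def smoothquant_scales(act_absmax, weight_absmax, alpha=0.5, scale_clamp_min=1e-3):
    """Per-channel smoothing scales that migrate quantization difficulty
    from activations to weights; a larger alpha shifts more onto weights."""
    scales = []
    for a, w in zip(act_absmax, weight_absmax):
        s = (a ** alpha) / (w ** (1 - alpha))
        scales.append(max(s, scale_clamp_min))  # scale_clamp_min floor
    return scales

# An activation outlier in channel 0 gets a large scale; a flat channel stays ~1.
scales = smoothquant_scales([100.0, 1.0], [1.0, 1.0], alpha=0.5)
print(scales)  # channel 0: 100**0.5 = 10.0
```

Activations are then divided by s_j and the corresponding weight columns multiplied by s_j, leaving the layer's output unchanged while flattening activation outliers.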

class quark.torch.quantization.config.config.RotationConfig#

A data class that defines the specifications for rotation settings in processing algorithms.

Parameters:
  • name (str) – The name of the configuration, typically used to identify different rotation settings. Default is “rotation”.

  • random (bool) – A boolean flag indicating whether the rotation should be applied randomly. This can be useful for data augmentation purposes where random rotations may be required. Default is False.

  • scaling_layers (List[Dict[str, str]]) – Specific settings for scaling layers, allowing customization of quantization parameters for different layers within the model. Default is None.

class quark.torch.quantization.config.config.AutoSmoothQuantConfig#

A data class that defines the specifications for Smooth Quantization with automatic hyperparameter search.

Parameters:
  • name (str) – The name of the configuration, typically used to identify different quantization settings. Default is “smoothquant”.

  • auto_alpha (bool) – Whether to automatically search for the hyperparameter alpha. Default is False.

  • scale_clamp_min (float) – The minimum scaling factor to be used during quantization, preventing the scale from becoming too small. Default is 1e-3.

  • scaling_layers (Optional[List[Dict[str, str]]]) – Specific settings for scaling layers, allowing customization of quantization parameters for different layers within the model. Default is None.

  • embedding_layers (Optional[List[str]]) – A list of embedding layer names that require special quantization handling to maintain their performance and accuracy. Default is None.

  • model_decoder_layers (Optional[str]) – Specifies any particular decoder layers in the model that might have unique quantization requirements. Default is None.

class quark.torch.quantization.config.config.AWQConfig#

Configuration for Activation-aware Weight Quantization (AWQ).

Parameters:
  • name (str) – The name of the quantization configuration. Default is “awq”.

  • scaling_layers (List[Dict[str, str]]) – Configuration details for scaling layers within the model, specifying custom scaling parameters per layer. Default is None.

  • model_decoder_layers (str) – Specifies the layers involved in model decoding that may require different quantization parameters. Default is None.

class quark.torch.quantization.config.config.GPTQConfig#

A data class that defines the specifications for Accurate Post-Training Quantization for Generative Pre-trained Transformers (GPTQ).

Parameters:
  • name (str) – The configuration name. Default is “gptq”.

  • damp_percent (float) – The fraction of the average Hessian diagonal added as dampening, stabilizing the computation and aiding in the maintenance of accuracy post-quantization. Default is 0.01.

  • desc_act (bool) – Indicates whether columns are quantized in descending order of activation magnitude (activation ordering), which typically helps preserve model accuracy under quantization. Default is True.

  • static_groups (bool) – Specifies whether the groups used for quantization are static or can be dynamically adjusted. Default is True. Quark export only supports static_groups=True.

  • true_sequential (bool) – Indicates whether the quantization should be applied in a truly sequential manner across the layers. Default is True.

  • inside_layer_modules (List[str]) – Lists the names of internal layer modules within the model that require specific quantization handling. Default is None.

  • model_decoder_layers (str) – Specifies custom settings for quantization on specific decoder layers of the model. Default is None.
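The damp_percent parameter corresponds to the standard GPTQ dampening step: a fraction of the mean Hessian diagonal is added to the diagonal before inversion to keep the matrix well conditioned. An illustrative sketch with a tiny matrix, not quark's code:

```python
def dampen_hessian(H, damp_percent=0.01):
    """Add damp_percent * mean(diag(H)) to the diagonal of H, in place.
    Without this, a (near-)singular Hessian makes the inverse unstable."""
    n = len(H)
    damp = damp_percent * sum(H[i][i] for i in range(n)) / n
    for i in range(n):
        H[i][i] += damp
    return H

H = [[4.0, 1.0], [1.0, 0.0]]     # zero diagonal entry: singular as-is
dampen_hessian(H, damp_percent=0.01)
print(H[1][1])  # 0.02 (= 0.01 * mean diagonal of 2.0)
```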