quark.torch.quantization.config.config

Module Contents

class quark.torch.quantization.config.config.ConfigBase

Abstract base class (derived from abc.ABC) for the configuration classes in this module.

class quark.torch.quantization.config.config.Config

A class that encapsulates comprehensive quantization configurations for a machine learning model, allowing for detailed and hierarchical control over quantization parameters across different model components.

Parameters:
  • global_quant_config (QuantizationConfig) – Global quantization configuration applied to the entire model unless overridden by layer_type_quant_config or layer_quant_config.

  • layer_type_quant_config (Dict[str, QuantizationConfig]) – A dictionary mapping from layer types (e.g., nn.Conv2d, nn.Linear) to their quantization configurations.

  • layer_quant_config (Dict[str, QuantizationConfig]) – A dictionary mapping from layer names to their quantization configurations, allowing for per-layer customization. Default is an empty dictionary.

  • exclude (List[str]) – A list of layer names to be excluded from quantization, enabling selective quantization of the model. Default is an empty list.

  • algo_config (Optional[AlgoConfig]) – Optional configuration for the quantization algorithm, such as GPTQ or AWQ. After this step, the weights are converted to the quantized datatype (or fake-quantized datatype) together with their quantization scales. Default is None.

  • quant_mode (QuantizationMode) – The quantization mode to be used (eager_mode or fx_graph_mode). Default is eager_mode.

  • pre_quant_opt_config (List[PreQuantOptConfig]) – Optional pre-processing optimizations, such as Equalization and SmoothQuant. These change the values of the weights but leave their dtype/fake_dtype unchanged. Default is an empty list.

  • log_severity_level (Optional[int]) – The logging severity level: 0 = DEBUG, 1 = INFO, 2 = WARNING, 3 = ERROR, 4 = CRITICAL/FATAL. Default is 1.
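The way these fields combine can be sketched in plain Python. This is a minimal illustration, not Quark's implementation: the stand-in classes and the `select_layer_config` helper are hypothetical, layer types are keyed by class name for simplicity, and the precedence order (exclude, then layer name, then layer type, then global) is an assumption inferred from the parameter descriptions above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QuantConfigStub:
    """Hypothetical stand-in for QuantizationConfig; it only carries a label."""
    label: str


@dataclass
class ConfigStub:
    """Hypothetical stand-in for Config with the fields that drive layer selection."""
    global_quant_config: QuantConfigStub
    layer_type_quant_config: Dict[str, QuantConfigStub] = field(default_factory=dict)
    layer_quant_config: Dict[str, QuantConfigStub] = field(default_factory=dict)
    exclude: List[str] = field(default_factory=list)


def select_layer_config(cfg: ConfigStub, layer_name: str, layer_type: str) -> Optional[QuantConfigStub]:
    """Return the config for one layer, or None if the layer is not quantized.

    Assumed precedence: exclude > layer name > layer type > global.
    """
    if layer_name in cfg.exclude:
        return None
    if layer_name in cfg.layer_quant_config:
        return cfg.layer_quant_config[layer_name]
    if layer_type in cfg.layer_type_quant_config:
        return cfg.layer_type_quant_config[layer_type]
    return cfg.global_quant_config


cfg = ConfigStub(
    global_quant_config=QuantConfigStub("w8a8"),
    layer_type_quant_config={"Conv2d": QuantConfigStub("w8-only")},
    layer_quant_config={"features.0": QuantConfigStub("fp16")},
    exclude=["classifier"],
)
for name, kind in [("features.0", "Conv2d"), ("features.3", "Conv2d"),
                   ("fc1", "Linear"), ("classifier", "Linear")]:
    chosen = select_layer_config(cfg, name, kind)
    print(name, "->", chosen.label if chosen else "not quantized")
```

The layer names and labels above are illustrative only.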

set_algo_config(algo_config: AlgoConfig | None) → None

Sets the algorithm configuration for quantization.

Parameters:

algo_config (Optional[AlgoConfig]) – The quantization algorithm configuration to be set.

add_pre_optimization_config(pre_quant_opt_config: PreQuantOptConfig) → None

Adds a pre-processing optimization configuration to the list of existing pre-quant optimization configs.

Parameters:

pre_quant_opt_config (PreQuantOptConfig) – The pre-quantization optimization configuration to add.

class quark.torch.quantization.config.config.QuantizationConfig

A data class that specifies quantization configurations for different components of a module, allowing hierarchical control over how each tensor type is quantized.

Parameters:
  • input_tensors (Optional[QuantizationSpec]) – Quantization specification for input tensors. Defaults to None. A None value is inherited through the configuration hierarchy: for example, if input_tensors is None in a layer_type_quant_config entry, the value from global_quant_config is used instead. If it is also None in global_quant_config, input tensors are not quantized.

  • output_tensors (Optional[QuantizationSpec]) – Quantization specification for output tensors. Defaults to None. A None value is resolved the same way as for input_tensors.

  • weight (Optional[QuantizationSpec]) – Quantization specification for weight tensors. Defaults to None. A None value is resolved the same way as for input_tensors.

  • bias (Optional[QuantizationSpec]) – Quantization specification for bias tensors. Defaults to None. A None value is resolved the same way as for input_tensors.

  • target_device (Optional[DeviceType]) – Configuration specifying the target device (e.g., CPU, GPU, IPU) for the quantized model.
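The "None means inherit" rule for each tensor type can be sketched as a field-by-field merge. The `TensorConfig` class and `resolve` function below are hypothetical stand-ins, and plain strings replace real QuantizationSpec objects; the sketch only mirrors the fallback behavior described for input_tensors, output_tensors, weight and bias.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class TensorConfig:
    """Hypothetical stand-in for QuantizationConfig; specs are plain strings here."""
    input_tensors: Optional[str] = None
    output_tensors: Optional[str] = None
    weight: Optional[str] = None
    bias: Optional[str] = None


def resolve(specific: TensorConfig, fallback: TensorConfig) -> TensorConfig:
    """Fill every None field of `specific` from `fallback` (e.g. global_quant_config)."""
    updates = {
        f.name: getattr(fallback, f.name)
        for f in fields(specific)
        if getattr(specific, f.name) is None
    }
    return replace(specific, **updates)


global_cfg = TensorConfig(input_tensors="int8_per_tensor", weight="int8_per_channel")
linear_cfg = TensorConfig(weight="int4_per_group")  # overrides only the weight spec

effective = resolve(linear_cfg, global_cfg)
# input_tensors comes from the global config and weight from the more specific one;
# output_tensors and bias are None in both, so those tensors stay unquantized.
print(effective)
```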

class quark.torch.quantization.config.config.DataTypeSpec

Abstract base class for the predefined data-type specifications in this module, such as Int8PerTensorSpec and FP8E4M3PerGroupSpec.

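class quark.torch.quantization.config.config.Uint4PerTensorSpec

Data-type specification for unsigned 4-bit integer (uint4) quantization at per-tensor granularity: one set of quantization parameters for the whole tensor.

class quark.torch.quantization.config.config.Uint4PerChannelSpec

Data-type specification for unsigned 4-bit integer (uint4) quantization at per-channel granularity: one set of quantization parameters per channel along ch_axis.

class quark.torch.quantization.config.config.Uint4PerGroupSpec

Data-type specification for unsigned 4-bit integer (uint4) quantization at per-group granularity: one set of quantization parameters per group of group_size elements.

class quark.torch.quantization.config.config.Int4PerTensorSpec

Data-type specification for signed 4-bit integer (int4) quantization at per-tensor granularity: one set of quantization parameters for the whole tensor.

class quark.torch.quantization.config.config.Int4PerChannelSpec

Data-type specification for signed 4-bit integer (int4) quantization at per-channel granularity: one set of quantization parameters per channel along ch_axis.

class quark.torch.quantization.config.config.Int4PerGroupSpec

Data-type specification for signed 4-bit integer (int4) quantization at per-group granularity: one set of quantization parameters per group of group_size elements.

class quark.torch.quantization.config.config.Uint8PerTensorSpec

Data-type specification for unsigned 8-bit integer (uint8) quantization at per-tensor granularity: one set of quantization parameters for the whole tensor.

class quark.torch.quantization.config.config.Uint8PerChannelSpec

Data-type specification for unsigned 8-bit integer (uint8) quantization at per-channel granularity: one set of quantization parameters per channel along ch_axis.

class quark.torch.quantization.config.config.Uint8PerGroupSpec

Data-type specification for unsigned 8-bit integer (uint8) quantization at per-group granularity: one set of quantization parameters per group of group_size elements.

class quark.torch.quantization.config.config.Int8PerTensorSpec

Data-type specification for signed 8-bit integer (int8) quantization at per-tensor granularity: one set of quantization parameters for the whole tensor.

class quark.torch.quantization.config.config.Int8PerChannelSpec

Data-type specification for signed 8-bit integer (int8) quantization at per-channel granularity: one set of quantization parameters per channel along ch_axis.

class quark.torch.quantization.config.config.Int8PerGroupSpec

Data-type specification for signed 8-bit integer (int8) quantization at per-group granularity: one set of quantization parameters per group of group_size elements.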
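The per-tensor, per-channel and per-group variants differ only in how many sets of quantization parameters are computed. The minimal pure-Python sketch below assumes symmetric int8 with max-abs calibration (scale = amax / 127, restricted range [-127, 127]); the helper names are illustrative, not Quark API. Python's built-in round() rounds half to even, matching the half_even round_method mentioned under QuantizationSpec.

```python
from typing import List

Matrix = List[List[float]]  # rows are output channels, i.e. ch_axis=0
INT8_QMAX = 127  # assumed restricted symmetric int8 range [-127, 127]


def symmetric_scale(values: List[float], qmax: int = INT8_QMAX) -> float:
    """Max-abs scale for one slice of values; all-zero slices get scale 1.0."""
    amax = max(abs(v) for v in values)
    return amax / qmax if amax > 0 else 1.0


def per_tensor_scale(w: Matrix) -> float:
    return symmetric_scale([v for row in w for v in row])


def per_channel_scales(w: Matrix) -> List[float]:
    return [symmetric_scale(row) for row in w]


def per_group_scales(w: Matrix, group_size: int) -> List[List[float]]:
    # Groups are contiguous runs of group_size elements within each row.
    return [[symmetric_scale(row[i:i + group_size]) for i in range(0, len(row), group_size)]
            for row in w]


def fake_quantize(values: List[float], scale: float, qmax: int = INT8_QMAX) -> List[float]:
    """Quantize (round() rounds half to even), clamp to the int range, dequantize."""
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def abs_error(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


w = [[0.5, -1.0, 0.25, 2.0],
     [0.1, 0.2, -0.4, 0.05]]   # the second channel has a much smaller range

print(per_tensor_scale(w))     # a single scale for the whole matrix
print(per_channel_scales(w))   # one scale per row
print(per_group_scales(w, 2))  # one scale per pair of elements in each row

row = w[1]
err_tensor = abs_error(row, fake_quantize(row, per_tensor_scale(w)))
err_channel = abs_error(row, fake_quantize(row, per_channel_scales(w)[1]))
print(err_tensor, err_channel)  # finer granularity gives the smaller error here
```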

class quark.torch.quantization.config.config.FP8E4M3PerTensorSpec

Data-type specification for 8-bit floating point with 4 exponent bits and 3 mantissa bits (FP8 E4M3) at per-tensor granularity.

class quark.torch.quantization.config.config.FP8E4M3PerChannelSpec

Data-type specification for 8-bit floating point with 4 exponent bits and 3 mantissa bits (FP8 E4M3) at per-channel granularity.

class quark.torch.quantization.config.config.FP8E4M3PerGroupSpec

Data-type specification for 8-bit floating point with 4 exponent bits and 3 mantissa bits (FP8 E4M3) at per-group granularity.

class quark.torch.quantization.config.config.FP8E5M2PerTensorSpec

Data-type specification for 8-bit floating point with 5 exponent bits and 2 mantissa bits (FP8 E5M2) at per-tensor granularity.

class quark.torch.quantization.config.config.FP8E5M2PerChannelSpec

Data-type specification for 8-bit floating point with 5 exponent bits and 2 mantissa bits (FP8 E5M2) at per-channel granularity.

class quark.torch.quantization.config.config.FP8E5M2PerGroupSpec

Data-type specification for 8-bit floating point with 5 exponent bits and 2 mantissa bits (FP8 E5M2) at per-group granularity.
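The E4M3 and E5M2 names give the bit split of the 8-bit format. The sketch below decodes every bit pattern to find the largest finite value, assuming the common OCP FP8 conventions (the E4M3 "FN" variant with no infinities and a single NaN mantissa pattern, and IEEE-style E5M2), which are also the conventions of PyTorch's float8_e4m3fn and float8_e5m2. Whether Quark's specs use exactly these variants is an assumption here, and the helper names are illustrative. The per-tensor scale at the end maps a tensor's max-abs value onto the largest finite E4M3 value.

```python
import math
from typing import List


def decode_fp8(byte: int, exp_bits: int, man_bits: int, fn_variant: bool) -> float:
    """Decode one 8-bit pattern with the given exponent/mantissa split.

    fn_variant=True models E4M3FN: no infinities; only all-ones exponent+mantissa is NaN.
    fn_variant=False models IEEE-style encodings (E5M2): all-ones exponent is Inf/NaN.
    """
    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    max_exp = (1 << exp_bits) - 1
    if fn_variant and exp == max_exp and man == (1 << man_bits) - 1:
        return math.nan
    if not fn_variant and exp == max_exp:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:  # subnormal numbers
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def max_finite(exp_bits: int, man_bits: int, fn_variant: bool) -> float:
    values = (decode_fp8(b, exp_bits, man_bits, fn_variant) for b in range(256))
    return max(v for v in values if math.isfinite(v))


E4M3_MAX = max_finite(4, 3, fn_variant=True)
E5M2_MAX = max_finite(5, 2, fn_variant=False)


def per_tensor_fp8_scale(values: List[float], fp8_max: float = E4M3_MAX) -> float:
    """Scale so that the tensor's max-abs value lands on the largest finite FP8 value."""
    amax = max(abs(v) for v in values)
    return amax / fp8_max if amax > 0 else 1.0


print(E4M3_MAX, E5M2_MAX)  # 448.0 57344.0
```

E4M3 trades range for precision relative to E5M2, which is why the two variants coexist.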

class quark.torch.quantization.config.config.Float16Spec

Data-type specification for 16-bit IEEE 754 half-precision floating point (float16).

class quark.torch.quantization.config.config.Bfloat16Spec

Data-type specification for bfloat16, a 16-bit floating-point format with 8 exponent bits and 7 mantissa bits.
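The practical difference between the two 16-bit formats is range versus precision: float16 has 5 exponent and 10 mantissa bits, while bfloat16 keeps float32's 8 exponent bits and only 7 mantissa bits. A small standard-library sketch: float16 via the struct module's "e" format, and bfloat16 by keeping the upper 16 bits of the float32 encoding with round-to-nearest-even (NaN handling omitted). The helper names are illustrative.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip through IEEE half precision; raises OverflowError beyond its range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Round the float32 encoding to its upper 16 bits (nearest, ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    upper = (bits >> 16) & 0xFFFF
    return struct.unpack("<f", struct.pack("<I", upper << 16))[0]


third = 1.0 / 3.0
print(to_float16(third), to_bfloat16(third))  # 0.333251953125 0.333984375
print(to_bfloat16(70000.0))                   # 70144.0: in range, but coarsely rounded
try:
    to_float16(70000.0)                       # float16's largest finite value is 65504
except OverflowError:
    print("70000.0 overflows float16")
```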

class quark.torch.quantization.config.config.MXSpec

Data-type specification for microscaling (MX) data types, in which each block of group_size elements shares a single scale and each element uses mx_element_dtype.

class quark.torch.quantization.config.config.MX6Spec

Data-type specification for the MX6 data type.

class quark.torch.quantization.config.config.MX9Spec

Data-type specification for the MX9 data type.

class quark.torch.quantization.config.config.BFP16Spec

Data-type specification for 16-bit block floating point (BFP16), in which a block of values shares a single exponent.

class quark.torch.quantization.config.config.QuantizationSpec

A data class that defines the specifications for quantizing tensors within a model.

Parameters:
  • dtype (Dtype) – The data type for quantization (e.g., int8, int4).

  • is_dynamic (Optional[bool]) – Specifies whether dynamic or static quantization should be used. Default is None, which indicates no specification.

  • observer_cls (Optional[Type[ObserverBase]]) – The class of observer to be used for determining quantization parameters like min/max values. Default is None.

  • qscheme (Optional[QSchemeType]) – The quantization scheme to use, such as per_tensor, per_channel or per_group. Default is None.

  • ch_axis (Optional[int]) – The channel axis for per-channel quantization. Default is None.

  • group_size (Optional[int]) – The size of the group for per-group quantization, also the block size for MX datatypes. Default is None.

  • symmetric (Optional[bool]) – Indicates if the quantization should be symmetric around zero. If True, quantization is symmetric. If None, it defers to a higher-level or global setting. Default is None.

  • round_method (Optional[RoundType]) – The rounding method during quantization, such as half_even. If None, it defers to a higher-level or default method. Default is None.

  • scale_type (Optional[ScaleType]) – Defines the scale type to be used for quantization, like power of two or float. If None, it defers to a higher-level setting or uses a default method. Default is None.

  • mx_element_dtype (Optional[Dtype]) – The data type of the individual elements when using MX data types; the shared scale effectively uses FP8 E8M0.
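For MX data types, group_size is the block size: each block shares one power-of-two scale (the E8M0 value), and each element is stored in mx_element_dtype. The minimal sketch below uses an FP4 E2M1 element type and the scale-selection rule from the OCP Microscaling (MX) specification (shared exponent = floor(log2(amax)) minus the element format's largest exponent). That Quark uses exactly this rule and element type is an assumption, and the function names are illustrative.

```python
import math
from typing import List, Tuple

# Non-negative magnitudes representable by FP4 E2M1, the MXFP4 element type in the OCP MX spec.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2


def nearest_e2m1(mag: float) -> float:
    """Nearest representable magnitude; ties go to the smaller value for simplicity."""
    return min(E2M1_VALUES, key=lambda q: abs(q - mag))


def quantize_mx_block(block: List[float]) -> Tuple[int, List[float]]:
    """Return (shared_exponent, element values) for one block.

    The shared scale is 2**shared_exponent, which is what an E8M0 value can hold
    (the E8M0 exponent range limits are not modeled here).
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    elements = []
    for v in block:
        mag = min(abs(v) / scale, E2M1_VALUES[-1])  # clamp to the largest element value
        elements.append(math.copysign(nearest_e2m1(mag), v))
    return shared_exp, elements


def mx_fake_quantize(values: List[float], block_size: int = 32) -> List[float]:
    """Quantize consecutive blocks of block_size elements, then dequantize."""
    out: List[float] = []
    for start in range(0, len(values), block_size):
        shared_exp, elements = quantize_mx_block(values[start:start + block_size])
        out.extend(e * 2.0 ** shared_exp for e in elements)
    return out


print(quantize_mx_block([0.1, -0.2, 0.35, 0.7]))                 # (-3, [1.0, -1.5, 3.0, 6.0])
print(mx_fake_quantize([0.1, -0.2, 0.35, 0.7], block_size=4))    # [0.125, -0.1875, 0.375, 0.75]
```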

class quark.torch.quantization.config.config.TQTSpec

Specification for Trained Quantization Thresholds (TQT) quantization.

quark.torch.quantization.config.config.load_pre_optimization_config_from_file(file_path: str) → PreQuantOptConfig

Load pre-optimization configuration from a JSON file.

Parameters:

file_path (str) – The path to the JSON file containing the pre-optimization configuration.

Returns:

The pre-optimization configuration.

Return type:

PreQuantOptConfig

quark.torch.quantization.config.config.load_quant_algo_config_from_file(file_path: str) → AlgoConfig

Load quantization algorithm configuration from a JSON file.

Parameters:

file_path (str) – The path to the JSON file containing the quantization algorithm configuration.

Returns:

The quantization algorithm configuration.

Return type:

AlgoConfig
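Both loaders read a JSON file and return a configuration object. As an illustration only, the sketch below assumes the file holds one JSON object whose "name" field (for example "awq" or "gptq") selects the configuration class and whose other keys map to that class's fields. The stand-in dataclasses, the registry and `load_algo_config` are hypothetical and cover only a few of the documented fields.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, List, Optional, Type


@dataclass
class AWQConfigStub:
    """Hypothetical stand-in with a subset of AWQConfig's documented fields."""
    name: str = "awq"
    scaling_layers: Optional[List[Dict[str, str]]] = None
    model_decoder_layers: Optional[str] = None


@dataclass
class GPTQConfigStub:
    """Hypothetical stand-in with a subset of GPTQConfig's documented fields."""
    name: str = "gptq"
    damp_percent: float = 0.01
    desc_act: bool = True
    static_groups: bool = True
    model_decoder_layers: Optional[str] = None


REGISTRY: Dict[str, Type] = {"awq": AWQConfigStub, "gptq": GPTQConfigStub}


def load_algo_config(file_path: str):
    """Read a JSON object and build the dataclass selected by its "name" field."""
    with open(file_path, encoding="utf-8") as f:
        data = json.load(f)
    cls = REGISTRY[data["name"]]
    return cls(**data)  # unknown keys raise TypeError, which surfaces typos early


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gptq_config.json")
    with open(path, "w", encoding="utf-8") as f:
        # "model.layers" is an illustrative path value, not taken from Quark.
        json.dump({"name": "gptq", "damp_percent": 0.02, "model_decoder_layers": "model.layers"}, f)
    cfg = load_algo_config(path)

print(cfg)
```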

class quark.torch.quantization.config.config.AlgoConfigBase

Abstract base class for algorithm configurations.

class quark.torch.quantization.config.config.PreQuantOptConfig

Base class for pre-quantization optimization configurations, such as Equalization and SmoothQuant (see pre_quant_opt_config in Config).

class quark.torch.quantization.config.config.AlgoConfig

Base class for quantization algorithm configurations, such as GPTQ and AWQ (see algo_config in Config).

class quark.torch.quantization.config.config.SmoothQuantConfig

A data class that defines the specifications for Smooth Quantization.

Parameters:
  • name (str) – The name of the configuration, typically used to identify different quantization settings. Default is “smoothquant”.

  • alpha (int) – The migration strength: the exponent that controls how much of the quantization difficulty is shifted from activations to weights. Larger values shift more of it to the weights. Default is 1.

  • scale_clamp_min (float) – The minimum scaling factor to be used during quantization, preventing the scale from becoming too small. Default is 1e-3.

  • scaling_layers (List[Dict[str, str]]) – Specific settings for scaling layers, allowing customization of quantization parameters for different layers within the model. Default is None.

  • model_decoder_layers (str) – The path to the list of decoder layers in the model. Default is None.
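SmoothQuant divides each activation input channel j by a factor s_j and multiplies the matching weight column by s_j, so the layer output is unchanged while activation outliers shrink. The sketch below uses the factor from the SmoothQuant paper, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), for a linear layer y = X Wᵀ. Applying scale_clamp_min as a lower bound on s is an assumption based on the parameter description above, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def column_absmax(m: Matrix) -> List[float]:
    return [max(abs(row[j]) for row in m) for j in range(len(m[0]))]


def smooth_scales(x: Matrix, w: Matrix, alpha: float, scale_clamp_min: float = 1e-3) -> List[float]:
    """s_j = max|X_j|**alpha / max|W_j|**(1 - alpha), with scale_clamp_min as a floor (assumed)."""
    x_max, w_max = column_absmax(x), column_absmax(w)  # w has shape (out_features, in_features)
    return [max(xm ** alpha / max(wm, 1e-8) ** (1 - alpha), scale_clamp_min)
            for xm, wm in zip(x_max, w_max)]


def matmul_t(x: Matrix, w: Matrix) -> Matrix:
    """y = x @ w.T, the nn.Linear convention."""
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]


x = [[0.5, 40.0, -1.0],
     [-0.2, 25.0, 0.8]]   # input channel 1 is an activation outlier
w = [[0.3, 0.01, -0.2],
     [-0.1, 0.02, 0.4]]

s = smooth_scales(x, w, alpha=0.5)
x_s = [[v / sj for v, sj in zip(row, s)] for row in x]  # X diag(s)^-1
w_s = [[v * sj for v, sj in zip(row, s)] for row in w]  # W diag(s), scaling input columns

print(column_absmax(x), column_absmax(x_s))  # the outlier channel is much smaller after smoothing
print(matmul_t(x, w), matmul_t(x_s, w_s))    # equal up to floating-point rounding
```

With the default alpha of 1, s_j equals max|X_j|, so every smoothed activation channel has a max-abs value of 1 and the full difficulty moves to the weights.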

class quark.torch.quantization.config.config.RotationConfig

A data class that defines the specifications for rotation settings in processing algorithms.

Parameters:
  • name (str) – The name of the configuration, typically used to identify different rotation settings. Default is “rotation”.

  • random (bool) – A boolean flag indicating whether the rotation should be randomized. Default is False.

  • scaling_layers (List[Dict[str, str]]) – Specific settings for scaling layers, allowing customization of quantization parameters for different layers within the model. Default is None.
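Rotations can be applied before quantization because, for an orthogonal matrix Q, (X Q)(W Q)ᵀ = X Wᵀ: rotating activations and weights together leaves the layer output unchanged while spreading outliers across channels. The minimal sketch below uses a normalized Sylvester Hadamard matrix. Modeling the random flag as multiplication by a random ±1 diagonal is an assumption (a common way to randomize Hadamard rotations), and the helper names are hypothetical.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def hadamard(n: int) -> Matrix:
    """Normalized Sylvester Hadamard matrix (n must be a power of two); it is orthogonal."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def rotation(n: int, rng: Optional[random.Random] = None) -> Matrix:
    """Hadamard rotation; with rng, flip column signs at random (still orthogonal)."""
    q = hadamard(n)
    if rng is not None:
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        q = [[v * s for v, s in zip(row, signs)] for row in q]
    return q


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


x = [[8.0, 0.1, -0.2, 0.05]]      # one token with an outlier in channel 0
w = [[0.5, -0.3, 0.2, 0.1],
     [0.0, 0.4, -0.1, 0.3]]       # (out_features, in_features)

q = rotation(4, random.Random(0))
x_rot, w_rot = matmul(x, q), matmul(w, q)
y = matmul(x, transpose(w))
y_rot = matmul(x_rot, transpose(w_rot))
print(max(map(abs, x[0])), max(map(abs, x_rot[0])))   # the outlier is spread out
print(y, y_rot)                                        # equal up to floating-point rounding
```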

class quark.torch.quantization.config.config.QuaRotConfig

A data class that defines the specifications for the QuaRot algorithm.

random and random2 are only relevant when Hadamard rotations are used for R1 and R2. If optimized_rotation_path is specified, the R1 and R2 matrices are loaded from that file instead of using Hadamard matrices.

Parameters:
  • name (str) – The name of the configuration, typically used to identify different rotation settings. Default is “quarot”.

  • random (bool) – A boolean flag indicating whether R1 should be randomized. Default is False.

  • random2 (bool) – A boolean flag indicating whether R2 should be randomized. Default is False.

  • scaling_layers (List[Dict[str, str]]) – Specific settings for scaling layers, allowing customization of quantization parameters for different layers within the model. Default is None.

  • had (bool) – A boolean flag indicating whether the online Hadamard operations R3 and R4 should be performed.

  • optimized_rotation_path (Optional[str]) – The path to the file ‘R.bin’ that stores the optimized R1 and per-decoder-layer R2 matrices. If specified, the R1 and R2 rotations are loaded from this file; otherwise, Hadamard matrices are used.

  • kv_cache_quant (bool) – A boolean flag indicating whether the KV cache is quantized. The R3 rotation is applied only if it is.

  • act_quant (bool) – A boolean flag indicating whether activations are quantized.

  • backbone (str) – A string indicating the path to the model backbone.

  • model_decoder_layers (str) – A string indicating the path to the list of decoder layers.

  • v_proj (str) – A string indicating the path to the v projection layer, starting from the decoder layer it is in.

  • o_proj (str) – A string indicating the path to the o projection layer, starting from the decoder layer it is in.

  • self_attn (str) – A string indicating the path to the self attention block, starting from the decoder layer it is in.

  • mlp (str) – A string indicating the path to the multilayer perceptron layer, starting from the decoder layer it is in.
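The backbone, model_decoder_layers, v_proj, o_proj, self_attn and mlp fields are attribute paths. The sketch below resolves such paths with getattr, assuming they are dot-separated attribute names (numeric parts index into a list). The toy model and the path strings are hypothetical, chosen to resemble a Llama-style layout; note that v_proj and o_proj are resolved relative to each decoder layer, as the descriptions above state.

```python
from functools import reduce
from types import SimpleNamespace


def get_by_path(root, path: str):
    """Follow a dotted path such as "model.layers" or "self_attn.v_proj"."""
    def step(obj, part):
        return obj[int(part)] if part.isdigit() else getattr(obj, part)
    return reduce(step, path.split("."), root)


def make_layer(i: int) -> SimpleNamespace:
    attn = SimpleNamespace(v_proj=f"v_proj[{i}]", o_proj=f"o_proj[{i}]")
    return SimpleNamespace(self_attn=attn, mlp=f"mlp[{i}]")


toy_model = SimpleNamespace(model=SimpleNamespace(layers=[make_layer(0), make_layer(1)]))

# Hypothetical field values in the spirit of QuaRotConfig.
decoder_path, v_proj_path, o_proj_path = "model.layers", "self_attn.v_proj", "self_attn.o_proj"

for layer in get_by_path(toy_model, decoder_path):
    # v_proj and o_proj are looked up starting from each decoder layer.
    print(get_by_path(layer, v_proj_path), get_by_path(layer, o_proj_path))
```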

class quark.torch.quantization.config.config.AutoSmoothQuantConfig

A data class that defines the specifications for SmoothQuant with an optional automatic search for alpha.

Parameters:
  • name (str) – The name of the configuration, typically used to identify different quantization settings. Default is “smoothquant”.

  • auto_alpha (bool) – Whether to automatically search for the hyperparameter alpha. Default is False.

  • scale_clamp_min (float) – The minimum scaling factor to be used during quantization, preventing the scale from becoming too small. Default is 1e-3.

  • scaling_layers (Optional[List[Dict[str, str]]]) – Specific settings for scaling layers, allowing customization of quantization parameters for different layers within the model. Default is None.

  • embedding_layers (Optional[List[str]]) – A list of embedding layer names that require special quantization handling to maintain their performance and accuracy. Default is None.

  • model_decoder_layers (Optional[str]) – The path to the list of decoder layers in the model. Default is None.
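With auto_alpha enabled, alpha is searched rather than fixed. The search objective is not described here, so the sketch below assumes a simple one: try a grid of alpha values, fake-quantize the smoothed activations and weights to per-tensor symmetric int8, and keep the alpha with the lowest mean squared error on the layer output. All names are hypothetical, and Quark's actual search may differ.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def column_absmax(m: Matrix) -> List[float]:
    return [max(abs(row[j]) for row in m) for j in range(len(m[0]))]


def fake_quant_int8(m: Matrix) -> Matrix:
    """Per-tensor symmetric int8 fake quantization (round() rounds half to even)."""
    amax = max(abs(v) for row in m for v in row) or 1.0
    scale = amax / 127
    return [[max(-127, min(127, round(v / scale))) * scale for v in row] for row in m]


def matmul_t(x: Matrix, w: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]


def smoothed_error(x: Matrix, w: Matrix, alpha: float, clamp_min: float = 1e-3) -> float:
    """Output MSE after smoothing with this alpha and quantizing both operands."""
    s = [max(xm ** alpha / max(wm, 1e-8) ** (1 - alpha), clamp_min)
         for xm, wm in zip(column_absmax(x), column_absmax(w))]
    x_q = fake_quant_int8([[v / sj for v, sj in zip(r, s)] for r in x])
    w_q = fake_quant_int8([[v * sj for v, sj in zip(r, s)] for r in w])
    ref, out = matmul_t(x, w), matmul_t(x_q, w_q)
    n = len(ref) * len(ref[0])
    return sum((a - b) ** 2 for ra, rb in zip(ref, out) for a, b in zip(ra, rb)) / n


def search_alpha(x: Matrix, w: Matrix, grid: Sequence[float]) -> Tuple[float, float]:
    """Return (best_alpha, best_mse) over the grid."""
    return min(((a, smoothed_error(x, w, a)) for a in grid), key=lambda t: t[1])


x = [[0.5, 40.0, -1.0, 0.3],
     [-0.2, 25.0, 0.8, -0.6],
     [0.1, 33.0, -0.4, 0.9]]
w = [[0.3, 0.01, -0.2, 0.5],
     [-0.1, 0.02, 0.4, -0.25]]
grid = [i / 10 for i in range(11)]
best_alpha, best_mse = search_alpha(x, w, grid)
print(best_alpha, best_mse)
```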

class quark.torch.quantization.config.config.AWQConfig

Configuration for Activation-aware Weight Quantization (AWQ).

Parameters:
  • name (str) – The name of the quantization configuration. Default is “awq”.

  • scaling_layers (List[Dict[str, str]]) – Configuration details for scaling layers within the model, specifying custom scaling parameters per layer. Default is None.

  • model_decoder_layers (str) – The path to the list of decoder layers in the model. Default is None.
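AWQ protects the weight channels that matter most for the activations. It scales weight input channels up by s_j and the activations down by the same factor before weight-only quantization, with s_j = mean|X_j|^ratio and ratio chosen by a grid search, as in the AWQ paper. The simplified sketch below makes several assumptions: per-group symmetric int4 weights in the restricted range [-7, 7], activations left in floating point, and hypothetical helper names. Quark's actual search and scale normalization may differ.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def quantize_row_int4(row: List[float], group_size: int) -> List[float]:
    """Per-group symmetric int4 fake quantization of one weight row."""
    out: List[float] = []
    for i in range(0, len(row), group_size):
        group = row[i:i + group_size]
        scale = (max(abs(v) for v in group) / 7) or 1.0
        out.extend(max(-7, min(7, round(v / scale))) * scale for v in group)
    return out


def matmul_t(x: Matrix, w: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]


def awq_error(x: Matrix, w: Matrix, ratio: float, group_size: int) -> float:
    """Squared output error for one candidate ratio (ratio=0 means no scaling)."""
    act_mean = [mean(abs(row[j]) for row in x) for j in range(len(x[0]))]
    s = [max(m ** ratio, 1e-4) for m in act_mean]
    w_q = [quantize_row_int4([v * sj for v, sj in zip(r, s)], group_size) for r in w]
    x_s = [[v / sj for v, sj in zip(r, s)] for r in x]  # would be folded into the previous op
    ref, out = matmul_t(x, w), matmul_t(x_s, w_q)
    return sum((a - b) ** 2 for ra, rb in zip(ref, out) for a, b in zip(ra, rb))


def search_ratio(x: Matrix, w: Matrix, grid: Sequence[float], group_size: int = 4) -> Tuple[float, float]:
    """Return (best_ratio, best_error) over the grid."""
    return min(((r, awq_error(x, w, r, group_size)) for r in grid), key=lambda t: t[1])


x = [[0.5, 12.0, -1.0, 0.3],
     [-0.2, 9.0, 0.8, -0.6],
     [0.1, 11.0, -0.4, 0.9]]      # input channel 1 carries large activations
w = [[0.3, 0.05, -0.2, 0.5],
     [-0.1, 0.02, 0.4, -0.25]]
grid = [i / 10 for i in range(11)]
best_ratio, best_err = search_ratio(x, w, grid, group_size=4)
print(best_ratio, best_err, awq_error(x, w, 0.0, group_size=4))
```

Because the grid includes ratio 0 (plain quantization), the search can only match or improve on unscaled quantization for this calibration data.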

class quark.torch.quantization.config.config.GPTQConfig

A data class that defines the specifications for GPTQ (Accurate Post-Training Quantization for Generative Pre-trained Transformers).

Parameters:
  • name (str) – The configuration name. Default is “gptq”.

  • damp_percent (float) – The dampening ratio: damp_percent times the mean of the Hessian diagonal is added to the diagonal before inversion, which improves numerical stability and helps maintain accuracy after quantization. Default is 0.01.

  • desc_act (bool) – Whether weight columns are quantized in descending order of activation magnitude (the “act-order” heuristic), which typically improves accuracy. Default is True.

  • static_groups (bool) – Whether the quantization parameters of all groups are determined before quantization starts, so that group assignment does not change when columns are reordered. Default is True. Quark export only supports static_groups set to True.

  • true_sequential (bool) – Indicates whether the quantization should be applied in a truly sequential manner across the layers. Default is True.

  • inside_layer_modules (List[str]) – Lists the names of internal layer modules within the model that require specific quantization handling. Default is None.

  • model_decoder_layers (str) – The path to the list of decoder layers in the model. Default is None.
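GPTQ works with the Hessian of the layer-wise reconstruction loss, H = (2/N) XᵀX over N calibration samples. damp_percent adds damp_percent × mean(diag(H)) to the diagonal so that H can be factorized and inverted reliably, and desc_act processes columns in decreasing order of diag(H). The pure-Python sketch below covers only these two steps, not the full column-by-column update; the helper names are hypothetical, and the Cholesky attempt only demonstrates that the damped matrix is positive definite.

```python
import math
from typing import List

Matrix = List[List[float]]


def hessian(x: Matrix) -> Matrix:
    """H = (2 / N) * X^T X for calibration inputs x of shape (N, in_features)."""
    n_feat, n = len(x[0]), len(x)
    return [[2.0 * sum(row[i] * row[j] for row in x) / n for j in range(n_feat)]
            for i in range(n_feat)]


def dampen(h: Matrix, damp_percent: float = 0.01) -> Matrix:
    damp = damp_percent * sum(h[i][i] for i in range(len(h))) / len(h)
    return [[v + (damp if i == j else 0.0) for j, v in enumerate(row)] for i, row in enumerate(h)]


def is_positive_definite(h: Matrix, tol: float = 1e-9) -> bool:
    """Attempt a Cholesky factorization; pivots at or below tol count as singular."""
    n = len(h)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = h[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                l[i][i] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return True


def act_order(h: Matrix) -> List[int]:
    """Column order used with desc_act=True: descending diagonal of H (stable for ties)."""
    return sorted(range(len(h)), key=lambda i: h[i][i], reverse=True)


# Two identical input features make X^T X singular.
x = [[1.0, 1.0, 0.5],
     [2.0, 2.0, -1.0],
     [0.5, 0.5, 3.0]]
h = hessian(x)
print(is_positive_definite(h), is_positive_definite(dampen(h, 0.01)))  # False True
print(act_order(h))                                                   # [2, 0, 1]
```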