quark.torch.quantization.observer.observer#

Module Contents#

Classes#

class quark.torch.quantization.observer.observer.ObserverBase(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None)#

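Abstract base class for observers. The generated description, "Helper class that provides a standard way to create an ABC using inheritance.", is the docstring inherited from Python's abc.ABC rather than a description of this class.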

class quark.torch.quantization.observer.observer.PlaceholderObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None)#

This observer only passes its configuration to the quantized module’s .from_float().

It performs no calculation.

It can only be used for quantization to float16 and bfloat16, which do not require determining ranges.

class quark.torch.quantization.observer.observer.UniformScalingObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None, eps: float = torch.finfo(torch.float32).eps)#

Observer for uniform scaling quantizers, for example ‘int uniform quantizer’ or ‘fp8 uniform scaling’.

calculate_qparams(min_val: torch.Tensor, max_val: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]#

Calculates the quantization parameters.
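The page does not say how the parameters are derived, so the following is only a minimal sketch of the textbook uniform (affine) scheme, not Quark's implementation. The helper name `sketch_calculate_qparams`, the int8 defaults for `quant_min`/`quant_max`, and the `symmetric` flag are assumptions made for illustration; plain Python floats stand in for tensors.

```python
from typing import Tuple


def sketch_calculate_qparams(
    min_val: float,
    max_val: float,
    quant_min: int = -128,  # assumed int8 range, not taken from the source
    quant_max: int = 127,
    symmetric: bool = False,
    eps: float = 1.1920929e-07,  # approximately torch.finfo(torch.float32).eps
) -> Tuple[float, int]:
    """Hypothetical helper: derive (scale, zero_point) from an observed range."""
    # Widen the range to include zero so that 0.0 maps onto an integer level.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)

    if symmetric:
        # Symmetric: the range is centred on zero, so the zero point is 0.
        amax = max(-min_val, max_val)
        scale = max(amax / ((quant_max - quant_min) / 2), eps)
        zero_point = 0
    else:
        # Affine: spread [min_val, max_val] over the full integer range.
        scale = max((max_val - min_val) / (quant_max - quant_min), eps)
        zero_point = quant_min - round(min_val / scale)
        zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


scale, zero_point = sketch_calculate_qparams(-1.0, 3.0)
sym_scale, sym_zero_point = sketch_calculate_qparams(-1.0, 3.0, symmetric=True)
```

Clamping the scale to `eps` mirrors the role an `eps` constructor argument usually plays: it prevents a zero scale when the observed range collapses to a single value.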

reset_min_max_vals() -> None#

Resets the min/max values.

class quark.torch.quantization.observer.observer.PerTensorMinMaxObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None)#

Observer for uniform scaling quantizers, for example ‘int uniform quantizer’ or ‘fp8 uniform scaling’.

forward(x_orig: torch.Tensor) -> torch.Tensor#

Records the running minimum and maximum of x_orig.
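As an illustration of what tracking a running minimum and maximum involves, here is a minimal sketch in plain Python. The class name `RunningMinMax` is hypothetical, lists of floats stand in for tensors, and each batch is assumed to be non-empty; it is not Quark's implementation.

```python
from typing import List, Optional


class RunningMinMax:
    """Hypothetical stand-in for a per-tensor min/max observer."""

    def __init__(self) -> None:
        self.min_val: Optional[float] = None
        self.max_val: Optional[float] = None

    def forward(self, x_orig: List[float]) -> List[float]:
        # Reduce the current batch to a single min and max.
        batch_min, batch_max = min(x_orig), max(x_orig)
        # Fold the batch statistics into the running statistics.
        self.min_val = batch_min if self.min_val is None else min(self.min_val, batch_min)
        self.max_val = batch_max if self.max_val is None else max(self.max_val, batch_max)
        # The input passes through unchanged; the observer only records.
        return x_orig


observer = RunningMinMax()
observer.forward([0.5, -2.0])
observer.forward([3.0, 1.0])
```

After the two calls, `observer.min_val` and `observer.max_val` cover both batches, which is the range a later qparams calculation would consume.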

class quark.torch.quantization.observer.observer.PerChannelMinMaxObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None, eps: float = torch.finfo(torch.float32).eps)#

Observer for uniform scaling quantizers, for example ‘int uniform quantizer’ or ‘fp8 uniform scaling’.

class quark.torch.quantization.observer.observer.PerBlockMXObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None, eps: float = torch.finfo(torch.float32).eps)#

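No class-specific description is available. The generated text, "Helper class that provides a standard way to create an ABC using inheritance.", is the docstring inherited from Python's abc.ABC.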

class quark.torch.quantization.observer.observer.PerBlockBFPObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None, eps: float = torch.finfo(torch.float32).eps)#

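No class-specific description is available. The generated text, "Helper class that provides a standard way to create an ABC using inheritance.", is the docstring inherited from Python's abc.ABC.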

class quark.torch.quantization.observer.observer.PerGroupMinMaxObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None, eps: float = torch.finfo(torch.float32).eps)#

Observer for uniform scaling quantizers, for example ‘int uniform quantizer’ or ‘fp8 uniform scaling’.

calculate_qparams(min_val: torch.Tensor, max_val: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]#

Calculates the quantization parameters.

class quark.torch.quantization.observer.observer.PerTensorHistogramObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None)#

Observer for uniform scaling quantizers, for example ‘int uniform quantizer’ or ‘fp8 uniform scaling’.

forward(x_orig: torch.Tensor) -> torch.Tensor#

Records the running histogram of x_orig.
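To make the idea of a running histogram concrete, the sketch below accumulates counts of absolute values across batches. Everything in it is an assumption for illustration: the class name `RunningAbsHistogram`, the fixed `upper` bound chosen at construction, and the clamping of larger values into the last bin. Only the `ValueError` for a non-symmetric setup mirrors the behaviour documented for this method.

```python
from typing import List


class RunningAbsHistogram:
    """Hypothetical stand-in for a symmetric per-tensor histogram observer."""

    def __init__(self, num_bins: int, upper: float, symmetric: bool = True) -> None:
        self.num_bins = num_bins
        self.upper = upper  # assumed fixed histogram range [0, upper]
        self.symmetric = symmetric
        self.counts = [0] * num_bins

    def forward(self, x_orig: List[float]) -> List[float]:
        if not self.symmetric:
            raise ValueError("this sketch only supports symmetric histograms")
        width = self.upper / self.num_bins
        for value in x_orig:
            # Bin by magnitude; values beyond `upper` fall into the last bin.
            index = min(int(abs(value) / width), self.num_bins - 1)
            self.counts[index] += 1
        # The input passes through unchanged; the observer only records.
        return x_orig


hist_observer = RunningAbsHistogram(num_bins=4, upper=4.0)
hist_observer.forward([0.5, -1.5, 3.9])
hist_observer.forward([-0.2, 5.0])
```

A production observer would typically grow or rebin the histogram when a batch exceeds the current range instead of clamping; the fixed range keeps the sketch short.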

Raises:
- ValueError: If self.symmetric is False.

class quark.torch.quantization.observer.observer.PerTensorPercentileObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None)#

Observer for uniform scaling quantizers, for example ‘int uniform quantizer’ or ‘fp8 uniform scaling’.

get_min_max_by_percentile(histogram: torch.Tensor, bin_edges: torch.Tensor, percentile: float) -> Tuple[torch.Tensor, torch.Tensor]#

Calculate the minimum and maximum values of a histogram at a specified percentile.

Parameters:
- histogram (torch.Tensor): A tensor representing the histogram of the data. Each element represents the frequency of data in the corresponding bin.
- bin_edges (torch.Tensor): A tensor containing the edge values of the bins represented in the histogram. It should have one more element than histogram.
- percentile (float): The percentile at which to determine the minimum and maximum values. The value should be between 0 and 100.

Returns:
- Tuple[torch.Tensor, torch.Tensor]: A tuple containing two tensors. The first tensor is the value at the specified percentile, and the second tensor is the value at the complementary percentile (i.e., 100 - percentile).

Raises:
- ValueError: If the percentile argument is not within the range 0 to 100.
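One way to read percentile bounds off a histogram is to walk its cumulative distribution, as in the sketch below. This is an assumption about the general technique, not Quark's code: the helper name `sketch_percentile_bounds` is hypothetical, lists stand in for tensors, and the choice of the upper bin edge for the percentile and the lower bin edge for its complement is illustrative. The return order follows the description above (value at the percentile first, value at the complementary percentile second).

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Tuple


def sketch_percentile_bounds(
    histogram: List[int], bin_edges: List[float], percentile: float
) -> Tuple[float, float]:
    """Hypothetical helper: bin edges at `percentile` and at 100 - `percentile`."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be within the range 0 to 100")
    if len(bin_edges) != len(histogram) + 1:
        raise ValueError("bin_edges needs one more element than histogram")
    total = sum(histogram)
    if total == 0:
        raise ValueError("histogram is empty")

    # Cumulative distribution: fraction of samples at or below each bin.
    cdf = [count / total for count in accumulate(histogram)]
    fraction = percentile / 100
    last = len(histogram) - 1

    # First bin whose cumulative fraction reaches the target fraction.
    upper_bin = min(bisect_left(cdf, fraction), last)
    lower_bin = min(bisect_left(cdf, 1 - fraction), last)

    # Upper edge for the percentile, lower edge for its complement.
    return bin_edges[upper_bin + 1], bin_edges[lower_bin]


at_percentile, at_complement = sketch_percentile_bounds(
    histogram=[1, 2, 4, 2, 1],
    bin_edges=[-2.5, -1.5, -0.5, 0.5, 1.5, 2.5],
    percentile=80,
)
```

With this symmetric histogram the two bounds are mirror images of each other, which trims the sparse outer bins on both sides.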

class quark.torch.quantization.observer.observer.PerTensorMSEObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: Optional[torch.device] = None)#

Observer for uniform scaling quantizers, for example ‘int uniform quantizer’ or ‘fp8 uniform scaling’.

get_min_max_by_mse(calib_hist: torch.Tensor, calib_bin_edges: torch.Tensor, stride: int = 1, start_bin: int = 2045) -> Tuple[torch.Tensor, torch.Tensor]#

Returns amax that minimizes MSE of the collected histogram.
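A histogram-based MSE search can be sketched as follows: try each candidate clipping value, quantize the bin centres with it, and keep the candidate with the lowest count-weighted squared error. The helper name `sketch_amax_by_mse`, the `num_levels` parameter, the default `start_bin=0`, and the assumption that the histogram covers absolute values with edges starting at 0 are all illustrative choices, not details taken from Quark. The sketch returns a single amax rather than a tuple.

```python
from typing import List


def sketch_amax_by_mse(
    calib_hist: List[int],
    calib_bin_edges: List[float],
    stride: int = 1,
    start_bin: int = 0,   # assumed default for this sketch
    num_levels: int = 127,  # assumed number of positive integer levels
) -> float:
    """Hypothetical helper: clipping value with the lowest quantization MSE."""
    centers = [
        (low + high) / 2
        for low, high in zip(calib_bin_edges[:-1], calib_bin_edges[1:])
    ]
    total = sum(calib_hist)
    best_amax, best_mse = calib_bin_edges[-1], float("inf")

    for index in range(start_bin, len(calib_hist), stride):
        amax = calib_bin_edges[index + 1]  # candidate clipping threshold
        step = amax / num_levels
        error = 0.0
        for center, count in zip(centers, calib_hist):
            # Round to the nearest level and clip at the top level.
            quantized = min(round(center / step), num_levels) * step
            error += count * (center - quantized) ** 2
        mse = error / total
        if mse < best_mse:
            best_amax, best_mse = amax, mse
    return best_amax


best = sketch_amax_by_mse(
    calib_hist=[10, 0, 0, 0],
    calib_bin_edges=[0.0, 1.0, 2.0, 3.0, 4.0],
    num_levels=2,
)
```

Here all samples sit in the first bin, so the smallest threshold reproduces them exactly and wins; larger thresholds waste levels on empty bins. Raising `start_bin` or `stride` narrows or coarsens the set of candidates that are searched.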