quark.torch.quantization.tensor_quantize
Module Contents
Classes
- class quark.torch.quantization.tensor_quantize.FakeQuantizeBase(quant_spec: quark.torch.quantization.config.config.QuantizationSpec, device: torch.device | None = None)
Base fake quantize module. Any fake quantize implementation should derive from this class.
Concrete fake quantize modules should follow the same API. In forward, they update the statistics of the observed tensor and fake quantize the input. They should also provide a calculate_qparams function that computes the quantization parameters from the collected statistics.
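The contract described above (update statistics in forward, fake quantize the input, expose calculate_qparams) can be illustrated with a minimal sketch. This is not Quark's implementation: the class name ToyMinMaxFakeQuantize, the min/max statistics, the num_bits argument, and the unsigned asymmetric scheme are all assumptions chosen for illustration, and the sketch does not derive from FakeQuantizeBase or take a QuantizationSpec. It also omits the straight-through gradient handling that a training-time fake quantizer would normally need.

```python
from typing import Tuple

import torch
from torch import nn


class ToyMinMaxFakeQuantize(nn.Module):
    """Hypothetical stand-in showing the forward / calculate_qparams contract."""

    def __init__(self, num_bits: int = 8) -> None:
        super().__init__()
        # Assumed unsigned integer range, e.g. [0, 255] for 8 bits.
        self.quant_min = 0
        self.quant_max = 2 ** num_bits - 1
        # Running statistics of every tensor observed so far.
        self.register_buffer("min_val", torch.tensor(float("inf")))
        self.register_buffer("max_val", torch.tensor(float("-inf")))

    def calculate_qparams(self) -> Tuple[torch.Tensor, torch.Tensor]:
        # Widen the observed range to include 0 so that 0.0 maps to an integer.
        lo = torch.clamp(self.min_val, max=0.0)
        hi = torch.clamp(self.max_val, min=0.0)
        scale = torch.clamp((hi - lo) / (self.quant_max - self.quant_min), min=1e-8)
        zero_point = torch.clamp(
            torch.round(self.quant_min - lo / scale), self.quant_min, self.quant_max
        )
        return scale, zero_point

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Step 1: update the statistics of the observed tensor.
        self.min_val.copy_(torch.minimum(self.min_val, x.detach().min()))
        self.max_val.copy_(torch.maximum(self.max_val, x.detach().max()))
        # Step 2: derive quantization parameters from the statistics.
        scale, zero_point = self.calculate_qparams()
        # Step 3: quantize to the integer grid, then dequantize back to float.
        q = torch.clamp(torch.round(x / scale) + zero_point, self.quant_min, self.quant_max)
        return (q - zero_point) * scale


x = torch.tensor([-1.0, 0.0, 0.5, 1.0])
fq = ToyMinMaxFakeQuantize(num_bits=8)
y = fq(x)  # same shape and dtype as x, with values snapped to the 8-bit grid
```

The output stays in floating point; only its values are restricted to those representable under the computed scale and zero point, which is what distinguishes fake quantization from conversion to an integer dtype.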
- update_buffer(buffer_name: str, new_value: torch.Tensor | None, input_tensor_device: torch.device) -> None
Update the value of a registered buffer while ensuring that its shape, device, and data type match the input tensor.
Parameters:
  - buffer_name: The name of the buffer to update.
  - new_value: The new value to assign to the buffer.
  - input_tensor_device: The target device (e.g., torch.device('cuda') or torch.device('cpu')).
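One plausible reading of this behavior is sketched below as a free function. It is not Quark's code: the name update_buffer_sketch, the choice to ignore a new_value of None, and the choice to keep the buffer's existing dtype are assumptions made for illustration. Only standard PyTorch calls are used (register_buffer, Tensor.to, Tensor.copy_, and attribute assignment on nn.Module, which keeps an existing buffer name registered).

```python
from typing import Optional

import torch
from torch import nn


def update_buffer_sketch(
    module: nn.Module,
    buffer_name: str,
    new_value: Optional[torch.Tensor],
    input_tensor_device: torch.device,
) -> None:
    """Hypothetical helper; mirrors the documented parameters only."""
    if new_value is None:
        # Assumption: a missing value leaves the buffer untouched.
        return
    buffer = getattr(module, buffer_name)
    # Move the new value to the target device; keep the buffer's dtype (assumption).
    value = new_value.detach().to(device=input_tensor_device, dtype=buffer.dtype)
    if buffer.shape == value.shape and buffer.device == value.device:
        # Same layout: overwrite in place.
        buffer.copy_(value)
    else:
        # Shape or device changed: rebind the name. nn.Module stores the tensor
        # in its buffer dict again because the name is already a buffer.
        setattr(module, buffer_name, value)


m = nn.Module()
m.register_buffer("scale", torch.ones(1))
update_buffer_sketch(
    m, "scale", torch.tensor([0.5, 0.25], dtype=torch.float64), torch.device("cpu")
)
```

After the call, m.scale holds the two new values on the requested device and is still listed by m.named_buffers(), so it continues to follow the module through state_dict() and .to().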
- class quark.torch.quantization.tensor_quantize.ScaledFakeQuantize(quant_spec: quark.torch.quantization.config.config.QuantizationSpec, device: torch.device | None = None, **kwargs: Any)
Base fake quantize module. Any fake quantize implementation should derive from this class.
Concrete fake quantize modules should follow the same API. In forward, they update the statistics of the observed tensor and fake quantize the input. They should also provide a calculate_qparams function that computes the quantization parameters from the collected statistics.
- class quark.torch.quantization.tensor_quantize.NonScaledFakeQuantize(quant_spec: quark.torch.quantization.config.config.QuantizationSpec, device: torch.device | None = None)
Base fake quantize module. Any fake quantize implementation should derive from this class.
Concrete fake quantize modules should follow the same API. In forward, they update the statistics of the observed tensor and fake quantize the input. They should also provide a calculate_qparams function that computes the quantization parameters from the collected statistics.