quark.torch.export.nn.modules.realquantizer#

Module Contents#

Classes#

RealQuantizerBase

Helper class that provides a standard way to create an ABC using inheritance.

ScaledRealQuantizer

On export, performs a transpose on the scale and packing on the zero point. Called by the parent class, performs real quantization of the weight and bias.

NonScaledRealQuantizer

On export, performs a transpose on the scale and packing on the zero point. Called by the parent class, performs real quantization of the weight and bias.

class quark.torch.export.nn.modules.realquantizer.RealQuantizerBase#

Helper class that provides a standard way to create an ABC using inheritance.

class quark.torch.export.nn.modules.realquantizer.ScaledRealQuantizer(qspec: quark.torch.quantization.config.config.QuantizationSpec, quantizer: quark.torch.quantization.tensor_quantize.FakeQuantizeBase | None, reorder: bool, real_quantized: bool, float_dtype: torch.dtype, device: torch.device | None = torch.device('cuda'), scale_shape: Tuple[int, ...] | None = None, zero_point_shape: Tuple[int, ...] | None = None)#

On export, performs a transpose on the scale and packing on the zero point. Called by the parent class, performs real quantization of the weight and bias. On import, performs dequantization of the weight and bias, and fake quantization of the input and output via the forward method.
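
A minimal construction sketch, using the constructor signature documented above. The qspec and fake_quantizer values are placeholders (assumptions), standing in for a QuantizationSpec and FakeQuantizeBase produced by an existing Quark quantization flow:

import torch

from quark.torch.export.nn.modules.realquantizer import ScaledRealQuantizer

# Placeholders: in a real flow these come from the Quark quantization config
# (quark.torch.quantization.config.config.QuantizationSpec) and the fake
# quantizer attached to the layer during calibration.
qspec = ...            # QuantizationSpec for the weight tensor (placeholder)
fake_quantizer = None  # FakeQuantizeBase holding scale/zero point, or None

weight_quantizer = ScaledRealQuantizer(
    qspec=qspec,
    quantizer=fake_quantizer,
    reorder=False,                 # see the constructor parameters above
    real_quantized=True,           # store parameters in their real-quantized form
    float_dtype=torch.float16,
    device=torch.device("cuda"),
    scale_shape=None,              # optional explicit scale shape
    zero_point_shape=None,         # optional explicit zero-point shape
)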

to_real_quantize_params(param: torch.Tensor) → torch.Tensor#

Quantize the weight and bias to low-bit precision datatypes, and pack them if required.
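
A hedged usage sketch: given the weight_quantizer constructed in the earlier example, a floating-point weight tensor is converted to its real-quantized (and, if required by the spec, packed) representation:

# Convert a floating-point weight to its low-bit, possibly packed, form.
weight_fp = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
weight_real_quantized = weight_quantizer.to_real_quantize_params(weight_fp)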

class quark.torch.export.nn.modules.realquantizer.NonScaledRealQuantizer(qspec: quark.torch.quantization.config.config.QuantizationSpec, quantizer: quark.torch.quantization.tensor_quantize.FakeQuantizeBase | None, reorder: bool, real_quantized: bool, float_dtype: torch.dtype, device: torch.device | None = torch.device('cuda'), scale_shape: Tuple[int, ...] | None = None, zero_point_shape: Tuple[int, ...] | None = None)#

On export, performs a transpose on the scale and packing on the zero point. Called by the parent class, performs real quantization of the weight and bias. On import, performs dequantization of the weight and bias, and fake quantization of the input and output via the forward method.

to_real_quantize_params(param: torch.Tensor) → torch.Tensor#

Quantize the weight and bias to low-bit precision datatypes, and pack them if required.
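
Both classes also cover the import-time path described in their docstrings: the real-quantized weight and bias are dequantized, and input/output fake quantization is applied via the forward method. A minimal sketch of that path, assuming output_quantizer is a ScaledRealQuantizer or NonScaledRealQuantizer constructed as in the earlier example but with a QuantizationSpec describing the layer's output (that spec, and the direct forward call, are assumptions here):

# Import-time sketch: forward applies fake quantization to the tensor it is
# given (input or output, depending on where the quantizer is attached).
activation = torch.randn(1, 4096, dtype=torch.float16, device="cuda")
activation_fake_quantized = output_quantizer.forward(activation)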