quark.torch.quantization.observer.tqt_observer

Module Contents

Classes

class quark.torch.quantization.observer.tqt_observer.TQTObserver(qspec: quark.torch.quantization.config.config.QuantizationSpec, device: torch.device | None = None)

Observer for uniform scaling quantizers, such as an ‘int uniform quantizer’ or ‘fp8 uniform scaling’.
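As a rough illustration of what a uniform scaling quantizer does with a scale, the sketch below fake-quantizes a value onto a signed b-bit integer grid and maps it back. It is a minimal sketch in plain Python that assumes a symmetric signed clipping range [-2^(b-1), 2^(b-1) - 1] and Python's built-in `round` (round-half-to-even). The name `fake_quantize` is hypothetical and is not part of the Quark API. Quark's own rounding and clipping behavior may differ.

```python
def fake_quantize(x: float, scale: float, bit_width: int = 8) -> float:
    """Quantize x onto a signed uniform grid with step `scale`, then dequantize.

    Hypothetical helper for illustration; assumes a symmetric signed range.
    """
    qmin = -(2 ** (bit_width - 1))      # e.g. -128 for 8 bits
    qmax = 2 ** (bit_width - 1) - 1     # e.g. 127 for 8 bits
    q = round(x / scale)                # built-in round: half-to-even
    q = max(qmin, min(qmax, q))         # clip to the integer range
    return q * scale                    # map back to the real axis


# With scale = 1/32 and 8 bits, representable values span [-4.0, 3.96875].
print(fake_quantize(0.7, 1 / 32))    # 0.6875
print(fake_quantize(10.0, 1 / 32))   # 3.96875 (clipped to qmax)
```

Every output is an integer multiple of `scale`, and inputs outside the representable range saturate at the ends of the grid.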

get_fix_position() → int
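Return the fix position fp: the power-of-two exponent under which the NNDCT fix-neuron quantizer reproduces the TQT quantizer. Here b is the bit width and t is the quantization threshold.

(1) TQT: $q_x = \mathrm{clip}(\mathrm{round}(f_x / \mathit{scale})) \cdot \mathit{scale}$, where $\mathit{scale} = 2^{\lceil \log_2 t \rceil} / 2^{b-1}$

(2) NndctFixNeron: $q_x = \mathrm{clip}(\mathrm{round}(f_x \cdot \mathit{scale})) \cdot (1 / \mathit{scale})$, where $\mathit{scale} = 2^{\mathit{fp}}$

Setting (1) equal to (2) requires the NndctFixNeron scale to be the reciprocal of the TQT scale. This gives (3):

$$\frac{2^{b-1}}{2^{\lceil \log_2 t \rceil}} = 2^{\mathit{fp}} \quad\Longrightarrow\quad \mathit{fp} = b - 1 - \lceil \log_2 t \rceil$$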

For more details, see nndct/include/cuda/nndct_fix_kernels.cuh::_fix_neuron_v2_device
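The derivation can be checked numerically. For example, with b = 8 and t = 3, ceil(log2 3) = 2, so the TQT scale is 4 / 128 = 1/32 and fp = 8 - 1 - 2 = 5, giving 2^fp = 32 = 1 / scale. The sketch below, in plain Python, computes fp from t and b. It then confirms that form (1) and form (2) produce identical outputs on a few inputs. All function names are hypothetical helpers for illustration and are not Quark or NNDCT APIs. The symmetric signed clipping range [-2^(b-1), 2^(b-1) - 1] is an assumption.

```python
import math


def fix_position(threshold: float, bit_width: int) -> int:
    """fp = b - 1 - ceil(log2(t)), as in (3). Hypothetical helper."""
    return bit_width - 1 - math.ceil(math.log2(threshold))


def clip(q: int, bit_width: int) -> int:
    # Assumed symmetric signed integer range for a b-bit quantizer.
    return max(-(2 ** (bit_width - 1)), min(2 ** (bit_width - 1) - 1, q))


def tqt_quantize(x: float, threshold: float, bit_width: int) -> float:
    # (1) TQT: scale = 2^ceil(log2 t) / 2^(b-1); divide, round, clip, rescale.
    scale = 2.0 ** math.ceil(math.log2(threshold)) / 2.0 ** (bit_width - 1)
    return clip(round(x / scale), bit_width) * scale


def fix_neuron_quantize(x: float, fp: int, bit_width: int) -> float:
    # (2) NndctFixNeron: scale = 2^fp; multiply, round, clip, multiply by 1/scale.
    scale = 2.0 ** fp
    return clip(round(x * scale), bit_width) * (1.0 / scale)


t, b = 3.0, 8
fp = fix_position(t, b)   # ceil(log2 3) = 2, so fp = 8 - 1 - 2 = 5
for x in (-5.0, -0.7, 0.0, 0.1, 1.23, 2.9, 7.5):
    # Both forms agree because x / 2^-5 and x * 2^5 are the same exact float.
    assert tqt_quantize(x, t, b) == fix_neuron_quantize(x, fp, b)
print(fp)  # 5
```

Because both scales are powers of two, dividing by one scale and multiplying by its reciprocal are exact floating-point operations. The two forms therefore agree bit for bit, not only approximately.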