`quark.torch.quantization.config.type`#

Module Contents#

class quark.torch.quantization.config.type.QSchemeType(*args, **kwds)#

The quantization schemes applicable to tensors within a model.

per_tensor: Quantization is applied uniformly across the entire tensor.
per_channel: Quantization parameters differ across channels of the tensor.
per_group: Quantization parameters differ across defined groups of weight tensor elements.

class quark.torch.quantization.config.type.ZeroPointType(*args, **kwds)#

The zero point Dtype used for zero point.

class quark.torch.quantization.config.type.Dtype(*args, **kwds)#

The data types used for quantization of tensors.

int8: Signed 8-bit integer, range from -128 to 127.
uint8: Unsigned 8-bit integer, range from 0 to 255.
int4: Signed 4-bit integer, range from -8 to 7.
uint4: Unsigned 4-bit integer, range from 0 to 15.
bfloat16: Bfloat16 format.
float16: Standard 16-bit floating point format.
fp8_e4m3: FP8 format with 4 exponent bits and 3 bits of mantissa.
fp8_e5m2: FP8 format with 5 exponent bits and 2 bits of mantissa.
fp6_e3m2: FP6 format with 3 exponent bits and 2 bits of mantissa.
fp6_e2m3: FP6 format with 2 exponent bits and 3 bits of mantissa.
fp4: FP4 format.
mx: MX format 8 bit shared exponent with specific element data types.
mx6, mx9: Block data representation with multi-level ultra-fine scaling factors.

class quark.torch.quantization.config.type.ScaleType(*args, **kwds)#

The types of scales used in quantization.

class quark.torch.quantization.config.type.RoundType(*args, **kwds)#

The rounding methods used during quantization.

class quark.torch.quantization.config.type.DeviceType(*args, **kwds)#

The target devices for model deployment and optimization.

class quark.torch.quantization.config.type.QuantizationMode(*args, **kwds)#

Different quantization modes.

class quark.torch.quantization.config.type.TQTThresholdInitMeth(*args, **kwds)#

The method of threshold initialization of TQT algorithm in QAT. See Table 2 in https://arxiv.org/pdf/1903.08066.pdf

_3SD: The method of threshold initialization with std and 3 as hyperparameters.
_LL_J: The method of threshold initialization in the Algorithm 1 of paper “Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines” - Sean Settle et al. https://arxiv.org/pdf/1805.07941.pdf