quark.torch.quantization.config.type#

Module Contents#

Classes#

QSchemeType

The quantization schemes applicable to tensors within a model.

ZeroPointType

The zero point Dtype used for zero point.

Dtype

The data types used for quantization of tensors.

ScaleType

The types of scales used in quantization.

RoundType

The rounding methods used during quantization.

DeviceType

The target devices for model deployment and optimization.

QuantizationMode

Different quantization modes.

TQTThresholdInitMeth

The method of threshold initialization of TQT algorithm in QAT. See Table 2 in https://arxiv.org/pdf/1903.08066.pdf

class quark.torch.quantization.config.type.QSchemeType(*args, **kwds)#

The quantization schemes applicable to tensors within a model.

  • per_tensor: Quantization is applied uniformly across the entire tensor.

  • per_channel: Quantization parameters differ across channels of the tensor.

  • per_group: Quantization parameters differ across defined groups of weight tensor elements.

class quark.torch.quantization.config.type.ZeroPointType(*args, **kwds)#

The zero point Dtype used for zero point.

  • ‘int32’: int zero point

  • ‘float32’: float zero point

class quark.torch.quantization.config.type.Dtype(*args, **kwds)#

The data types used for quantization of tensors.

  • int8: Signed 8-bit integer, range from -128 to 127.

  • uint8: Unsigned 8-bit integer, range from 0 to 255.

  • int4: Signed 4-bit integer, range from -8 to 7.

  • uint4: Unsigned 4-bit integer, range from 0 to 15.

  • bfloat16: Bfloat16 format.

  • float16: Standard 16-bit floating point format.

  • fp8_e4m3: FP8 format with 4 exponent bits and 3 bits of mantissa.

  • fp8_e5m2: FP8 format with 5 exponent bits and 2 bits of mantissa.

  • fp6_e3m2: FP6 format with 3 exponent bits and 2 bits of mantissa.

  • fp6_e2m3: FP6 format with 2 exponent bits and 3 bits of mantissa.

  • fp4: FP4 format.

  • mx: MX format 8 bit shared exponent with specific element data types.

  • mx6, mx9: Block data representation with multi-level ultra-fine scaling factors.

class quark.torch.quantization.config.type.ScaleType(*args, **kwds)#

The types of scales used in quantization.

  • float: Scale values are floating-point numbers. They use the same floating point dtype as the original model dtype.

  • pof2: Scale values are powers of two.

  • float32: Scale values are float32 numbers.

  • float16: Scale values are float16 numbers.

  • bfloat16: Scale values are bfloat16 numbers.

class quark.torch.quantization.config.type.RoundType(*args, **kwds)#

The rounding methods used during quantization.

  • round: Rounds.

  • floor: Floors towards the nearest even number.

  • half_even: Rounds towards the nearest even number.

class quark.torch.quantization.config.type.DeviceType(*args, **kwds)#

The target devices for model deployment and optimization.

  • CPU: CPU.

  • IPU: IPU.

class quark.torch.quantization.config.type.QuantizationMode(*args, **kwds)#

Different quantization modes.

  • eager_mode: The eager mode based on PyTorch in-place operator replacement.

  • fx_graph_mode: The graph mode based on torch.fx.

class quark.torch.quantization.config.type.TQTThresholdInitMeth(*args, **kwds)#

The method of threshold initialization of TQT algorithm in QAT. See Table 2 in https://arxiv.org/pdf/1903.08066.pdf

  • _3SD: The method of threshold initialization with std and 3 as hyperparameters.

  • _LL_J: The method of threshold initialization in the Algorithm 1 of paper “Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines” - Sean Settle et al. https://arxiv.org/pdf/1805.07941.pdf