quark.onnx.finetuning.create_torch.quant_base_ops#

Module Contents#

Classes#

class quark.onnx.finetuning.create_torch.quant_base_ops.Quantizer(scale: torch.Tensor, zero_point: torch.Tensor, min_q: torch.Tensor, max_q: torch.Tensor, ch_axis: int = 0, q_folded: bool = False)#

Standard Quantizer has three functions including quantize, dequantize and quantize_dequantize, which is corresponding to ONNX QuantizeLinear, DequantizeLinear and Q/DQ pair separately. By default in forward, it works in quantize_dequantize mode.

round_impl(tensor: torch.Tensor) None#

Implement the round function, designed for adaround quantizer

tensor_sync(tensor: torch.Tensor) None#

The Pre-processing of the parameter according to the input tensor

class quark.onnx.finetuning.create_torch.quant_base_ops.AdaroundConstants#

Constants used for Adarounding

class quark.onnx.finetuning.create_torch.quant_base_ops.AdaroundQuantizer(scale: torch.Tensor, zero_point: torch.Tensor, min_q: torch.Tensor, max_q: torch.Tensor, ch_axis: int = 0, q_folded: bool = False)#

AdaRound Quantizer has a alpha paramter for optimizing weight rounding

round_impl(tensor: torch.Tensor) None#

Implement the rounding function for adaround :param weight: The tensor to be ada-rounded

initialize_alpha(tensor: torch.Tensor) None#

Initializes alpha parameter, same shape as the tensor :param tensor: The tensor to be ada-rounded

class quark.onnx.finetuning.create_torch.quant_base_ops.QuantizationModule(quant_info: Tuple[numpy.typing.NDArray[numpy.float32], numpy.typing.NDArray[Any], numpy.typing.NDArray[Any], numpy.typing.NDArray[Any], int, bool] | Dict[str, Any] | None)#

A pytorch module that behaves as ONNX quantization nodes

class quark.onnx.finetuning.create_torch.quant_base_ops.QuantizeWrapper(w_alpha: float = 1.0, b_beta: float = 1.0, **kwargs: Dict[str, Any])#

A wrapper for torch layer’s input/weight/bias quantization