ONNX model calibration#
- quark.onnx.calibration.interface.run_calibration(model_input: str | Path | ModelProto, data_reader: CalibrationDataReader, op_types_to_calibrate: Sequence[str] | None = None, activation_type: QuantType = QuantType.QInt8, calibrate_method: CalibrationMethod | LayerWiseMethod | PowerOfTwoMethod = CalibrationMethod.MinMax, use_external_data_format: bool = False, execution_providers: list[str] | None = ['CPUExecutionProvider'], quantized_tensor_type: dict[Any, Any] = {}, extra_options: dict[str, Any] = {}) → TensorsData [source]#
Interface function that runs calibration on an ONNX model.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
data_reader (CalibrationDataReader) – Data reader for model calibration.
op_types_to_calibrate (Optional[Sequence[str]]) – List of operator types to calibrate. Defaults to None, which indicates that all float32/float16 tensors are calibrated.
activation_type (QuantType) – The quantization type of activation. Default is QuantType.QInt8.
calibrate_method (Union[CalibrationMethod, LayerWiseMethod, PowerOfTwoMethod]) – Calibration method to use (MinMax, Entropy, Percentile, Distribution, NonOverflow or MinMSE).
use_external_data_format (bool) – Whether to use external data format for large models.
execution_providers (Union[List[str], None]) – List of execution providers for ONNX Runtime.
extra_options (Dict[str, Any]) – Extra options for quantization, including additional settings for calibrator configuration.
- Returns:
Data range for each quantizing tensor.
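To illustrate the data_reader parameter, here is a minimal sketch of a calibration data reader following the ONNX Runtime CalibrationDataReader protocol (get_next() returns one feed dict per batch, then None when exhausted). The input name "input" and shape (1, 3, 224, 224) are assumptions for illustration; adjust them to your model. The commented run_calibration call shows the intended hookup.

```python
import numpy as np


class RandomDataReader:
    """Minimal calibration data reader sketch.

    Assumes a single float32 model input named "input" with shape
    (1, 3, 224, 224) -- a hypothetical placeholder; replace with your
    model's real input names, shapes, and preprocessed samples.
    """

    def __init__(self, num_batches: int = 8):
        # Lazily generate random feed dicts, one per calibration batch.
        self._batches = iter(
            {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_batches)
        )

    def get_next(self):
        # Return the next feed dict, or None once all batches are consumed.
        return next(self._batches, None)


# Hypothetical usage (requires quark.onnx to be installed):
# from quark.onnx.calibration.interface import run_calibration
# tensors_range = run_calibration("model.onnx", RandomDataReader())
```

In practice the reader should iterate over a representative sample of real, preprocessed inputs rather than random data, since the resulting tensor ranges directly determine the quantization scales.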
- quark.onnx.calibration.interface.fake_calibration(model_input: str | Path | ModelProto) → TensorsData [source]#
A calibration function that produces a fake tensor range of [0, 1] for each tensor. It is intended for scenarios that do not need actual calibration, such as block floating-point quantization, to accelerate the overall process.
- Parameters:
model_input (Union[str, Path, onnx.ModelProto]) – ONNX model to calibrate.
- Returns:
Data range for each quantizing tensor.