Save & Load Quantized Models
Saving
Save the network architecture or configuration and the parameters of the quantized model.
Both eager mode and FX graph mode quantization are supported.
For eager mode quantization, the model's configuration is stored in a JSON file, and its parameters (weight, bias, scale, and zero_point) are stored in a safetensors file.
For FX graph mode quantization, the model's network architecture and parameters are stored in a .pth file.
Example of Saving in Eager Mode
from quark.torch import save_params

# `model` is the quantized model; its config and parameters are written under ./save_dir.
save_params(model, model_type=model_type, export_dir="./save_dir")
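Once saved, the export can be sanity-checked by reading the safetensors file back with the standard safetensors library. This is a minimal sketch: the file name under ./save_dir is discovered rather than assumed, since the exact names depend on the Quark version.

from pathlib import Path
from safetensors.torch import load_file

# File names under ./save_dir depend on the Quark version, so discover
# the safetensors file rather than hard-coding its name.
params_file = next(Path("./save_dir").glob("*.safetensors"))
for name, tensor in load_file(params_file).items():
    print(name, tuple(tensor.shape), tensor.dtype)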
Example of Saving in FX Graph Mode
from quark.torch.export.api import save_params
from quark.torch.quantization.config.type import QuantizationMode  # assumed import path; check your Quark version

save_params(model,
            model_type=model_type,
            args=example_inputs,
            export_dir="./save_dir",
            quant_mode=QuantizationMode.fx_graph_mode)
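The args parameter expects the same example inputs that were used to trace the model for FX graph mode quantization. A hypothetical setup for an image model is sketched below; the shape is a placeholder, not something Quark prescribes.

import torch

# Placeholder example inputs; they must match the model's forward signature
# and the inputs used when quantizing in FX graph mode.
example_inputs = (torch.randn(1, 3, 224, 224),)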
Loading
Instantiate a quantized model from the saved files generated by the saving function above.
Both eager mode and FX graph mode quantization are supported.
Only weight-only and static quantization are supported for now.
Example of Loading in Eager Mode
from quark.torch import load_params

# `model` must already be an instance of the original network architecture.
model = load_params(model, json_path=json_path, safetensors_path=safetensors_path)
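In eager mode the architecture itself is not stored, so the model object must be rebuilt before loading. The sketch below assumes a hypothetical build_model() constructor for the original network; the file names are likewise hypothetical stand-ins for whatever save_params wrote.

# build_model() is a hypothetical constructor for the original (unquantized) architecture.
model = build_model()
model = load_params(model,
                    json_path="./save_dir/config.json",               # hypothetical file name
                    safetensors_path="./save_dir/model.safetensors")  # hypothetical file name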
Example of Loading in FX Graph Mode
from quark.torch.quantization.api import load_params
from quark.torch.quantization.config.type import QuantizationMode  # assumed import path; check your Quark version

model = load_params(pth_path=model_file_path, quant_mode=QuantizationMode.fx_graph_mode)
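Because the .pth file stores the whole graph, no pre-built model instance is passed in. A short sketch of running the reloaded model, assuming example_inputs shaped like the ones passed to save_params:

import torch

# The .pth file produced in FX graph mode carries the full architecture,
# so the reloaded model is ready for inference as-is.
model.eval()
with torch.no_grad():
    output = model(*example_inputs)  # inputs shaped like those used at save time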