Quark Format#

Quark Format Exporting#

Note

For most use cases with external open-source libraries (Transformers, vLLM, etc.), the serialization format described on this page should not be used, and you should refer to the safetensors format instead.

Quark format is a proprietary export format for Quark, and the file list of this exporting format contains the quantized parameters (in a model_state_dict.pth file) such as weight, scale, and zero point and config.json with quantization configuration.

Note that this model currently only supports exporting linear parts (which is sufficient for general large language modeling) For other needs using quark export (e.g., exporting embedding layers, convolutional layers), use Saving & Loading below. In fact, we are gradually migrating the save and load functionality to ModelExporter in quark format.

Example of Quark Format Exporting#

from quark.torch.export.config.config import ExporterConfig, JsonExporterConfig
from quark.torch.export.api import ModelExporter

json_export_config = JsonExporterConfig(
   weight_format="real_quantized",
   pack_method="reorder"
)
export_config = ExporterConfig(json_export_config=json_export_config)

exporter = ModelExporter(config=export_config, export_dir="./exported_model_dir")
exporter.export_quark_model(model, quant_config=quant_config)

By default, ModelExporter.export_quark_model exports models using a Quark-specific format for the checkpoint and quantization_config format in the config.json file.

This format may not directly be usable by some downstream libraries (vLLM) until downstream libraries support Quark quantized models. But it can be loaded and used by quark itself.

This format supports two forms of weight saving, fake quantized will save the high precision weight after quantization , while real quantized will save the weights after the real quantization. You can configure this with weight_format.

from quark.torch.export.config.config import ExporterConfig, JsonExporterConfig
from quark.torch.export.api import ModelExporter

json_export_config = JsonExporterConfig(weight_format="real_quantized", pack_method="reorder")
export_config = ExporterConfig(json_export_config=json_export_config)

exporter = ModelExporter(config=export_config, export_dir=args.output_dir)
exporter.export_quark_model(model, quant_config=quant_config, custom_mode=args.custom_mode)

Quark Format Importing#

Models exported using quark format can be imported directly using quark. Models exported using quark format can be imported directly using quark. quark chooses how to load the weights based on the information in the config.

Example of Quark Format Importing#

from quark.torch import ModelImporter

importer = ModelImporter(model_info_dir=args.import_model_dir)
model = importer.import_model_info(model)