Language Model QAT Using Quark#
This document provides examples of Quantization-Aware Training (QAT) for language models using Quark.
Note
For information on accessing Quark PyTorch examples, refer to Accessing PyTorch Examples.
This example and the relevant files are available at /torch/language_modeling/llm_qat.
Supported Models#
Model Name | WEIGHT-ONLY (INT4.g128)
---|---
microsoft/Phi-3-mini-4k-instruct | ✓
THUDM/chatglm3-6b | ✓
Preparation#
(Optional) For LLM models, download the Hugging Face checkpoint in advance; one way to do this is sketched below.
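A minimal sketch using the huggingface_hub library, assuming it is installed; the local_dir path is an illustrative choice, not a path required by the example.

# Pre-download the ChatGLM3-6B checkpoint; local_dir is an illustrative path.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="THUDM/chatglm3-6b", local_dir="./checkpoints/chatglm3-6b")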
QAT Scripts#
You can run the following Python scripts from the examples/torch/language_modeling/llm_qat directory. Here, ChatGLM3-6B is used as an example.
Note
The ChatGLM3-6B model may run into tokenizer-related issues. To resolve them, install transformers==4.44.0 (a quick version check is sketched after this note).
When performing full fine-tuning on large language models, it is crucial to select an appropriate fine-tuning dataset for optimal results.
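A trivial sanity check, included only for convenience, to confirm the pinned version is active before running the recipes:

# Confirm the pinned transformers version before running the recipes.
import transformers
assert transformers.__version__ == "4.44.0", transformers.__version__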
Recipe 1: Evaluation of the Original LLM#
finetune_checkpoint="./finetune_checkpoint/Chatglm3-6B"
mkdir -p $finetune_checkpoint
CUDA_VISIBLE_DEVICES=0 python main.py \
--model THUDM/chatglm3-6b \
--model_trust_remote_code \
--skip_quantization \
--skip_finetune \
--eval_task openllm | tee $finetune_checkpoint/test_bf16.log
Recipe 2: QAT Finetuning and Export to Safetensors#
finetune_checkpoint="./finetune_checkpoint/Chatglm3-6B"  # same directory as in Recipe 1
output_dir="./quantized_model/Chatglm3-6B-u4w-ft"
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
--model THUDM/chatglm3-6b \
--model_trust_remote_code \
--quant_scheme w_uint4_asym \
--group_size 128 \
--finetune_dataset wikitext \
--finetune_datasubset wikitext-2-raw-v1 \
--finetune_epoch 10 \
--finetune_lr 2e-5 \
--finetune_iter 500 \
--finetune_seqlen 512 \
--finetune_batchsize 8 \
--finetune_checkpoint $finetune_checkpoint \
--model_export \
--output_dir $output_dir \
--skip_evaluation | tee $finetune_checkpoint/finetune_w_uint4_asym.log
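For intuition about what the w_uint4_asym scheme with group_size 128 trains against, the sketch below reconstructs group-wise asymmetric uint4 fake quantization with a straight-through estimator: each group of 128 weights gets its own scale and zero point mapping onto the uint4 range [0, 15]. This is an illustrative assumption about the scheme's semantics, not Quark's internal implementation; the function name and tensor shapes are made up for the example.

import torch

def fake_quant_uint4_asym(w: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    # Split weights into groups of `group_size` along the last dimension.
    g = w.reshape(-1, group_size)
    w_min = g.min(dim=1, keepdim=True).values
    w_max = g.max(dim=1, keepdim=True).values
    # Asymmetric quantization: map [w_min, w_max] onto the 16 uint4 levels.
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(g / scale) + zero, 0, 15)
    dq = (q - zero) * scale  # dequantize back to float
    # Straight-through estimator: forward uses dq, backward treats it as identity.
    return (g + (dq - g).detach()).reshape(w.shape)

w = torch.randn(4, 256, requires_grad=True)
fake_quant_uint4_asym(w).sum().backward()  # gradients still reach w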
Recipe 3: Reload and Evaluate Finetuned Model#
# Reuses $output_dir and $finetune_checkpoint from Recipes 1 and 2.
CUDA_VISIBLE_DEVICES=0 python main.py \
--model THUDM/chatglm3-6b \
--model_trust_remote_code \
--quant_scheme w_uint4_asym \
--group_size 128 \
--skip_finetune \
--safetensors_model_reload \
--safetensors_model_dir $output_dir \
--eval_task openllm | tee $finetune_checkpoint/test_w_uint4_asym_finetuned.log
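To sanity-check the exported checkpoint before reloading it through main.py, you can open it directly with the safetensors library. This is a hedged sketch: "model.safetensors" is an assumed file name, and the actual file inside $output_dir may differ.

# Illustrative only: peek at the tensors in an exported safetensors file.
# "model.safetensors" is an assumed name; check $output_dir for the real one.
from safetensors.torch import load_file

state = load_file("./quantized_model/Chatglm3-6B-u4w-ft/model.safetensors")
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)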