Introduction#

This script demonstrates the integration of APL (``AMD Pytorch-light``, an internal project name) into Quark. APL is a lightweight model optimization library based on PyTorch, designed primarily for developers. APL is AMD's internal quantization framework; external users need to request access. Ensure APL is installed before running this example.

APL supports a variety of quantization methods and advanced quantization data types. Quark provides a user-friendly interface that lets users easily leverage these quantization techniques. These examples combine the strengths of both frameworks, enabling users to invoke APL through Quark's interface. We have prepared three examples that demonstrate the use of APL's BFP16, INT_K, and BRECQ quantization schemes via Quark's interface.
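At a high level, all three examples follow the same pipeline: load the full-precision model, let APL replace supported ops with its quantized counterparts, run minmax calibration, and evaluate. The sketch below is purely illustrative; quantize_with_apl is an invented name used only to outline the flow that quantize_quark.py drives, and is not part of Quark's or APL's API.

Conceptual sketch (Python):

import torch
import torch.nn as nn

# Illustrative outline only: quantize_with_apl is NOT a Quark or APL API.
# The actual flow in these examples is driven by quantize_quark.py.
def quantize_with_apl(model: nn.Module, calib_loader, scheme: str = "int_k") -> nn.Module:
    assert scheme in {"int_k", "bfp16", "brecq"}  # the three APL schemes shown here
    # 1) APL swaps supported ops for quantized counterparts,
    #    e.g. nn.Linear -> pytorchlight.nn.linear_layer.LLinear.
    # 2) Calibration: run a few batches so each quantized layer can record
    #    the min/max statistics it needs to pick quantization scales.
    model.eval()
    with torch.no_grad():
        for batch in calib_loader:
            model(batch)  # observers collect min/max statistics here
    # 3) The returned model runs with quantized ops and can be evaluated (--eval).
    return model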

Example 1#

In this example, we use the llama2 model and call APL to perform int_k model quantization.

  • model : llama2 7b

  • calib method : minmax

  • quant dtype : int8

replace ops:

  • nn.Linear => pytorchlight.nn.linear_layer.LLinear
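Example 1 pairs minmax calibration with int8. The sketch below shows, generically, what minmax-based int8 quantization computes for a single tensor; symmetric per-tensor scaling is an assumption made for illustration and is not necessarily how APL's LLinear implements int_k.

Conceptual sketch (Python):

import torch

def minmax_int8_fake_quant(x: torch.Tensor) -> torch.Tensor:
    # Observe the dynamic range, derive a scale, then round-trip through int8.
    max_abs = x.abs().max().clamp(min=1e-8)
    scale = max_abs / 127.0                          # symmetric int8 range [-127, 127]
    q = torch.clamp(torch.round(x / scale), -127, 127)
    return q * scale                                 # dequantize to simulate int8 inference

w = torch.randn(4096, 4096)                          # e.g. one llama-2-7b linear weight
w_q = minmax_int8_fake_quant(w)
print((w - w_q).abs().max())                         # error is bounded by about scale / 2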

Run script:

For internal testing, use model_path=/group/ossmodelzoo/quark_torch/huggingface_pretrained_models/meta-llama/Llama-2-7b and dataset_path=/group/ossdphi_algo_scratch_06/meng/pytorch-light/examples/calib/hf_llm/

model_path={your `llama-2-7b` model path}
dataset_path={your data path}

python quantize_quark.py \
    --model llama-7b \
    --model_path  ${model_path}\
    --seqlen 4096 \
    --dataset_path ${dataset_path} \
    --eval

Example 2#

In this example, we use the opt-125m model and call APL to perform bfp16 model quantization. We support the quantization of nn.Linear, nn.LayerNorm, and nn.Softmax through APL.

  • model : opt-125m

  • calib method : minmax

  • quant dtype : bfp16

replace ops:

  • nn.Linear => pytorchlight.nn.linear_layer.LLinear

  • nn.LayerNorm => pytorchlight.nn.normalization_layer.LLayerNorm

  • nn.Softmax => pytorchlight.nn.activation_layer.LSoftmax
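BFP16 here is a block floating point format: values are grouped into small blocks that share one exponent, and each value keeps only a short mantissa. The sketch below illustrates the idea; the block size of 8 and the 8-bit mantissa are assumptions chosen for illustration and may differ from APL's actual BFP16 parameters.

Conceptual sketch (Python):

import torch

def bfp_fake_quant(x: torch.Tensor, block_size: int = 8, mantissa_bits: int = 8) -> torch.Tensor:
    # Group values into blocks; each block shares the exponent of its largest element.
    flat = x.reshape(-1, block_size)
    max_abs = flat.abs().max(dim=1, keepdim=True).values.clamp(min=1e-30)
    shared_exp = torch.floor(torch.log2(max_abs))
    mant_max = 2 ** (mantissa_bits - 1) - 1          # e.g. 127 for an 8-bit mantissa
    scale = 2.0 ** (shared_exp - mantissa_bits + 2)  # power-of-two step per block
    q = torch.clamp(torch.round(flat / scale), -mant_max, mant_max)
    return (q * scale).reshape(x.shape)

x = torch.randn(2, 128)                              # e.g. a slice of opt-125m activations
print((x - bfp_fake_quant(x)).abs().max())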

Run script:

For internal testing, use model_path=/group/modelzoo/sequence_learning/weights/nlp-pretrained-model/opt_125m_pretrained_pytorch and dataset_path=/group/ossdphi_algo_scratch_06/meng/pytorch-light/examples/calib/hf_llm/

model_path={your `opt-125m` model path}
dataset_path={your data path}

python quantize_quark.py \
    --model opt-125m \
    --model_path ${model_path} \
    --seqlen 4096 \
    --qconfig 0 \
    --eval \
    --qscale_type fp32 \
    --dataset_path ${dataset_path} \
    --example bfp16

Example 3#

This example demonstrates how to use the BRECQ algorithm by calling APL through Quark; a conceptual sketch of the block-wise reconstruction it performs follows the list below.

  • model : opt-125m

  • calib method : minmax

  • quant dtype : int8
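BRECQ calibrates block by block: for each block (for example, one transformer layer), it tunes the quantization parameters so that the quantized block reproduces the full-precision block's outputs on calibration data. The sketch below is a generic illustration of that reconstruction loop; the learnable per-channel scale and the Adam settings are assumptions, and APL's BRECQ may instead optimize weight rounding or use different internals.

Conceptual sketch (Python):

import copy
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    # Linear layer with int8 fake-quantized weights and a learnable per-channel scale.
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach().clone(), requires_grad=False)
        self.bias = None if linear.bias is None else nn.Parameter(
            linear.bias.detach().clone(), requires_grad=False)
        init = self.weight.abs().amax(dim=1, keepdim=True) / 127.0
        self.scale = nn.Parameter(init.clamp(min=1e-8))    # learned during reconstruction

    def forward(self, x):
        w = self.weight / self.scale
        w = (torch.round(w) - w).detach() + w              # straight-through estimator
        w = torch.clamp(w, -127, 127) * self.scale
        return nn.functional.linear(x, w, self.bias)

def brecq_reconstruct_block(fp_block: nn.Module, calib_inputs, steps: int = 200):
    # calib_inputs: list of input tensors captured at this block during calibration.
    q_block = copy.deepcopy(fp_block)
    for name, child in q_block.named_children():           # top level only; a real pass recurses
        if isinstance(child, nn.Linear):
            setattr(q_block, name, FakeQuantLinear(child))
    with torch.no_grad():
        targets = [fp_block(x) for x in calib_inputs]      # full-precision reference outputs
    scales = [m.scale for m in q_block.modules() if isinstance(m, FakeQuantLinear)]
    opt = torch.optim.Adam(scales, lr=1e-3)
    for _ in range(steps):
        for x, t in zip(calib_inputs, targets):
            loss = nn.functional.mse_loss(q_block(x), t)   # block output reconstruction error
            opt.zero_grad()
            loss.backward()
            opt.step()
    return q_block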

For internal testing, use model_path=/group/modelzoo/sequence_learning/weights/nlp-pretrained-model/opt_125m_pretrained_pytorch and dataset_path=/group/ossdphi_algo_scratch_06/meng/pytorch-light/examples/calib/hf_llm/

model_path={your `opt-125m` model path}
dataset_path={your data path}

export CUDA_VISIBLE_DEVICES=0,5,6;
python quantize_quark.py \
    --model opt-125m \
    --model_path ${model_path} \
    --seqlen 1024 \
    --eval \
    --example brecq \
    --dataset_path ${dataset_path}