Perplexity Evaluations#
This section details how to run perplexity evaluations. Perplexity evaluations use the wikitext2 dataset. The currently supported devices are CPU and GPU.
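Concretely, the perplexity task reports the exponentiated average next-token negative log-likelihood of the model over the wikitext2 text. The snippet below is a minimal sketch of that computation using Hugging Face transformers; it is not the llm_eval.py implementation, and the model ID, window size, and use of the raw test split are illustrative assumptions:

```python
# Minimal sketch of a wikitext2 perplexity computation (illustrative only,
# not the llm_eval.py implementation).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

def wikitext2_perplexity(model_id="meta-llama/Llama-2-7b-hf",
                         seq_len=2048, device="cuda"):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).to(device).eval()

    # Tokenize the wikitext2 test split as one continuous token stream.
    data = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    ids = tokenizer("\n\n".join(data["text"]), return_tensors="pt").input_ids.to(device)

    # Mean negative log-likelihood over fixed-length windows, then exponentiate.
    nlls = []
    for start in range(0, ids.size(1) - seq_len, seq_len):
        window = ids[:, start:start + seq_len]
        with torch.no_grad():
            # With labels=input_ids, causal LMs return the mean cross-entropy loss.
            nlls.append(model(window, labels=window).loss)
    return torch.exp(torch.stack(nlls).mean()).item()
```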
Summary of support:
| Model Types | Quark Quantized | Pretrained | Perplexity | ROUGE | METEOR |
|---|---|---|---|---|---|
| LLMs | | | | | |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ✓ | ✓ | X | ✓ | ✓ | ✓ |
| VLMs | | | | | |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| X | X | X | X | X | X |
Recipes#
The --ppl argument specifies the perplexity task.
PPL on Torch Models#
PPL on a pretrained LLM. Example with Llama-2-7b-hf:
python llm_eval.py \
--model_args pretrained=meta-llama/Llama-2-7b-hf \
--ppl \
--trust_remote_code \
--batch_size 1 \
--device cuda
Alternatively, to load a local checkpoint:
python llm_eval.py \
--model_args pretrained=[local checkpoint path] \
--ppl \
--trust_remote_code \
--batch_size 1 \
--device cuda
PPL on a Quark Quantized model. Example with Llama-2-7b-chat-hf-awq-uint4-asym-g128-bf16-lmhead:
python llm_eval.py \
--model_args pretrained=meta-llama/Llama-2-7b-hf \
--model_reload \
--import_file_format hf_format \
--import_model_dir [path to Llama-2-7b-chat-hf-awq-uint4-asym-g128-bf16-lmhead model] \
--ppl \
--trust_remote_code \
--batch_size 1 \
--device cuda
Other Arguments#
- Set --multi_gpu for multi-GPU support.
- Set --save_metrics_to_csv and metrics_output_dir to save the PPL score to CSV.
- Set --model_args dtype=float32 to change the model dtype.
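These options can be combined in a single run. The command below is an illustrative example only: the comma-separated key=value form of --model_args and the --metrics_output_dir spelling of metrics_output_dir are assumptions, as is the output path placeholder.
python llm_eval.py \
--model_args pretrained=meta-llama/Llama-2-7b-hf,dtype=float32 \
--ppl \
--trust_remote_code \
--batch_size 1 \
--multi_gpu \
--save_metrics_to_csv \
--metrics_output_dir [metrics output directory] \
--device cuda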