Describe the bug
When I run the GSM8K eval, I get 0 accuracy. If I save the details, I see that the predictions look as follows:
['Please reason step by step, and put your final answer within \\boxed{}.ichtig\n']
I'm not sure if this is related to the closed issue #202, but I've seen others having this issue as well.
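For reference, this is roughly how I am inspecting the saved details (a minimal sketch in Python; the path under the output dir and the column layout are assumptions and may differ in your setup):

import glob
import pandas as pd

# --save-details writes parquet files somewhere under the output dir;
# the exact subdirectory layout assumed here may differ.
files = sorted(glob.glob("data/details/**/*.parquet", recursive=True))
df = pd.read_parquet(files[-1])

print(df.columns.tolist())  # check which column holds the model output
print(df.iloc[0])           # the predictions row shows the string quoted above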
To Reproduce
export VLLM_WORKER_MULTIPROC_METHOD=spawn
model="Qwen/Qwen2.5-Math-1.5B"
seed=0
MODEL_ARGS="pretrained=$model,dtype=bfloat16,max_model_length=4096,gpu_memory_utilization=0.9,generation_parameters={max_new_tokens:4096,temperature:0.6,top_p:0.95,seed:$seed}"
lighteval vllm "$MODEL_ARGS" "leaderboard|gsm8k|0|0" --use-chat-template --output-dir "data" --save-details
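As a side note, here is a small sketch (using the plain transformers API rather than lighteval internals, so it only approximates what --use-chat-template actually sends) to see what prompt the model's chat template produces, since the prediction above looks like the \boxed{} system prompt being echoed back:

from transformers import AutoTokenizer

# Build an approximate chat-template prompt for a GSM8K-style question.
# The question text is just a placeholder.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-1.5B")
messages = [{"role": "user", "content": "Natalia sold clips to 48 of her friends in April..."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# For Qwen2.5-Math this should include the default system prompt about \boxed{},
# which is the same text that shows up in the bad prediction.
print(prompt)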
Expected behavior

I do not have this issue with other evals; e.g., gpqa:diamond runs fine, generating outputs and nonzero accuracies.
Version info
lighteval is version 0.8.1 and vllm is version 0.8.4.
Thank you for your help!