[BUG] GSM8k Zero Accuracy #686

@candemircan

Describe the bug

When I run the GSM8K eval, I get 0 accuracy. When I save the details, the predictions look like this:

['Please reason step by step, and put your final answer within \\boxed{}.ichtig\n']
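To make the failure concrete, here is a minimal sketch that checks whether a prediction contains a non-empty \boxed{...} answer. `boxed_answers` is a hypothetical helper written for this report, not part of lighteval, and it does not reproduce how lighteval scores GSM8K. It only shows that the prediction above echoes the instruction and contains no answer at all.

```python
import re

# Hypothetical helper (not a lighteval API): collect the contents of every
# non-empty \boxed{...} in a prediction. Nested braces are not handled.
BOXED = re.compile(r"\\boxed\{([^{}]+)\}")


def boxed_answers(prediction: str) -> list[str]:
    """Return the stripped contents of each non-empty \\boxed{...} span."""
    return [m.strip() for m in BOXED.findall(prediction) if m.strip()]


# The prediction copied from the saved details above.
prediction = (
    "Please reason step by step, and put your final answer within "
    "\\boxed{}.ichtig\n"
)

print(boxed_answers(prediction))                          # []
print(boxed_answers("... so the total is \\boxed{42}."))  # ['42']
```

The empty list for the saved prediction matches what I see: the output is the instruction text followed by a stray token, with nothing inside the braces.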

I'm not sure whether this is related to the closed issue #202, but I've seen others run into the same problem.

To Reproduce

export VLLM_WORKER_MULTIPROC_METHOD=spawn
model="Qwen/Qwen2.5-Math-1.5B"
seed=0
MODEL_ARGS="pretrained=$model,dtype=bfloat16,max_model_length=4096,gpu_memory_utilization=0.9,generation_parameters={max_new_tokens:4096,temperature:0.6,top_p:0.95,seed:$seed}"

lighteval vllm "$MODEL_ARGS" "leaderboard|gsm8k|0|0" --use-chat-template --output-dir "data" --save-details

Expected behavior

I do not have this issue with other evals. For example, gpqa:diamond runs fine: it generates outputs and reports nonzero accuracies.

Version info

lighteval is version 0.8.1 and vllm is version 0.8.4.

Thank you for your help!
