[BUG] GSM8k Zero Accuracy #686

## Describe the bug

When I run the GSM8K eval, I get 0 accuracy. If I save details, I see that the `predictions` look as follows:

```
['Please reason step by step, and put your final answer within \\boxed{}.ichtig\n']
```

I'm not sure if this is related to the closed issue #202, but I've seen others having this issue as well.
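
For anyone triaging this, here is a minimal sketch of how the saved predictions could be checked for a missing final answer. It assumes the predictions have already been loaded into a plain list of strings (loading them from the details file is not shown), and `has_boxed_answer` and `count_missing_answers` are hypothetical helper names, not part of lighteval. It only checks for a non-empty `\boxed{...}`, not whether the answer is correct.

```python
def has_boxed_answer(text: str) -> bool:
    """Return True if the text contains a \\boxed{...} with something inside it."""
    marker = "\\boxed{"
    start = text.find(marker)
    while start != -1:
        rest = text[start + len(marker):]
        # An empty "\boxed{}" (as in the prompt instruction) does not count.
        if rest and not rest.lstrip().startswith("}"):
            return True
        start = text.find(marker, start + 1)
    return False


def count_missing_answers(predictions: list[str]) -> int:
    """Count predictions that never produce a non-empty boxed answer."""
    return sum(1 for p in predictions if not has_boxed_answer(p))


if __name__ == "__main__":
    # The first entry is the prediction from this report; the second is a
    # made-up example of what a healthy generation might look like.
    predictions = [
        "Please reason step by step, and put your final answer within \\boxed{}.ichtig\n",
        "She sells 9 eggs at $2 each, so the answer is \\boxed{18}.",
    ]
    print(count_missing_answers(predictions), "of", len(predictions), "lack a boxed answer")
```

With the prediction shown above, the check reports no boxed answer, which would be consistent with the 0 accuracy.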

## To Reproduce

```bash
export VLLM_WORKER_MULTIPROC_METHOD=spawn
model="Qwen/Qwen2.5-Math-1.5B"
seed=0
MODEL_ARGS="pretrained=$model,dtype=bfloat16,max_model_length=4096,gpu_memory_utilization=0.9,generation_parameters={max_new_tokens:4096,temperature:0.6,top_p:0.95,seed:$seed}"

lighteval vllm "$MODEL_ARGS" "leaderboard|gsm8k|0|0" --use-chat-template --output-dir "data" --save-details
```

## Expected behavior

I expected GSM8K to generate real outputs and report a nonzero accuracy. I do not have this issue with other evals; for example, gpqa:diamond runs fine, generating outputs and nonzero accuracies.

## Version info

lighteval is version `0.8.1`, and vllm is version `0.8.4`.


Thank you for your help!
