
Commit 6af5280

JoelNiklaus authored and clefourrier committed

Fixes a TypeError in Sacrebleu. (#387)

Co-authored-by: Clémentine Fourrier <[email protected]>

1 parent dcdf53a · commit 6af5280

File tree

6 files changed: +16 −9 lines

.github/ISSUE_TEMPLATE/evaluation-task-request.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -13,6 +13,6 @@ assignees: ''
 
 ## Evaluation metadata
 Provide all available
-- Paper url:
-- Github url:
+- Paper url:
+- Github url:
 - Dataset url:
```

.github/ISSUE_TEMPLATE/feature-request.md

Lines changed: 0 additions & 1 deletion

```diff
@@ -15,4 +15,3 @@ A clear and concise description of what you want to happen.
 
 ## Possible alternatives
 A clear and concise description of any alternative solutions or features you've considered.
-
```

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -104,7 +104,7 @@ Harness and HELM teams for their pioneering work on LLM evaluations.
 Got ideas? Found a bug? Want to add a
 [task](https://github.com/huggingface/lighteval/wiki/Adding-a-Custom-Task) or
 [metric](https://github.com/huggingface/lighteval/wiki/Adding-a-New-Metric)?
-Contributions are warmly welcomed!
+Contributions are warmly welcomed!
 
 If you're adding a new feature, please open an issue first.
```

examples/model_configs/peft_model.yaml

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,8 +1,8 @@
 model:
-  type: "base"
+  type: "base"
   base_params:
     model_args: "pretrained=predibase/customer_support,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ... For a PEFT model, the pretrained model should be the one trained with PEFT, and the base model below will contain the original model on which the adapters will be applied.
-    dtype: "4bit" # Specifying the model to be loaded in 4 bit uses BitsAndBytesConfig. The other option is to use "8bit" quantization.
+    dtype: "4bit" # Specifying the model to be loaded in 4 bit uses BitsAndBytesConfig. The other option is to use "8bit" quantization.
     compile: true
   merged_weights: # Ignore this section if you are not using PEFT models
     delta_weights: false # set to True if your model should be merged with a base model; you also need to provide the base model name
```
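For readers unfamiliar with the PEFT flow the comments describe, here is a minimal sketch of applying trained adapters onto a base model with the `peft` and `transformers` libraries. It is illustrative only, not lighteval's internal loading code, and the base model name is a made-up placeholder, since this config excerpt only names the adapter checkpoint.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Hypothetical base model name; the YAML's `pretrained=predibase/customer_support`
# points at the PEFT-trained checkpoint, and the base model is configured separately.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Apply the trained adapters on top of the base weights.
model = PeftModel.from_pretrained(base, "predibase/customer_support")

# Fold the adapters into the base weights to get a plain model,
# roughly what the merged_weights section above controls.
model = model.merge_and_unload()
```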

examples/model_configs/quantized_model.yaml

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,8 +1,8 @@
 model:
-  type: "base"
+  type: "base"
   base_params:
     model_args: "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
-    dtype: "4bit" # Specifying the model to be loaded in 4 bit uses BitsAndBytesConfig. The other option is to use "8bit" quantization.
+    dtype: "4bit" # Specifying the model to be loaded in 4 bit uses BitsAndBytesConfig. The other option is to use "8bit" quantization.
     compile: true
   merged_weights: # Ignore this section if you are not using PEFT models
     delta_weights: false # set to True if your model should be merged with a base model; you also need to provide the base model name
```
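The `dtype: "4bit"` comment above refers to `transformers`' `BitsAndBytesConfig`. A minimal sketch of what that loading path looks like (illustrative only; the exact wiring inside lighteval is not part of this diff):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# dtype: "4bit" -> 4-bit quantization via bitsandbytes;
# dtype: "8bit" would instead be BitsAndBytesConfig(load_in_8bit=True).
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",  # pretrained=... from model_args above
    revision="main",                 # revision=... from model_args above
    quantization_config=quant_config,
)
```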

src/lighteval/metrics/metrics_corpus.py

Lines changed: 9 additions & 1 deletion

```diff
@@ -30,6 +30,7 @@
 import sacrebleu
 import sklearn.metrics
 
+from lighteval.logging.hierarchical_logger import hlog_warn
 from lighteval.metrics.sample_preparator import (
     GenerativeCorpusMetricInput,
     LogprobCorpusMetricInput,
@@ -103,7 +104,14 @@ def __init__(self, metric_type: str):
     def compute(self, items: list[GenerativeCorpusMetricInput]) -> float:
         """Computes the metric score over all the corpus generated items, by using the sacrebleu implementation."""
         golds = [i.golds for i in items]
-        preds = [as_list(i.preds) for i in items]
+        preds = []
+        for i in items:
+            pred = as_list(i.preds)
+            if len(pred) > 1:
+                hlog_warn(
+                    f"Multiple predictions present, keeping only the first prediction (when computing sacrebleu.{self.metric.__name__})."
+                )
+            preds.append(pred[0])
         return float(self.metric(hypotheses=preds, references=golds).score)
```
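The change flattens `preds` from a list of lists of strings into a flat list of strings: sacrebleu's corpus-level metrics expect one hypothesis string per segment, and handing them a nested list is what triggered the TypeError this commit fixes. A minimal standalone illustration of the failure mode (assuming sacrebleu 2.x; the example sentences are made up):

```python
import sacrebleu

hyps = ["the cat sat on the mat"]   # one hypothesis string per segment
refs = [["the cat is on the mat"]]  # one reference stream, aligned with hyps

print(sacrebleu.corpus_bleu(hyps, refs).score)  # works: flat list of strings

# A nested hypothesis list -- what `as_list(i.preds)` could produce per item --
# fails sacrebleu's type check, since each hypothesis must be a plain string:
sacrebleu.corpus_bleu([["the cat sat on the mat"]], refs)  # raises a TypeError
```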
