
[BUG] Issue with the scorer Attribute Initialization in ROUGE #470

@ryan-minato

Describe the bug

The ROUGE class does not initialize the scorer attribute in its constructor. However, the compute method reads self.scorer before anything has assigned it, so every call to compute fails (in Python, reading an instance attribute that was never set raises AttributeError).

def __init__(
    self,
    methods: str | list[str],
    multiple_golds: bool = False,
    bootstrap: bool = False,
    normalize_gold: callable = None,
    normalize_pred: callable = None,
    aggregation_function: callable = None,
    tokenizer: object = None,
):
    """A ROUGE wrapper method. Relies on `rouge_scorer`.
    Args:
        methods (str | list[str]): What type of ROUGE scoring to use. Can be one or any of `rouge1`, `rouge2`, `rougeL` or `rougeLsum`.
        multiple_golds (bool, optional): Whether to compute ROUGE by allowing the comparison to several golds
            at once, or to compute ROUGE on individual gold/prediction pairs and aggregate afterwards. Defaults to False.
        bootstrap (bool, optional): Whether to use bootstrapping. Defaults to False.
        aggregation_function (callable, optional): How to aggregate the item results. Defaults to max.
            Used if there are several golds or predictions on which scores were computed.
        normalize_gold (callable, optional): Function to use to normalize the reference strings.
            Defaults to None if no normalization is applied.
        normalize_pred (callable, optional): Function to use to normalize the predicted strings.
            Defaults to None if no normalization is applied.
        tokenizer (object, optional): An object with `tokenize` method to be used by rouge scorer. If None, rouge-scorer's
            default tokenizer will be used.
    """
    if aggregation_function and bootstrap:
        logger.warning("Can't use both bootstrapping and an aggregation function in Rouge. Keeping bootstrap.")
    self.aggregation_function = aggregation_function
    if self.aggregation_function is None:
        self.aggregation_function = np.mean
    self.methods = as_list(methods)
    if any(method not in self.ALLOWED_ROUGE_METHODS for method in self.methods):
        raise ValueError(
            f"Rouge was initialised with method {methods}, which is not in {','.join(self.ALLOWED_ROUGE_METHODS)}"
        )
    self.multiple_golds = multiple_golds
    self.bootstrap = bootstrap
    self.normalize_gold = normalize_gold
    self.normalize_pred = normalize_pred
    self.tokenizer = tokenizer

def compute(self, golds: list[str], predictions: list[str], **kwargs) -> float | dict:
    """Computes the metric(s) over a list of golds and predictions for one single sample.
    Args:
        golds (list[str]): Reference targets
        predictions (list[str]): Predicted strings
    Returns:
        float or dict: Aggregated score over the current sample's items.
            If several rouge functions have been selected, returns a dict which maps name and scores.
    """
    from rouge_score import rouge_scorer
    if self.scorer is None:
        self.scorer = rouge_scorer.RougeScorer(self.methods, tokenizer=self.tokenizer)
    # Normalize

Proposed Solution

Initialize self.scorer to None in the __init__ method, so that the "if self.scorer is None" check in compute can build the scorer lazily on first use.
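For illustration, here is a minimal sketch of the failure and of the proposed fix. It does not use the real lighteval class: `BrokenRouge`, `FixedRouge`, `_build_scorer`, and `raises_attribute_error` are hypothetical stand-ins, and `_build_scorer` takes the place of the `rouge_scorer.RougeScorer(...)` call so the snippet runs without any dependency. It only assumes the pattern shown in the snippet above (a lazy `if self.scorer is None` check in `compute`).

```python
def _build_scorer(methods):
    # Hypothetical placeholder for rouge_scorer.RougeScorer(methods, ...).
    return {"methods": tuple(methods)}


class BrokenRouge:
    """Stand-in that mirrors the reported bug: __init__ never sets self.scorer."""

    def __init__(self, methods):
        self.methods = list(methods)

    def compute(self):
        # Reading self.scorer here fails if the attribute was never assigned.
        if self.scorer is None:
            self.scorer = _build_scorer(self.methods)
        return self.scorer


class FixedRouge(BrokenRouge):
    """Same class with the proposed one-line fix applied."""

    def __init__(self, methods):
        super().__init__(methods)
        self.scorer = None  # declare the attribute so the lazy check works


def raises_attribute_error(metric):
    # Helper for the demonstration: True if compute() raises AttributeError.
    try:
        metric.compute()
    except AttributeError:
        return True
    return False


print(raises_attribute_error(BrokenRouge(["rouge1"])))
print(raises_attribute_error(FixedRouge(["rouge1"])))
```

With the attribute declared up front, the scorer is still only built on the first call to compute and reused afterwards, which appears to be the intent of the existing check.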

To Reproduce

Run any evaluation task that uses a ROUGE metric, such as the following:

lighteval accelerate \
    "pretrained=gpt2" \
    "helm|summarization:xsum-sampled|0|0"

Expected behavior

The evaluation should run to completion without raising an error.

Version info

The issue was encountered with the version installed directly from the main branch.
