Closed
Describe the bug
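The ROUGE class does not initialize the scorer attribute in its constructor. However, the compute method reads self.scorer directly (if self.scorer is None), so the first call to compute raises an error (an AttributeError, since the attribute was never set) regardless of the arguments passed.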
lighteval/src/lighteval/metrics/metrics_sample.py
Lines 442 to 501 in a1c610d
    def __init__(
        self,
        methods: str | list[str],
        multiple_golds: bool = False,
        bootstrap: bool = False,
        normalize_gold: callable = None,
        normalize_pred: callable = None,
        aggregation_function: callable = None,
        tokenizer: object = None,
    ):
        """A ROUGE wrapper method. Relies on `rouge_scorer`.

        Args:
            methods (str | list[str]): What type of ROUGE scoring to use. Can be one or any of `rouge1`, `rouge2`, `rougeL` or `rougeLsum`.
            multiple_golds (bool, optional): Whether to compute ROUGE by allowing the comparison to several golds
                at once, or to compute ROUGE on individual gold/prediction pairs and aggregate afterwards. Defaults to False.
            bootstrap (bool, optional): Whether to use bootstrapping. Defaults to False.
            aggregation_function (callable, optional): How to aggregate the item results. Defaults to max.
                Used if there are several golds or predictions on which scores were computed.
            normalize_gold (callable, optional): Function to use to normalize the reference strings.
                Defaults to None if no normalization is applied.
            normalize_pred (callable, optional): Function to use to normalize the predicted strings.
                Defaults to None if no normalization is applied.
            tokenizer (object, optional): An object with `tokenize` method to be used by rouge scorer. If None, rouge-scorer's
                default tokenizer will be used.
        """
        if aggregation_function and bootstrap:
            logger.warning("Can't use both bootstrapping and an aggregation function in Rouge. Keeping bootstrap.")
        self.aggregation_function = aggregation_function
        if self.aggregation_function is None:
            self.aggregation_function = np.mean

        self.methods = as_list(methods)
        if any(method not in self.ALLOWED_ROUGE_METHODS for method in self.methods):
            raise ValueError(
                f"Rouge was initialised with method {methods}, which is not in {','.join(self.ALLOWED_ROUGE_METHODS)}"
            )
        self.multiple_golds = multiple_golds
        self.bootstrap = bootstrap
        self.normalize_gold = normalize_gold
        self.normalize_pred = normalize_pred
        self.tokenizer = tokenizer

    def compute(self, golds: list[str], predictions: list[str], **kwargs) -> float | dict:
        """Computes the metric(s) over a list of golds and predictions for one single sample.

        Args:
            golds (list[str]): Reference targets
            predictions (list[str]): Predicted strings

        Returns:
            float or dict: Aggregated score over the current sample's items.
                If several rouge functions have been selected, returns a dict which maps name and scores.
        """
        from rouge_score import rouge_scorer

        if self.scorer is None:
            self.scorer = rouge_scorer.RougeScorer(self.methods, tokenizer=self.tokenizer)

        # Normalize
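The failure mode in the excerpt can be reproduced in isolation, without lighteval or rouge_score installed. The sketch below is only an illustration: BrokenMetric and first_compute_error are hypothetical names, and the class mirrors just the relevant pattern (an attribute read in compute that the constructor never assigns).

```python
class BrokenMetric:
    """Hypothetical stand-in that mirrors the pattern in the excerpt."""

    def __init__(self, methods):
        self.methods = list(methods)
        # Note: self.scorer is never assigned here.

    def compute(self, golds, predictions):
        # The attribute lookup fails before the comparison with None happens.
        if self.scorer is None:
            self.scorer = object()
        return self.scorer


def first_compute_error(metric):
    """Return the exception type raised by the first compute call, or None."""
    try:
        metric.compute(["gold"], ["prediction"])
    except Exception as exc:
        return type(exc)
    return None


print(first_compute_error(BrokenMetric(["rouge1"])))
```

Because the lookup of self.scorer itself fails, the guard never gets a chance to build the scorer, which would explain why the error does not depend on the inputs.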
Proposed Solution
Add an initialization for self.scorer as None in the __init__ method.
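A minimal sketch of the proposed change, assuming the rest of the class stays as it is. FixedMetric and _build_scorer are hypothetical names used for illustration only; in the real class the placeholder would be the rouge_scorer.RougeScorer(...) call already present in compute.

```python
class FixedMetric:
    """Hypothetical stand-in showing the proposed fix, not lighteval's class."""

    def __init__(self, methods, tokenizer=None):
        self.methods = list(methods)
        self.tokenizer = tokenizer
        # Proposed fix: define the attribute up front so that compute
        # can safely compare it against None.
        self.scorer = None

    def _build_scorer(self):
        # Placeholder (assumption) for the real scorer construction.
        return ("scorer", tuple(self.methods), self.tokenizer)

    def compute(self, golds, predictions):
        if self.scorer is None:  # no AttributeError now
            self.scorer = self._build_scorer()
        return self.scorer


metric = FixedMetric(["rouge1", "rougeL"])
first = metric.compute(["gold"], ["prediction"])
second = metric.compute(["gold"], ["prediction"])
print(first is second)  # the scorer is built once and then reused
```

Keeping the construction inside compute preserves the lazy import of rouge_score seen in the excerpt; only the attribute definition moves into the constructor.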
To Reproduce
Run any test involving ROUGE, such as the following:
lighteval accelerate \
    "pretrained=gpt2" \
    "helm|summarization:xsum-sampled|0|0"
Expected behavior
The test should execute without issues.
Version info
The issue was encountered with the version installed directly from the main branch.