[BUG] Issue with the scorer Attribute Initialization in ROUGE · Issue #470 · huggingface/lighteval

Describe the bug

The ROUGE class does not initialize the scorer attribute in its constructor. The compute method nevertheless reads self.scorer in its `if self.scorer is None` check before anything has assigned it, so every call fails with an AttributeError.

lighteval/src/lighteval/metrics/metrics_sample.py

Lines 442 to 501 in a1c610d

    
    def __init__(
        self,
        methods: str | list[str],
        multiple_golds: bool = False,
        bootstrap: bool = False,
        normalize_gold: callable = None,
        normalize_pred: callable = None,
        aggregation_function: callable = None,
        tokenizer: object = None,
    ):
        """A ROUGE wrapper method. Relies on `rouge_scorer`.

        Args:
            methods (str | list[str]): What type of ROUGE scoring to use. Can be one or any of `rouge1`, `rouge2`, `rougeL` or `rougeLsum`.
            multiple_golds (bool, optional): Whether to compute ROUGE by allowing the comparison to several golds
                at once, or to compute ROUGE on individual gold/prediction pairs and aggregate afterwards. Defaults to False.
            bootstrap (bool, optional): Whether to use bootstrapping. Defaults to False.
            aggregation_function (callable, optional): How to aggregate the item results. Defaults to max.
                Used if there are several golds or predictions on which scores were computed.
            normalize_gold (callable, optional): Function to use to normalize the reference strings.
                Defaults to None if no normalization is applied.
            normalize_pred (callable, optional): Function to use to normalize the predicted strings.
                Defaults to None if no normalization is applied.
            tokenizer (object, optional): An object with `tokenize` method to be used by rouge scorer. If None, rouge-scorer's
                default tokenizer will be used.
        """
        if aggregation_function and bootstrap:
            logger.warning("Can't use both bootstrapping and an aggregation function in Rouge. Keeping bootstrap.")
        self.aggregation_function = aggregation_function
        if self.aggregation_function is None:
            self.aggregation_function = np.mean

        self.methods = as_list(methods)
        if any(method not in self.ALLOWED_ROUGE_METHODS for method in self.methods):
            raise ValueError(
                f"Rouge was initialised with method {methods}, which is not in {','.join(self.ALLOWED_ROUGE_METHODS)}"
            )
        self.multiple_golds = multiple_golds
        self.bootstrap = bootstrap
        self.normalize_gold = normalize_gold
        self.normalize_pred = normalize_pred
        self.tokenizer = tokenizer

    def compute(self, golds: list[str], predictions: list[str], **kwargs) -> float | dict:
        """Computes the metric(s) over a list of golds and predictions for one single sample.

        Args:
            golds (list[str]): Reference targets
            predictions (list[str]): Predicted strings

        Returns:
            float or dict: Aggregated score over the current sample's items.
                If several rouge functions have been selected, returns a dict which maps name and scores.
        """
        from rouge_score import rouge_scorer

        if self.scorer is None:
            self.scorer = rouge_scorer.RougeScorer(self.methods, tokenizer=self.tokenizer)

        # Normalize
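
The failure in the excerpt above can be reproduced in isolation, without lighteval or rouge_score installed. The following is only a minimal sketch: `BrokenMetric` and `try_compute` are hypothetical names made up for illustration, and the class copies just the relevant pattern (a constructor that never assigns `scorer`, and a `compute` that tests it against None).

```python
class BrokenMetric:
    """Hypothetical stand-in for the ROUGE wrapper; not the lighteval class."""

    def __init__(self, methods):
        self.methods = list(methods)
        # self.scorer is deliberately never assigned, mirroring the excerpt.

    def compute(self):
        # Reading an attribute that was never set raises AttributeError,
        # so the comparison with None is never evaluated.
        if self.scorer is None:
            self.scorer = object()  # placeholder for the real scorer
        return self.scorer


def try_compute():
    """Return the name of the exception raised by compute(), if any."""
    try:
        BrokenMetric(["rouge1"]).compute()
    except AttributeError as err:
        return type(err).__name__
    return "no error"


print(try_compute())
```

The lazy check only works if the attribute already exists, which is why the first call fails rather than building the scorer.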

Proposed Solution

Initialize self.scorer to None in the __init__ method, so that the `if self.scorer is None` check in compute can build the scorer lazily on the first call.
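
A minimal sketch of the proposed change, under stated assumptions: `LazyScorerMetric` and `_build_scorer` are hypothetical names, and `_build_scorer` returns a placeholder dict where the real class would call `rouge_scorer.RougeScorer(self.methods, tokenizer=self.tokenizer)`. It is not the actual lighteval patch, only an illustration of the one-line fix.

```python
class LazyScorerMetric:
    """Hypothetical stand-in for the ROUGE wrapper; only the lazy-init pattern is shown."""

    def __init__(self, methods, tokenizer=None):
        self.methods = list(methods)
        self.tokenizer = tokenizer
        # The proposed fix: define the attribute up front so that
        # compute() can safely test it against None.
        self.scorer = None

    def _build_scorer(self):
        # Placeholder for the real scorer construction (an assumption of
        # this sketch, so the example runs without rouge_score installed).
        return {"methods": self.methods, "tokenizer": self.tokenizer}

    def compute(self):
        # Build the scorer on first use, then reuse it on later calls.
        if self.scorer is None:
            self.scorer = self._build_scorer()
        return self.scorer


metric = LazyScorerMetric(["rouge1", "rougeL"])
first = metric.compute()
second = metric.compute()
print(first is second)
```

With the attribute defined in the constructor, the first call to `compute` creates the scorer and every later call reuses the same object.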

To Reproduce

Run any task that uses a ROUGE metric, for example:

lighteval accelerate \
    "pretrained=gpt2" \
    "helm|summarization:xsum-sampled|0|0"

Expected behavior

The test should execute without issues.

Version info

The issue was encountered with the version installed directly from the main branch.
