
[Fix] Integrate LLM generation parameters into the evaluation method #36416

Open · wants to merge 4 commits into main
Conversation

Harrisonyong

What does this PR do?

Phenomenon:

In the Trainer class, the evaluate method does not accept any LLM generation parameters. Seq2SeqTrainer inherits from Trainer and overrides evaluate to collect them as **gen_kwargs, but when Seq2SeqTrainer calls Trainer.evaluate, the generation parameters are not forwarded. When Trainer.evaluate subsequently calls evaluate again (as it does for a dict of eval datasets), the dispatch re-enters Seq2SeqTrainer.evaluate with empty gen_kwargs, and the generation parameters are lost.
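
Here is a minimal, self-contained sketch of the dispatch problem (a plain-Python mock, not the actual transformers code; the class bodies are reduced to the parts that matter):

class Trainer:
    def evaluate(self, eval_dataset=None, metric_key_prefix="eval"):
        if isinstance(eval_dataset, dict):
            metrics = {}
            for name, dataset in eval_dataset.items():
                # dispatches back to Seq2SeqTrainer.evaluate, with no gen_kwargs
                metrics.update(self.evaluate(dataset, metric_key_prefix=f"{metric_key_prefix}_{name}"))
            return metrics
        return {metric_key_prefix: dict(self._gen_kwargs)}  # stand-in for generation

class Seq2SeqTrainer(Trainer):
    def evaluate(self, eval_dataset=None, metric_key_prefix="eval", **gen_kwargs):
        self._gen_kwargs = gen_kwargs  # stored here...
        return super().evaluate(eval_dataset, metric_key_prefix=metric_key_prefix)

trainer = Seq2SeqTrainer()
print(trainer.evaluate({"a": [0], "b": [1]}, num_beams=4))
# {'eval_a': {}, 'eval_b': {}}  <- num_beams=4 was overwritten by the re-entry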

What I do:

I added **gen_kwargs to Trainer.evaluate, which allows generation parameters to be passed through flexibly. Additionally, I modified the call to Trainer.evaluate in Seq2SeqTrainer.evaluate to forward the LLM generation parameters.
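
Applied to the same mock, the fix threads **gen_kwargs through both methods (again a plain-Python sketch, not the PR's actual diff):

class FixedTrainer:
    def evaluate(self, eval_dataset=None, metric_key_prefix="eval", **gen_kwargs):
        if isinstance(eval_dataset, dict):
            metrics = {}
            for name, dataset in eval_dataset.items():
                # the recursive call now forwards the generation parameters
                metrics.update(self.evaluate(dataset, metric_key_prefix=f"{metric_key_prefix}_{name}", **gen_kwargs))
            return metrics
        return {metric_key_prefix: dict(self._gen_kwargs)}

class FixedSeq2SeqTrainer(FixedTrainer):
    def evaluate(self, eval_dataset=None, metric_key_prefix="eval", **gen_kwargs):
        self._gen_kwargs = gen_kwargs
        return super().evaluate(eval_dataset, metric_key_prefix=metric_key_prefix, **gen_kwargs)

print(FixedSeq2SeqTrainer().evaluate({"a": [0], "b": [1]}, num_beams=4))
# {'eval_a': {'num_beams': 4}, 'eval_b': {'num_beams': 4}}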

@Rocketknight1
Member

cc @SunMarc @muellerzr

@SunMarc
Member

SunMarc commented Feb 26, 2025

Can you share a snippet using Seq2SeqTrainer that reproduces the issue you have?
Seq2SeqTrainer has its own evaluate method that stores gen_kwargs. We shouldn't have to update Trainer for Seq2SeqTrainer features.

@Harry-yong

Can you share a snippet using Seq2SeqTrainer that reproduces the issue you have? Seq2SeqTrainer has its own evaluate method that stores gen_kwargs. We shouldn't have to update Trainer for Seq2SeqTrainer features.

In trainer_seq2seq.py:

class Seq2SeqTrainer(Trainer):
    def evaluate(
        self,
        eval_dataset: Optional[Dataset] = None,
        ignore_keys: Optional[List[str]] = None,
        metric_key_prefix: str = "eval",
        **gen_kwargs,
    ) -> Dict[str, float]:
        ...
        # gen_kwargs was stored into self._gen_kwargs above (elided), but it
        # is not forwarded to the parent call:
        return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)

In this snippet, Seq2SeqTrainer.evaluate calls Trainer.evaluate again, but it drops the gen_kwargs. As a result, the parameters are lost in the next step: as you can see below, Trainer.evaluate recursively calls evaluate without any generation parameters, which dispatches back to Seq2SeqTrainer.evaluate and overwrites the stored gen_kwargs with an empty dict. So I added the formal parameter to link them.

class Trainer:
    def evaluate(
        self,
        eval_dataset: Optional[Union[Dataset, Dict[str, Dataset]]] = None,
        ignore_keys: Optional[List[str]] = None,
        metric_key_prefix: str = "eval",
    ) -> Dict[str, float]:
        ...

        if isinstance(eval_dataset, dict):
            metrics = {}
            for eval_dataset_name, _eval_dataset in eval_dataset.items():
                # `override` is defined earlier in the method (elided above).
                # This recursive call dispatches back to Seq2SeqTrainer.evaluate,
                # but without any gen_kwargs:
                dataset_metrics = self.evaluate(
                    eval_dataset=_eval_dataset if override else eval_dataset_name,
                    ignore_keys=ignore_keys,
                    metric_key_prefix=f"{metric_key_prefix}_{eval_dataset_name}",
                )
                metrics.update(dataset_metrics)
            return metrics

Therefore, this fix addresses an inherent issue in Trainer: its recursive call drops any extra keyword arguments, and adding the formal parameter prevents that.
By the way, regarding the point you raised about Seq2SeqTrainer, it should be fixed like this:

class Seq2SeqTrainer(Trainer):
    def evaluate(
        self,
        eval_dataset: Optional[Dataset] = None,
        ignore_keys: Optional[List[str]] = None,
        metric_key_prefix: str = "eval",
        **gen_kwargs,
    ) -> Dict[str, float]:
        ...
        return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix, **gen_kwargs)
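
With both signatures forwarding **gen_kwargs, a call like the following would keep the generation parameters for every entry of a dict eval_dataset (hypothetical usage; trainer and the datasets are placeholders, not from the PR):

# Hypothetical usage; `trainer`, `ds_easy`, `ds_hard` are placeholders.
metrics = trainer.evaluate(
    eval_dataset={"easy": ds_easy, "hard": ds_hard},
    max_new_tokens=64,  # generation params now survive the recursive call
    num_beams=4,
)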
