
[Fix] Integrate LLM generation parameters into the evaluation method #36416

Open · wants to merge 4 commits into main
Conversation

Harrisonyong

What does this PR do?

Phenomenon:

In the Trainer class, the evaluate method does not accept any LLM generation parameters. Seq2SeqTrainer inherits from Trainer and overrides evaluate to collect them as **gen_kwargs, but when Seq2SeqTrainer calls Trainer.evaluate, the generation parameters are not forwarded. When Trainer.evaluate subsequently calls evaluate again (as it does for a dict of eval datasets), the dispatch re-enters Seq2SeqTrainer.evaluate with empty gen_kwargs, and the generation parameters are lost.
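
Here is a minimal, self-contained sketch of the dispatch problem (a plain-Python mock, not the actual transformers code; the class bodies are reduced to the parts that matter):

class Trainer:
    def evaluate(self, eval_dataset=None, metric_key_prefix="eval"):
        if isinstance(eval_dataset, dict):
            metrics = {}
            for name, dataset in eval_dataset.items():
                # dispatches back to Seq2SeqTrainer.evaluate, with no gen_kwargs
                metrics.update(self.evaluate(dataset, metric_key_prefix=f"{metric_key_prefix}_{name}"))
            return metrics
        return {metric_key_prefix: dict(self._gen_kwargs)}  # stand-in for generation

class Seq2SeqTrainer(Trainer):
    def evaluate(self, eval_dataset=None, metric_key_prefix="eval", **gen_kwargs):
        self._gen_kwargs = gen_kwargs  # stored here...
        return super().evaluate(eval_dataset, metric_key_prefix=metric_key_prefix)

trainer = Seq2SeqTrainer()
print(trainer.evaluate({"a": [0], "b": [1]}, num_beams=4))
# {'eval_a': {}, 'eval_b': {}}  <- num_beams=4 was overwritten by the re-entry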

What I do:

I added **gen_kwargs to Trainer.evaluate, which allows generation parameters to be passed through flexibly. Additionally, I modified the call to Trainer.evaluate in Seq2SeqTrainer.evaluate to forward the LLM generation parameters.
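
Applied to the same mock, the fix threads **gen_kwargs through both methods (again a plain-Python sketch, not the PR's actual diff):

class FixedTrainer:
    def evaluate(self, eval_dataset=None, metric_key_prefix="eval", **gen_kwargs):
        if isinstance(eval_dataset, dict):
            metrics = {}
            for name, dataset in eval_dataset.items():
                # the recursive call now forwards the generation parameters
                metrics.update(self.evaluate(dataset, metric_key_prefix=f"{metric_key_prefix}_{name}", **gen_kwargs))
            return metrics
        return {metric_key_prefix: dict(self._gen_kwargs)}

class FixedSeq2SeqTrainer(FixedTrainer):
    def evaluate(self, eval_dataset=None, metric_key_prefix="eval", **gen_kwargs):
        self._gen_kwargs = gen_kwargs
        return super().evaluate(eval_dataset, metric_key_prefix=metric_key_prefix, **gen_kwargs)

print(FixedSeq2SeqTrainer().evaluate({"a": [0], "b": [1]}, num_beams=4))
# {'eval_a': {'num_beams': 4}, 'eval_b': {'num_beams': 4}}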

@Rocketknight1
Member

cc @SunMarc @muellerzr

@SunMarc
Member

SunMarc commented Feb 26, 2025

Can you share a snippet using Seq2SeqTrainer that reproduces the issue you have?
Seq2SeqTrainer has its own evaluate method that stores gen_kwargs. We shouldn't have to update Trainer for Seq2SeqTrainer features.

@Harry-yong

Can you share a snippet using Seq2SeqTrainer that reproduces the issue you have? Seq2SeqTrainer has its own evaluate method that stores gen_kwargs. We shouldn't have to update Trainer for Seq2SeqTrainer features.

In trainer_seq2seq.py:

class Seq2SeqTrainer(Trainer):
    def evaluate(
        self,
        eval_dataset: Optional[Dataset] = None,
        ignore_keys: Optional[List[str]] = None,
        metric_key_prefix: str = "eval",
        **gen_kwargs,
    ) -> Dict[str, float]:
        ...
        # gen_kwargs was stored into self._gen_kwargs above (elided), but it
        # is not forwarded to the parent call:
        return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)

In this snippet, Seq2SeqTrainer.evaluate calls Trainer.evaluate again, but it drops the gen_kwargs. As a result, the parameters are lost in the next step: as you can see below, Trainer.evaluate recursively calls evaluate without any generation parameters, which dispatches back to Seq2SeqTrainer.evaluate and overwrites the stored gen_kwargs with an empty dict. So I added the formal parameter to link them.

class Trainer:
    def evaluate(
        self,
        eval_dataset: Optional[Union[Dataset, Dict[str, Dataset]]] = None,
        ignore_keys: Optional[List[str]] = None,
        metric_key_prefix: str = "eval",
    ) -> Dict[str, float]:
        ...

        if isinstance(eval_dataset, dict):
            metrics = {}
            for eval_dataset_name, _eval_dataset in eval_dataset.items():
                # `override` is defined earlier in the method (elided above).
                # This recursive call dispatches back to Seq2SeqTrainer.evaluate,
                # but without any gen_kwargs:
                dataset_metrics = self.evaluate(
                    eval_dataset=_eval_dataset if override else eval_dataset_name,
                    ignore_keys=ignore_keys,
                    metric_key_prefix=f"{metric_key_prefix}_{eval_dataset_name}",
                )
                metrics.update(dataset_metrics)
            return metrics

Therefore, this fix addresses an inherent issue in Trainer: its recursive call drops any extra keyword arguments, and adding the formal parameter prevents that.
By the way, regarding the point you raised about Seq2SeqTrainer, it should be fixed like this:

class Seq2SeqTrainer(Trainer):
    def evaluate(
        self,
        eval_dataset: Optional[Dataset] = None,
        ignore_keys: Optional[List[str]] = None,
        metric_key_prefix: str = "eval",
        **gen_kwargs,
    ) -> Dict[str, float]:
        ...
        return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix, **gen_kwargs)
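
With both signatures forwarding **gen_kwargs, a call like the following would keep the generation parameters for every entry of a dict eval_dataset (hypothetical usage; trainer and the datasets are placeholders, not from the PR):

# Hypothetical usage; `trainer`, `ds_easy`, `ds_hard` are placeholders.
metrics = trainer.evaluate(
    eval_dataset={"easy": ds_easy, "hard": ds_hard},
    max_new_tokens=64,  # generation params now survive the recursive call
    num_beams=4,
)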
