
TypeError: CustomTrainer.compute_loss() got an unexpected keyword argument 'num_items_in_batch' #36331

ruidazeng opened this issue Feb 21, 2025 · 7 comments · May be fixed by #36426
@ruidazeng

System Info

  • transformers version: 4.50.0.dev0
  • Platform: Linux-5.15.0-210.163.7.el8uek.x86_64-x86_64-with-glibc2.35
  • Python version: 3.10.16
  • Huggingface_hub version: 0.29.1
  • Safetensors version: 0.5.2
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: 0.16.3
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: NO
  • Using GPU in script?: YES
  • GPU type: NVIDIA A100-SXM4-80GB

Who can help?

@muellerzr @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

My code:

trainer = CustomTrainer(
    model=model,
    train_dataset=torch_format_dataset,
    eval_dataset=torch_format_dataset,
    args=training_args,
    data_collator=custom_data_collator,
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

Error Info:

[2025-02-20 19:14:49,033] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
num_devices: 1
max_steps: 1250
/opt/saturncloud/envs/tofu/lib/python3.10/site-packages/transformers/training_args.py:1609: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
[2025-02-20 19:14:50,775] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-20 19:14:50,775] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2025-02-20 19:14:50,903] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 1
[2025-02-20 19:14:51,687] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 341, num_elems = 1.42B
[2025-02-20 19:15:23,644] [WARNING] [engine.py:1244:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
Parameter Offload: Total persistent parameters: 544768 in 194 params
  0%|          | 0/1250 [00:00<?, ?it/s]
Error executing job with overrides: ['split=full', 'batch_size=4', 'gradient_accumulation_steps=4', 'model_family=phi', 'lr=2e-5']
Traceback (most recent call last):
  File "/home/jovyan/mu-benchmark/finetune.py", line 125, in main
    trainer.train()
  File "/opt/saturncloud/envs/tofu/lib/python3.10/site-packages/transformers/trainer.py", line 2243, in train
    return inner_training_loop(
  File "/opt/saturncloud/envs/tofu/lib/python3.10/site-packages/transformers/trainer.py", line 2554, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
  File "/opt/saturncloud/envs/tofu/lib/python3.10/site-packages/transformers/trainer.py", line 3704, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
TypeError: CustomTrainer.compute_loss() got an unexpected keyword argument 'num_items_in_batch'

Data Set: https://huggingface.co/datasets/locuslab/TOFU

Expected behavior

Why would there be a compute_loss() error? I never gave a num_items_in_batch argument.

ruidazeng added the bug label on Feb 21, 2025

SunMarc commented Feb 21, 2025

How is your compute_loss defined? We changed a couple of things with compute_loss recently in Trainer, and it now requires this new argument, which is indeed a bit breaking. The issue you have is that it is called during training here:
loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
cc @muellerzr

@ruidazeng (Author)

def compute_loss(self, model, inputs, return_outputs=False):
    input_ids, labels, attention_mask = inputs
    # forward pass
    outputs = model(input_ids, labels=labels, attention_mask=attention_mask)
    # logits = outputs.get("logits")
    loss = outputs.loss
    # # compute custom loss (suppose one has 3 labels with different weights)
    # loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0], device=model.device))
    # loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
    return (loss, outputs) if return_outputs else loss
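
(For reference, a minimal sketch of the same override updated for the newer Trainer call; the num_items_in_batch and **kwargs parameters are an addition suggested by the traceback rather than something posted in the thread, and they keep the signature compatible with both older and newer transformers releases.)

from transformers import Trainer

class CustomTrainer(Trainer):
    # Accept the new keyword (and any future ones) so this override works with
    # Trainer versions that pass num_items_in_batch into compute_loss.
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None, **kwargs):
        input_ids, labels, attention_mask = inputs
        outputs = model(input_ids, labels=labels, attention_mask=attention_mask)
        loss = outputs.loss
        return (loss, outputs) if return_outputs else loss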

bialatoheeb commented Feb 25, 2025

@muellerzr @SunMarc
I also encountered this issue with the latest version of transformers (4.49.0). Adding num_items_in_batch=None as an argument to my custom loss function fixed it.

@ruidazeng (Author)

How is your compute_loss defined?

@bialatoheeb

def compute_loss(
    self, model, inputs, num_items_in_batch=None, return_outputs=False
):
    labels = inputs.pop("labels")
    outputs = model(**inputs)
    logits = outputs[0]
    ......

SunMarc commented Feb 26, 2025

Could you try the PR above without your changes, @bialatoheeb @ruidazeng?

SunMarc commented Feb 26, 2025

This should work only if you specified compute_loss_func. It won't work if you override the compute_loss method.
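
(For illustration, a minimal sketch of the compute_loss_func route, assuming a recent transformers release where the hook receives (outputs, labels, num_items_in_batch) and labels are popped from the inputs before the forward pass; my_loss_func and the shift-by-one causal-LM loss are illustrative, not something confirmed in the thread.)

import torch.nn.functional as F
from transformers import Trainer

def my_loss_func(outputs, labels, num_items_in_batch=None):
    # Standard causal-LM shift; adjust for your task.
    logits = outputs.logits
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        reduction="sum" if num_items_in_batch is not None else "mean",
    )
    if num_items_in_batch is not None:
        # Normalize by the true token count across the accumulated batch.
        loss = loss / num_items_in_batch
    return loss

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=torch_format_dataset,
    eval_dataset=torch_format_dataset,
    data_collator=custom_data_collator,
    compute_loss_func=my_loss_func,
)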
