Handle num_items_in_batch in Mistral's forward #34576

Open · wants to merge 1 commit into main from dev/mistral-num-items-in-batch
Conversation

gheinrich

What does this PR do?

This PR enables handling of loss keyword arguments in the Mistral model's `forward()` method. Specifically, if `num_items_in_batch` is passed, its value is used to properly normalize the loss.

This relates to the Gradient Accumulation fix (#34191).

Fixes #34575

cc @ArthurZucker as it relates to text models.
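
For background, here is a minimal sketch (illustrative only, not this PR's code) of why `num_items_in_batch` matters: with gradient accumulation, a loss that is mean-reduced per micro-batch over-weights micro-batches with few label tokens, whereas summing the per-token losses and dividing by the label-token count of the whole accumulated batch reproduces the single-large-batch result. The function name `causal_lm_loss` below is hypothetical.

```python
import torch.nn.functional as F

def causal_lm_loss(logits, labels, num_items_in_batch=None, ignore_index=-100):
    # Shift so that tokens < n predict token n (standard causal LM setup).
    shift_logits = logits[..., :-1, :].contiguous().view(-1, logits.size(-1))
    shift_labels = labels[..., 1:].contiguous().view(-1)

    if num_items_in_batch is None:
        # Default: average over this micro-batch's tokens only.
        return F.cross_entropy(shift_logits, shift_labels, ignore_index=ignore_index)

    # Gradient-accumulation-aware path: sum the per-token losses, then divide
    # by the label-token count of the whole accumulated batch.
    loss = F.cross_entropy(
        shift_logits, shift_labels, ignore_index=ignore_index, reduction="sum"
    )
    return loss / num_items_in_batch
```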

gheinrich force-pushed the dev/mistral-num-items-in-batch branch from adf418a to a4faa09 on November 2, 2024.
@Rocketknight1 (Member)

cc @muellerzr for the GA fix as well!

@ArthurZucker (Collaborator) left a comment


Hey! This is not how we fixed it for other models 😉 See:

```python
loss = None
if labels is not None:
    loss = self.loss_function(logits=logits, labels=labels, vocab_size=self.config.vocab_size, **kwargs)
```
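
As a quick check of what that pattern buys (reusing the hypothetical `causal_lm_loss` sketch above, not the library's actual implementation): two micro-batches of different lengths, each normalized by the shared total, accumulate to the same loss a single combined batch would give.

```python
import torch

torch.manual_seed(0)
vocab = 10
# Two micro-batches with different sequence lengths.
logits_a, labels_a = torch.randn(1, 4, vocab), torch.randint(vocab, (1, 4))
logits_b, labels_b = torch.randn(1, 7, vocab), torch.randint(vocab, (1, 7))

# Total number of (shifted) label tokens across the accumulation window.
n = (labels_a.shape[1] - 1) + (labels_b.shape[1] - 1)  # 3 + 6 = 9

# Each call sums its token losses and divides by the shared total, so the
# accumulated sum equals the mean loss over all 9 tokens at once.
total = (causal_lm_loss(logits_a, labels_a, num_items_in_batch=n)
         + causal_lm_loss(logits_b, labels_b, num_items_in_batch=n))
```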

@gheinrich (Author)

Hello, other models have the loss function defined in a parent class, whereas Mistral models define it inside the `forward()` method. If I don't want to change this behavior, how do you suggest I proceed?

@ArthurZucker (Collaborator) commented Nov 25, 2024

You pretty much need to copy-paste the code from Llama / the other models 🤗 The parent class is the same for Mistral as well.
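
To make that suggestion concrete, here is a toy module (hypothetical, not Mistral's actual code) following the Llama-style pattern: `forward()` does no loss arithmetic itself and simply forwards `**kwargs`, including `num_items_in_batch`, to a shared loss function. In `transformers` the parent class supplies `self.loss_function`, as noted above.

```python
import torch
from torch import nn

class TinyCausalLM(nn.Module):
    """Toy stand-in for the delegation pattern: forward() hands **kwargs
    (including num_items_in_batch) to a pluggable loss function."""

    def __init__(self, vocab_size=10, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size)
        self.loss_function = causal_lm_loss  # e.g. the sketch shown earlier

    def forward(self, input_ids, labels=None, **kwargs):
        logits = self.lm_head(self.embed(input_ids))
        loss = None
        if labels is not None:
            # kwargs may carry num_items_in_batch; normalization is decided
            # inside the loss function, so forward() stays agnostic.
            loss = self.loss_function(logits=logits, labels=labels, **kwargs)
        return loss, logits
```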

@muellerzr (Contributor)

Will be superseded/fulfilled by #35875.

Successfully merging this pull request may close: Unhandled 'num_items_in_batch' in Mistral model (#34575)