Fix XGLM loss computation (PyTorch and TensorFlow) #35878

damianoamatruda · 2025-01-24T13:28:32Z

What does this PR do?

This PR fixes the loss computation for XGLM in both PyTorch and TensorFlow implementations.

The labels were shifted by one and the padding token was appended, causing artificial loss contributions, inconsistencies between non-padded and right-padded sequences, and potential bias toward predicting padding tokens.

The updated implementations ignore the last logit and do not append the padding token to the labels, aligning with the behavior in GPT-2 and other models.

The logic of the computation was first identified in #22540, where it was ported from the PyTorch implementation to the TensorFlow one for consistency. In this PR I've reverted the TensorFlow implementation to its previous valid behavior and I've updated the PyTorch implementation to match it.

I've also added XGLM tests to ensure that the losses of non-padded and padded inputs match.

Who can review?

@Rocketknight1 @gante @ArthurZucker

Rocketknight1 · 2025-01-24T17:05:37Z

Hi @damianoamatruda, yes, the original code is incorrect! However, a simpler fix would be to change the label padding value to -100, which indicates masked positions for our loss computations. Padding/shifting labels is preferable to shifting logits, because logits are a much larger tensor and can carry gradient, so speed + memory usage are affected if we perform lots of operations on them, especially during training.

damianoamatruda · 2025-01-24T18:06:43Z

Hi @Rocketknight1, thank you for the clear explanation!

I've updated the PR to shift only the labels, as previously done, and replaced the padding token with the mask value -100.

I've also updated the PyTorch test to match the changes introduced in the newly merged PR #35659.

Rocketknight1

Yes, LGTM now! cc @ArthurZucker for core maintainer review

HuggingFaceDocBuilderDev · 2025-01-24T19:04:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Rocketknight1 · 2025-01-24T19:07:33Z

run-slow: xglm

github-actions · 2025-01-24T19:08:45Z

This comment contains run-slow, running the specified jobs: ['models/xglm'] ...

Rocketknight1 · 2025-01-24T19:45:18Z

Hi @damianoamatruda I'm seeing some failures in the slow tests for XGLM, can you take a look? You can check the CI logs, or to run slow tests locally, you can do something like RUN_SLOW=1 pytest tests/models/xglm/test_modeling_xglm.py

This updates the expected output string of test_xglm_sample for torch 2.0 to the correct one and removes the one for torch 1.13.1 + cu116 (transformers moved to torch 2.0 with PR huggingface#35358).

This resolves an issue of input embeddings in `TFXGLMModelTest.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` AttributeError: Exception encountered when calling layer 'model' (type TFXGLMMainLayer). 'ResourceVariable' object has no attribute 'vocab_size' ```

This resolves an issue of input embeddings in `TFSpeech2TextModel.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` AttributeError: Exception encountered when calling layer 'decoder' (type TFSpeech2TextDecoder). 'ResourceVariable' object has no attribute 'vocab_size' ```

This resolves an issue of output embeddings in `TFT5ModelTest.test_model_common_attributes` when `model.config.tie_word_embeddings` is set to `False`: ``` AttributeError: 'Dense' object has no attribute 'kernel' ```

This resolves an issue of output embeddings in `TFXGLMModelTest.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor. ```

This resolves an issue of output embeddings in `TFGPTJModelTest.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor. ```

This resolves an issue of output embeddings in `TFSpeech2TextModelTest.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor. ```

This resolves: - An issue of input embeddings in `TFT5ModelTest.test_resize_token_embeddings` and `TFT5ModelTest.test_save_load_after_resize_token_embeddings` when `model.config.tie_word_embeddings` is set to `False`: ``` ValueError: Attempt to convert a value (<tf_keras.src.layers.core.embedding.Embedding object at 0x32b0747a0>) with an unsupported type (<class 'tf_keras.src.layers.core.embedding.Embedding'>) to a Tensor. ``` - An issue of input embeddings in `TFMistralModelTest.test_resize_token_embeddings` where resizing token embeddings caused the following error: ``` ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor. ```

damianoamatruda force-pushed the fix-xglm-loss branch from 8df8015 to 180e037 Compare January 24, 2025 18:04

Rocketknight1 approved these changes Jan 24, 2025

View reviewed changes

Rocketknight1 added the run-slow label Jan 24, 2025

Rocketknight1 removed the run-slow label Jan 24, 2025

damianoamatruda force-pushed the fix-xglm-loss branch from 180e037 to e07585a Compare January 24, 2025 23:45

damianoamatruda added 4 commits January 25, 2025 00:51

Fix XGLM loss computation (PyTorch and TensorFlow)

2cc5e50

Update expected output string in XGLM sample test

472cb0e

This updates the expected output string of test_xglm_sample for torch 2.0 to the correct one and removes the one for torch 1.13.1 + cu116 (transformers moved to torch 2.0 with PR huggingface#35358).

Add expected output string for CPU in XGLM sample test

a21e199

Update expected output IDs in XGLM generation test

f82fc05

damianoamatruda force-pushed the fix-xglm-loss branch from e07585a to 71523b9 Compare January 24, 2025 23:51

damianoamatruda added 7 commits January 25, 2025 16:28

Add support for keras.layers.Dense in _get_word_embedding_weight

9f43b3c

This resolves an issue of output embeddings in `TFT5ModelTest.test_model_common_attributes` when `model.config.tie_word_embeddings` is set to `False`: ``` AttributeError: 'Dense' object has no attribute 'kernel' ```

damianoamatruda force-pushed the fix-xglm-loss branch from 71523b9 to 76b2e4e Compare January 25, 2025 15:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix XGLM loss computation (PyTorch and TensorFlow) #35878

Fix XGLM loss computation (PyTorch and TensorFlow) #35878

damianoamatruda commented Jan 24, 2025

Rocketknight1 commented Jan 24, 2025

damianoamatruda commented Jan 24, 2025

Rocketknight1 left a comment

HuggingFaceDocBuilderDev commented Jan 24, 2025

Rocketknight1 commented Jan 24, 2025

github-actions bot commented Jan 24, 2025

Rocketknight1 commented Jan 24, 2025

Fix XGLM loss computation (PyTorch and TensorFlow) #35878

Are you sure you want to change the base?

Fix XGLM loss computation (PyTorch and TensorFlow) #35878

Conversation

damianoamatruda commented Jan 24, 2025

What does this PR do?

Who can review?

Rocketknight1 commented Jan 24, 2025

damianoamatruda commented Jan 24, 2025

Rocketknight1 left a comment

Choose a reason for hiding this comment

HuggingFaceDocBuilderDev commented Jan 24, 2025

Rocketknight1 commented Jan 24, 2025

github-actions bot commented Jan 24, 2025

Rocketknight1 commented Jan 24, 2025