-
Notifications
You must be signed in to change notification settings - Fork 27.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix XGLM loss computation (PyTorch and TensorFlow) #35878
base: main
Are you sure you want to change the base?
Conversation
Hi @damianoamatruda, yes, the original code is incorrect! However, a simpler fix would be to change the label padding value to |
8df8015
to
180e037
Compare
Hi @Rocketknight1, thank you for the clear explanation! I've updated the PR to shift only the labels, as previously done, and replaced the padding token with the mask value I've also updated the PyTorch test to match the changes introduced in the newly merged PR #35659. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, LGTM now! cc @ArthurZucker for core maintainer review
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
run-slow: xglm |
This comment contains run-slow, running the specified jobs: ['models/xglm'] ... |
Hi @damianoamatruda I'm seeing some failures in the slow tests for XGLM, can you take a look? You can check the CI logs, or to run slow tests locally, you can do something like |
180e037
to
e07585a
Compare
This updates the expected output string of test_xglm_sample for torch 2.0 to the correct one and removes the one for torch 1.13.1 + cu116 (transformers moved to torch 2.0 with PR huggingface#35358).
e07585a
to
71523b9
Compare
This resolves an issue of input embeddings in `TFXGLMModelTest.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` AttributeError: Exception encountered when calling layer 'model' (type TFXGLMMainLayer). 'ResourceVariable' object has no attribute 'vocab_size' ```
This resolves an issue of input embeddings in `TFSpeech2TextModel.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` AttributeError: Exception encountered when calling layer 'decoder' (type TFSpeech2TextDecoder). 'ResourceVariable' object has no attribute 'vocab_size' ```
This resolves an issue of output embeddings in `TFT5ModelTest.test_model_common_attributes` when `model.config.tie_word_embeddings` is set to `False`: ``` AttributeError: 'Dense' object has no attribute 'kernel' ```
This resolves an issue of output embeddings in `TFXGLMModelTest.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor. ```
This resolves an issue of output embeddings in `TFGPTJModelTest.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor. ```
This resolves an issue of output embeddings in `TFSpeech2TextModelTest.test_save_load_after_resize_token_embeddings` where resizing token embeddings caused the following error: ``` ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor. ```
This resolves: - An issue of input embeddings in `TFT5ModelTest.test_resize_token_embeddings` and `TFT5ModelTest.test_save_load_after_resize_token_embeddings` when `model.config.tie_word_embeddings` is set to `False`: ``` ValueError: Attempt to convert a value (<tf_keras.src.layers.core.embedding.Embedding object at 0x32b0747a0>) with an unsupported type (<class 'tf_keras.src.layers.core.embedding.Embedding'>) to a Tensor. ``` - An issue of input embeddings in `TFMistralModelTest.test_resize_token_embeddings` where resizing token embeddings caused the following error: ``` ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor. ```
71523b9
to
76b2e4e
Compare
What does this PR do?
This PR fixes the loss computation for XGLM in both PyTorch and TensorFlow implementations.
The labels were shifted by one and the padding token was appended, causing artificial loss contributions, inconsistencies between non-padded and right-padded sequences, and potential bias toward predicting padding tokens.
The updated implementations ignore the last logit and do not append the padding token to the labels, aligning with the behavior in GPT-2 and other models.
The logic of the computation was first identified in #22540, where it was ported from the PyTorch implementation to the TensorFlow one for consistency. In this PR I've reverted the TensorFlow implementation to its previous valid behavior and I've updated the PyTorch implementation to match it.
I've also added XGLM tests to ensure that the losses of non-padded and padded inputs match.
Who can review?
@Rocketknight1 @gante @ArthurZucker