Fix PaliGemma Pad Token Masking During Training #35855 #35859

Open · wants to merge 11 commits into main

Conversation

sambhavnoobcoder

Problem Statement

In the PaliGemma model's _update_causal_mask function, padding tokens were incorrectly unmasked during training. The padding mask was applied first, and the prefix tokens (which can include pad tokens) were then unmasked during training, re-exposing pad positions and leading to inconsistent behavior, especially with left padding.

Fixes: #35855

Approach

After analyzing the masking logic, we identified that the issue stemmed from the sequence of operations in mask application. The solution was to reorder the masking operations to:

  1. First unmask prefix tokens during training mode
  2. Then apply padding masks

This ensures pad tokens remain masked regardless of their position or training state.
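
A minimal sketch of the reordered mask construction (simplified and with illustrative names; not the exact upstream implementation):

    import torch

    def build_causal_mask(attention_mask, token_type_ids, dtype, is_training):
        """attention_mask: (batch, seq_len), 0 at pad positions.
        token_type_ids: (batch, seq_len), 0 for prefix (image + prompt) tokens."""
        min_dtype = torch.finfo(dtype).min
        batch, seq_len = attention_mask.shape

        # Standard causal mask: future positions are filled with min_dtype.
        causal = torch.triu(
            torch.full((seq_len, seq_len), min_dtype, dtype=dtype), diagonal=1
        )
        causal = causal[None, None, :, :].expand(batch, 1, seq_len, seq_len).clone()

        # 1) First unmask the prefix during training (bidirectional attention
        #    over image + prompt tokens).
        if is_training:
            causal = causal.masked_fill(token_type_ids[:, None, None, :] == 0, 0)

        # 2) Then apply the padding mask, so pad columns end up at min_dtype
        #    even if step 1 touched them. (The bug was performing these two
        #    steps in the opposite order.)
        causal = causal.masked_fill(attention_mask[:, None, None, :] == 0, min_dtype)
        return causal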

Implementation

The fix reorders the masking operations in the _update_causal_mask function while keeping the individual operations themselves unchanged. This approach ensures:

  • No change to the individual masking operations, only to their order
  • Minimal code changes
  • Preserved backward compatibility
  • Consistent behavior across training and inference modes

Test Coverage

The new test validates:

  1. Pad tokens remain masked (value = dtype.min) during training
  2. Non-pad tokens are properly masked/unmasked based on their position
  3. Behavior is consistent across different batch sizes and sequence lengths
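
A rough sketch of that check, reusing the illustrative build_causal_mask helper from the sketch above (the batch layout here is an assumption for illustration, not the PR's exact test code):

    import torch

    def test_pad_tokens_stay_masked():
        dtype = torch.float32
        min_dtype = torch.finfo(dtype).min
        # batch of 2, seq_len of 5, left padding on the first row
        attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                                       [1, 1, 1, 1, 1]])
        token_type_ids = torch.tensor([[0, 0, 0, 0, 1],
                                       [0, 0, 0, 1, 1]])
        mask = build_causal_mask(attention_mask, token_type_ids, dtype, is_training=True)

        # every pad column must stay fully masked for every query position
        pad_cols = (attention_mask == 0)[:, None, None, :].expand_as(mask)
        assert torch.all(mask.masked_select(pad_cols) == min_dtype)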

Screenshots

[Screenshot: 2025-01-24 at 12:57 AM]

cc: @amyeroberts @molbap @zucchini-nlp, kindly review this whenever you find the time.

Member

@zucchini-nlp left a comment


Thanks for opening a PR! Can you move the test to test_modeling_paligemma and adapt it for dummy weights? I don't think we need a slow test with a big model, since we're just checking the mask on the attention weights.

@@ -0,0 +1,62 @@
import unittest
Member


We need to add a test in the test_modeling_paligemma.py file with tiny dummy model weights; a single test covering attention with/without suffix should be enough.

Author

@sambhavnoobcoder Jan 24, 2025


Okay, I have done this in commit 96e1b43 and removed the separate testing file from the PR. Kindly review it for any further changes. I've also pushed the style fixes, so if everything looks fine we are good to merge.

Member

@zucchini-nlp left a comment


Thanks for iterating on this!

LGTM, but for the test we'd better just add a new one under PaliGemmaForConditionalGenerationModelTest, which already creates a dummy model and inputs. Sorry if it wasn't clear the first time.
Also, let's test the model forward in general with output_attentions=True instead of relying on a single update_causal_mask call, and add a test for when token type ids are passed vs. not passed, to make sure attention is masked correctly in all cases.
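
For reference, a rough sketch of the kind of forward-level check described here (the helper signature, input names, and shapes are assumptions for illustration, not the test actually added in the PR):

    import torch

    def check_pad_attention(model, inputs):
        """Full forward pass; attention paid to pad positions should be ~0."""
        model.eval()
        with torch.no_grad():
            outputs = model(**inputs, output_attentions=True)
        pad_cols = inputs["attention_mask"] == 0     # (batch, seq_len)
        valid_rows = inputs["attention_mask"] == 1
        for layer_attn in outputs.attentions:        # (batch, heads, q_len, k_len)
            # only check real (non-pad) query rows attending to pad columns
            sel = valid_rows[:, None, :, None] & pad_cols[:, None, None, :]
            assert torch.all(layer_attn.masked_select(sel.expand_as(layer_attn)) < 1e-6)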
