In regard to the semi-supervised attention issue #2

Open

WeitaiKang opened this issue Sep 10, 2023 · 2 comments

Comments

@WeitaiKang

Thank you for your work; it's very interesting, but I have a question.

Is it valid to apply semi-supervised attention when the teacher's and the student's data augmentations are not the same, and the teacher's attention map has not undergone the corresponding geometric transformation?

@Disguiser15
Owner

Thank you for your interest and the good question! The teacher network is weakly augmented to provide stable pseudo labels, while the student network is strongly augmented to learn additional valuable information and to prevent overfitting. If both employed the same data augmentation, the attention-constraint loss would be small and the student network's gradients would barely be updated. You can see the ablation study in Table 4. The attention map without geometric transformation can still provide valuable pseudo information.
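
For readers less familiar with the setup, here is a minimal sketch of the kind of L2 attention-consistency term described above. It is an illustration only, not the repository's code; the tensor names, shapes, and the decision to detach the teacher (treating it purely as a pseudo-label source) are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_consistency_loss(attn_teacher: torch.Tensor,
                               attn_student: torch.Tensor) -> torch.Tensor:
    """L2 (MSE) consistency between teacher and student attention maps.

    attn_teacher, attn_student: [batch, heads, queries, keys] attention
    weights. The teacher map acts as a pseudo label, so it is detached and
    receives no gradient; only the student is updated by this term.
    """
    return F.mse_loss(attn_student, attn_teacher.detach())

# Hypothetical usage, combined with the supervised task loss:
# loss = task_loss + lambda_attn * attention_consistency_loss(attn_t, attn_s)
```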

@WeitaiKang
Author

WeitaiKang commented Sep 11, 2023

It's great to see such a prompt response from you!

However, it doesn't seem quite reasonable to use an L2 loss to constrain attention when the teacher and student models employ different augmentations. After all, strong augmentations like RandomResize and RandomSizeCrop can change the object's position, so in the attention map there no longer seems to be a one-to-one correspondence between the teacher's tokens and the student's tokens. Without an additional geometric transformation, why would we use L2 to constrain these two maps?

In this context, is any additional processing applied to the attention map produced under the teacher's weak augmentations?
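
To make the alignment being asked about concrete, the sketch below shows one hypothetical way a teacher attention map could be warped with the student's crop/resize parameters so that token positions correspond before an L2 loss is applied. This is not taken from the repository; all names, shapes, and the crop/resize parameterization are assumptions.

```python
import torch
import torch.nn.functional as F

def align_teacher_attention(attn_t: torch.Tensor, grid_h: int, grid_w: int,
                            crop_box: tuple, out_hw: tuple) -> torch.Tensor:
    """Warp a teacher attention map into the student's geometry.

    attn_t:   [batch, heads, N] attention over N = grid_h * grid_w tokens
              (e.g. one query's attention row on the teacher's token grid).
    crop_box: (top, left, height, width) of the student's RandomSizeCrop,
              expressed in feature-grid coordinates.
    out_hw:   (H', W') token grid of the student after RandomResize.
    """
    b, heads, n = attn_t.shape
    attn_2d = attn_t.view(b, heads, grid_h, grid_w)
    top, left, h, w = crop_box
    cropped = attn_2d[:, :, top:top + h, left:left + w]        # mimic RandomSizeCrop
    resized = F.interpolate(cropped, size=out_hw, mode="bilinear",
                            align_corners=False)               # mimic RandomResize
    return resized.flatten(2)                                  # back to token layout
```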
