Add drop path schedule #1835
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
It seems like 4 is the correct default value
* EfficientNet/MobileNetV3/HRNetFeatures cls and FX mode support -ve index
* MobileNetV3 allows feature_cfg mode to bypass MobileNetV3Features
Question: Summarize the PR in 5 simple lines.
Answer: 1. The PR introduces an efficient drop path schedule to accelerate training, inspired by the original implementation found in DINOv2.
Question: Write me a poem about the PR.
Answer:
A new class is born, EfficientDropPathBlock,
Drop path schedules, linear and uniform,
New models are added, with a careful touch,
NAdamW optimizer, a new addition,
A poem of code, of additions and more,
@leng-yue Any insight on why this is faster? If the entire batch is dropped (instead of randomly chosen rows), then I guess this doesn't bring any gains?
The previous implementation did not actually skip any computation: the dropped tokens were still fed through the FFN and attention layers. The efficient drop path truly avoids these calculations.
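To make the difference concrete, here is a minimal sketch of the idea, loosely modeled on the approach used in DINOv2. It is not the code from this PR: the function name `efficient_drop_path_residual` and its parameters are hypothetical and chosen only for illustration. The residual branch runs on a random subset of the batch rows, and the result is scaled and added back only to those rows.

```python
import torch


def efficient_drop_path_residual(x, residual_fn, drop_ratio=0.0, training=True):
    """Return x plus the residual branch, evaluating the branch on a row subset.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if not training or drop_ratio <= 0.0:
        # No dropping: ordinary residual connection on the full batch.
        return x + residual_fn(x)

    batch = x.shape[0]
    # Number of rows that still go through the branch (at least one).
    keep = max(int(batch * (1.0 - drop_ratio)), 1)
    idx = torch.randperm(batch, device=x.device)[:keep]

    # Only the kept rows are fed to attention / FFN, so the compute is skipped
    # for the dropped rows instead of being masked out afterwards.
    residual = residual_fn(x[idx])

    # Rescale so the expected residual matches the full-batch case.
    scale = batch / keep
    out = torch.index_add(
        x.flatten(1), 0, idx, residual.flatten(1).to(x.dtype), alpha=scale
    )
    return out.view_as(x)
```

With a drop ratio of 0.4 and a batch of 10, the branch sees only 6 rows. This also bears on the question above: the saving comes from the branch processing fewer rows, so it depends on dropping a subset of rows rather than masking after the fact.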
Update the drop path schedule to adhere to the original implementation found in DINOv2.
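For readers unfamiliar with the term, a drop path schedule assigns one drop rate to each block. The sketch below shows the two variants mentioned in this thread (linear and uniform) in plain Python. The function name `drop_path_schedule` and its arguments are hypothetical, and the linear variant assumes rates are spaced evenly from 0 at the first block up to the configured rate at the last block.

```python
def drop_path_schedule(drop_path_rate, depth, uniform=False):
    """Return a list with one drop rate per block.

    Hypothetical helper for illustration; not the signature used in this PR.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if uniform:
        # Every block uses the same rate.
        return [drop_path_rate] * depth
    if depth == 1:
        # Assumption: a single block under a linear schedule gets rate 0.
        return [0.0]
    # Linear: evenly spaced from 0 (first block) to drop_path_rate (last block).
    return [drop_path_rate * i / (depth - 1) for i in range(depth)]
```

For example, `drop_path_schedule(0.4, 5)` gives rates that rise in equal steps from 0 to 0.4, while `drop_path_schedule(0.4, 5, uniform=True)` gives 0.4 for every block.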
Add an efficient drop path to accelerate training. #1836
With a 40% drop rate, training time falls by roughly 38%, while eval time is unchanged:
ViT-L/14 eval took 8.701655239999809
ViT-L/14 with efficient drop path eval took 8.702854548999994
ViT-L/14 train took 8.81138907400009
ViT-L/14 with efficient drop path train took 5.4026294970001345
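As a quick check of the headline figure, the snippet below recomputes the improvement from the two train timings above. The variable names are illustrative; the timings are copied from the log, and the ratio does not depend on their unit.

```python
# Train timings copied from the benchmark output above.
baseline_train = 8.81138907400009
efficient_train = 5.4026294970001345

# Fraction of training time saved by the efficient drop path.
time_reduction = 1 - efficient_train / baseline_train

# Equivalent speedup factor (baseline time divided by new time).
speedup = baseline_train / efficient_train

print(f"time reduction: {time_reduction:.1%}")
print(f"speedup: {speedup:.2f}x")
```

The time reduction lands between 38% and 39%, close to the 40% drop rate, which is what one would expect if most of the training cost sits in the attention and FFN branches.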
Ref: DinoV2.
Benchmark: COLAB