issues Search Results · repo:deepseek-ai/DeepSeek-Math language:Python
35 results
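[Image] should be changed to: [Image]
Reason: the expectation is taken over π_θ_old, so π_θ cannot be the numerator when computing the KL divergence. Also, the ref model should be a concept that exists only in GRPO (it constrains how far the current model may move from the model at the start of the iteration); PPO should only have the old model, so π_θ_old should be in the numerator.
See InstructGPT for reference: [Image]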
itmorn
- Opened 3 days ago
- #37
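The argument in the issue above rests on a standard identity: averaging log(p(x) / q(x)) over samples drawn from p yields KL(p || q), so the sampling distribution is the one in the numerator. The following is a minimal sketch of that identity only, not code from the DeepSeek-Math repository; the two three-way distributions `pi_old` and `pi_theta` are hypothetical values chosen for illustration.

```python
import math
import random


def kl_divergence(p, q):
    """Exact KL(p || q) = sum_x p(x) * log(p(x) / q(x)) for categorical p, q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sampled_kl(p, q, n_samples=200_000, seed=0):
    """Monte Carlo estimate of KL(p || q): draw x ~ p, average log(p(x) / q(x))."""
    rng = random.Random(seed)
    draws = rng.choices(range(len(p)), weights=p, k=n_samples)
    return sum(math.log(p[x] / q[x]) for x in draws) / n_samples


# Hypothetical probabilities, assumed purely for illustration.
pi_old = [0.7, 0.2, 0.1]    # distribution the samples are drawn from
pi_theta = [0.5, 0.3, 0.2]  # distribution being compared against

exact = kl_divergence(pi_old, pi_theta)    # KL(pi_old || pi_theta)
estimate = sampled_kl(pi_old, pi_theta)    # expectation taken under pi_old
reverse = kl_divergence(pi_theta, pi_old)  # KL(pi_theta || pi_old), a different quantity

print(f"exact={exact:.4f} estimate={estimate:.4f} reverse={reverse:.4f}")
```

The sampled estimate tracks `exact` rather than `reverse`, which is the asymmetry the issue is pointing at. Whether the paper's formula should change is a separate question that this sketch does not settle.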
KholmogorovEA
- Opened 15 days ago
- #35
enddlesswm
- Opened 24 days ago
- #33
SaeedDahy
- Opened on Jan 14
- #32
Thanks for your impressive work! Will there be official fine-tuning code, or some instructions on further fine-tuning,
as there is for deepseekcoder? Thanks!
beichenzbc
- Opened on Oct 31, 2024
- #31
I used the Docker image from the PISA repository and the prediction file from output.zip in your
repository (path/outputs/DeepSeekMath-Base/miniF2F-Isabelle-test/results/cot/predictions.json). But my acc ...
wangzhihao-coder
- 1
- Opened on Aug 12, 2024
- #30
The idea of GRPO is impressive. Is there any plan to release the implementation of this method? THX:)
Viper403
- 5
- Opened on Aug 6, 2024
- #29
Hello, something is wrong with flash-attn. Can I drop it when I fine-tune DeepSeek-Math? Will that hurt the
performance of the model? Thank you.
AceCHQ
- Opened on Aug 1, 2024
- #28