issues Search Results · repo:deepseek-ai/DeepSeek-Math language:Python

35 results
 (61 ms)

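Image should be changed to: Image. Reason: because the expectation is taken over π_θ_old, π_θ cannot be the one in the numerator when you compute the KL divergence. Also, the ref model should be a concept that exists only in GRPO (used to constrain how far the current model moves from the model at the start of that iteration); PPO should have only the old model, so π_θ_old should be in the numerator. See InstructGPT for reference: Image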
  • itmorn
  • Opened 
    3 days ago
  • #37

  • KholmogorovEA
  • Opened 
    15 days ago
  • #35

Repo-0
  • jerd64
  • Opened 
    19 days ago
  • #34

Thanks for your impressive work! Will there be an official fine-tuning code or some instructions on further fine-tuning as deepseekcoder, thanks!
  • beichenzbc
  • Opened 
    on Oct 31, 2024
  • #31

I use the docker image from the PISA repository and the prediction file from output.zip of your repository(path/outputs/DeepSeekMath-Base/miniF2F-Isabelle-test/results/cot/predictions.json). But my acc ...
  • wangzhihao-coder
  • 1
  • Opened 
    on Aug 12, 2024
  • #30

The idea of GRPO is impressive. Is there any plan to release the implementation of this method? THX:)
  • Viper403
  • 5
  • Opened 
    on Aug 6, 2024
  • #29

Hello, there is something wrong with flash-attn, can I drop it when I finetune DeepSeek-Math? Will it destroy the performance of the model? Thank you.
  • AceCHQ
  • Opened 
    on Aug 1, 2024
  • #28