We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
应该改为:
原因:因为你是对π_θ_old求的期望,你在求KL散度的时候肯定不是π_θ在分子上。另外ref模型应该是GRPO里才有的概念(用来约束当前模型和该iter开始时刻模型的更新幅度),在PPO里应该只有old模型,所以应该是π_θ_old在分子上。
可以参考InstructGPT:
The text was updated successfully, but these errors were encountered:
No branches or pull requests
应该改为:
原因:因为你是对π_θ_old求的期望,你在求KL散度的时候肯定不是π_θ在分子上。另外ref模型应该是GRPO里才有的概念(用来约束当前模型和该iter开始时刻模型的更新幅度),在PPO里应该只有old模型,所以应该是π_θ_old在分子上。
可以参考InstructGPT:

The text was updated successfully, but these errors were encountered: