[Feature] GRPO to fine tune InternVL2.5 #943

paulpacaud · 2025-03-05T08:35:03Z

Motivation

Would it be possible to add a GRPO fine-tuning stage for InternVL2.5?
I believe it would be valuable to teach InternVL to reason by letting it discover rationales through RL, rather than supplying the rationales explicitly through SFT.

Related resources

No response

Additional context

No response
