In the InternVL2.5-MPO paper, the authors mention that CoT is not as effective as direct answering for MLLMs. I wonder why CoT performs so much worse for MLLMs than for LLMs. In addition, in a recent experiment I used QwenVL2.5 to answer questions and found that its CoT reasoning actually works quite well, but the model has difficulty fully following instructions such as "output only yes or no", which makes it hard for an automated evaluation framework to extract the answers. Could the authors offer any further explanation of this issue?
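For what it's worth, one common workaround for the extraction problem is a lenient parser that looks for the verdict in the model's free-form output rather than expecting a bare "yes"/"no". Below is a minimal sketch of that idea; `extract_yes_no` is a hypothetical helper, not part of QwenVL2.5 or any particular evaluation framework, and it assumes the harness has access to the raw response string.

```python
import re


def extract_yes_no(response: str) -> str | None:
    """Heuristically pull a yes/no verdict out of a free-form CoT response.

    CoT models usually state their conclusion at the end, so we check
    the final sentences first, then fall back to scanning everything.
    """
    text = response.strip().lower()
    # Prefer the last sentence containing a verdict.
    for chunk in reversed(re.split(r"[.!?\n]+", text)):
        match = re.search(r"\b(yes|no)\b", chunk)
        if match:
            return match.group(1)
    return None


# Example: a CoT response that ignores the "answer only yes or no" instruction.
cot_output = (
    "The image shows a cat sitting on a mat. The question asks whether "
    "an animal is present. Since a cat is an animal, the answer is yes."
)
print(extract_yes_no(cot_output))  # -> "yes"
```

This kind of heuristic recovers most answers, but it can still misfire when the rationale mentions both "yes" and "no", which is part of why direct answering is easier to score automatically.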