<image> cannot be at the end. #43

Closed
ZhangGongjie opened this issue Feb 14, 2025 · 8 comments

Comments

@ZhangGongjie

<image> cannot be at the end. However, in Qwen2_5_VLProcessor, the image token is appended at the end. Will this cause sub-optimal performance?

@2U1
Owner

2U1 commented Feb 14, 2025

I don't think it matters where the <image> token is. Does an error occur when you put your image token at the end?
Also, what I remember is that when I tokenized the input, the tokens (<|image_pad|>) were located at the point where I had placed the image token.

@ZhangGongjie
Author

ZhangGongjie commented Feb 14, 2025

I think I figured out what's going on here.

When <image> is put at the end, it appears as \n<image> instead of <image>\n. Therefore, in such cases, the LLaVA image token (<image>) won't be replaced by Qwen image tokens.

It could be fixed by replacing both "\n<image>" and "<image>\n" with Qwen image tokens, for example as in the sketch below.
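
A minimal sketch of that fix (not the repo's actual code; the helper name and the QWEN_IMAGE_TOKENS value below are assumptions for illustration):

# Hypothetical helper: replace the LLaVA-style <image> placeholder with Qwen
# image tokens, whether it appears as "<image>\n", "\n<image>", or bare "<image>".
QWEN_IMAGE_TOKENS = "<|vision_start|><|image_pad|><|vision_end|>"  # assumed token string

def replace_image_token(text: str) -> str:
    for pattern in ("<image>\n", "\n<image>", "<image>"):
        text = text.replace(pattern, QWEN_IMAGE_TOKENS)
    return text

# Image placed at the end of the prompt:
print(replace_image_token("Please parse the given exam paper image in yaml format.\n<image>"))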

@2U1
Owner

2U1 commented Feb 14, 2025

Oh, I hadn't thought about putting the image at the end. In most cases the image was added at the beginning or in the middle.

@ZhangGongjie
Author

Yeah, it depends on whether your prompt or your image contains more information. I am doing OCR so I put the image at the end.

@ZhangGongjie
Author

It seems that in transformers' Qwen2.5-VL, the image token is also put at the end.

(Pdb) prompt

'<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n\nPlease parse the given exam paper image in yaml format.<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n<|im_start|>assistant\n'

@2U1
Owner

2U1 commented Feb 14, 2025

Actually the location of the image token is decided by the user.

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

If you are doing something like this, then the image token will come before the text.
The example is from the official repo of Qwen2.5-VL.
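
Conversely, putting the text entry before the image entry places the image token at the end of the user turn. A minimal sketch of that (same processor as above; it assumes apply_chat_template renders the content entries in order):

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
        ],
    }
]

# Preparation for inference: the vision tokens
# (<|vision_start|><|image_pad|><|vision_end|>) now come after the text
# inside the user turn.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)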

@ZhangGongjie
Author

Indeed! I didn't realize that. Thanks for carefully looking into my question.

@2U1
Owner

2U1 commented Feb 18, 2025

I've updated the code to handle the <image> token at the end.

@2U1 2U1 closed this as completed Feb 18, 2025