
model: Support Janus-pro #3203

Open · wants to merge 1 commit into main from janus-pro

Conversation

@mickqian (Contributor) commented Jan 29, 2025

Motivation

Support deepseek-ai/Janus-Pro models, #3195

Modifications

  1. Janus model files
  2. related tests


@zhyncs (Member) commented Jan 29, 2025

Coooooool!

mickqian force-pushed the janus-pro branch 8 times, most recently from b6542eb to a29f545, on January 31, 2025 at 13:10
@codeaxn commented Jan 31, 2025

Forgive my noobishness, but how do I test the PR?
I ran:

```shell
python -m sglang.launch_server --model-path deepseek-ai/Janus-Pro-7B --trust-remote-code --chat-template janus --port 30000 --host 0.0.0.0
```

And it aborts with the error below:

  File "/home/user/tools/sglang/python/sglang/srt/models/registry.py", line 65, in resolve_model_cls
    return self._raise_for_unsupported(architectures)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/tools/sglang/python/sglang/srt/models/registry.py", line 32, in _raise_for_unsupported
    raise ValueError(ValueError: Model architectures ['MultiModalityCausalLM'] are not supported for now. Supported architectures: dict_keys(['BaichuanForCausalLM', 'ChatGLMModel', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'DbrxForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'DeepseekV3ForCausalLM', 'ExaoneForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'Gemma2ForSequenceClassification', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GraniteForCausalLM', 'Grok1ForCausalLM', 'Grok1ModelForCausalLM', 'InternLM2ForCausalLM', 'InternLM2ForRewardModel', 'LlamaForCausalLM', 'Phi3ForCausalLM', 'InternLM3ForCausalLM', 'LlamaForClassification', 'LlamaForCausalLMEagle', 'LlamaEmbeddingModel', 'MistralModel', 'LlamaForSequenceClassification', 'LlamaForSequenceClassificationWithNormal_Weights', 'LlavaLlamaForCausalLM', 'LlavaQwenForCausalLM', 'LlavaMistralForCausalLM', 'LlavaVidForCausalLM', 'MiniCPMForCausalLM', 'MiniCPM3ForCausalLM', 'MiniCPMV', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MllamaForConditionalGeneration', 'OlmoForCausalLM', 'Olmo2ForCausalLM', 'OlmoeForCausalLM', 'Phi3SmallForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2ForCausalLMEagle', 'Qwen2MoeForCausalLM', 'Qwen2VLForConditionalGeneration', 'StableLmForCausalLM', 'TorchNativeLlamaForCausalLM', 'TorchNativePhi3ForCausalLM', 'XverseForCausalLM', 'XverseMoeForCausalLM', 'YiVLForCausalLM'])

[2025-01-31 13:12:23] Received SIGQUIT from a child process. It usually means the child failed.
Killed
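The error above comes from the model registry: the checkpoint's config.json names an architecture (`MultiModalityCausalLM`) that the running sglang build hasn't registered. A simplified sketch of how such a registry resolves an architecture list to a model class — not sglang's actual `registry.py`, just an illustration of why the lookup fails until this PR's registration is present:

```python
# Minimal sketch of architecture-to-model-class resolution.
# The class name and error wording mirror the traceback above;
# everything else is illustrative, not sglang's real implementation.
class ModelRegistry:
    def __init__(self):
        # architecture name (from config.json) -> model class
        self.models = {}

    def register(self, name, cls):
        self.models[name] = cls

    def resolve_model_cls(self, architectures):
        for arch in architectures:
            if arch in self.models:
                return self.models[arch]
        raise ValueError(
            f"Model architectures {architectures} are not supported for now. "
            f"Supported architectures: {list(self.models)}"
        )


registry = ModelRegistry()
# What this PR effectively does: register the Janus-Pro architecture name
# (object is a placeholder for the real model class).
registry.register("MultiModalityCausalLM", object)
print(registry.resolve_model_cls(["MultiModalityCausalLM"]))
```

Running an sglang build without that registration is what raises the `ValueError` shown in the log.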

@mickqian (Contributor, Author) commented Feb 1, 2025

> Forgive my noobishness, but how to test the PR? [...] And it aborts with the error below: [...]

Could you make sure the local repo has been updated, and that Python has picked up the correct sglang version? BTW, this PR will be unstable until merged.
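One quick way to check the second point — a small snippet (not part of this PR) that prints which installation Python actually resolves a package from, so you can confirm it's the local checkout rather than an older site-packages copy:

```python
import importlib.util


def package_location(name: str):
    """Return the file Python will import `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec and spec.origin else None


# For this PR you'd expect something like .../sglang/python/sglang/__init__.py
# (the local checkout), not a site-packages path.
print(package_location("sglang"))
```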

@codeaxn commented Feb 1, 2025

> Could you make sure the local repo has been updated, and python has chosen the correct sglang version? BTW, this PR will be unstable until merged.

The issue was caused by a missing dependency. I managed to get it working after running `pip install addict`.

@codeaxn commented Feb 1, 2025

Any plans to add text-to-image?

@yizhang2077 (Collaborator) left a comment

Sorry for the late review. I've taken a basic pass through it and left some comments here; I'll take a closer look in the coming days.

mickqian force-pushed the janus-pro branch 2 times, most recently from 8446258 to a8b0764, on February 16, 2025 at 15:37
mickqian changed the title from "feat: Support Janus-pro" to "model: Support Janus-pro" on Feb 16, 2025
mickqian force-pushed the janus-pro branch 2 times, most recently from 0ee809f to 85d8d81, on February 16, 2025 at 16:01
@BaiStone2017 commented
Janus-Pro supports both multimodal understanding and multimodal generation. Does this code currently support only multimodal understanding?

mickqian force-pushed the janus-pro branch 5 times, most recently from 9e8aecd to a1226ed, on March 2, 2025 at 05:09
@yizhang2077 (Collaborator) left a comment
I've left some questions and comments; review to be continued...

```diff
@@ -134,7 +152,7 @@ def calculate_max_num_frames() -> int:
     ret = (max_req_input_len - len(input_ids)) // self.NUM_TOKEN_PER_FRAME
     return min(ret, 100)

-MAX_NUM_FRAMES = calculate_max_num_frames()
+MAX_NUM_FRAMES = 30
```
@yizhang2077 (Collaborator) commented:

Why do we change `MAX_NUM_FRAMES` here?

@mickqian (Contributor, Author) commented Mar 3, 2025
It's because I find it hard to estimate the final number of tokens expanded from an image; it's implementation-specific. As a result, a hard limit is set here. Maybe I should loosen it a bit?

@mickqian (Contributor, Author) commented Mar 3, 2025
A possible way to address it is to introduce something like max_sampled_frames into SamplingParams, but I haven't seen anything like this in existing popular frameworks.

@yizhang2077 (Collaborator) commented Mar 3, 2025
If it is hard to estimate, I think we can remove calculate_max_num_frames() and tune MAX_NUM_FRAMES to a value that doesn't cause OOM in most cases.

@mickqian (Contributor, Author) commented:

Sure, but how much memory should we assume we have?

@yizhang2077 (Collaborator) commented:

I think we can put the number in server_args or an environment variable (the default of 30 is OK). Then, if an OOM is raised, the user has an entry point to change it.

@mickqian (Contributor, Author) commented Mar 4, 2025

Seems fine to me; will do it in a separate PR.
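The idea agreed on above could be sketched as follows. This is only an illustration of the suggestion, not code from the PR; the variable name `SGLANG_MAX_NUM_FRAMES` is a hypothetical placeholder:

```python
import os

# Default frame cap, matching the hardcoded 30 in the diff above.
DEFAULT_MAX_NUM_FRAMES = 30


def get_max_num_frames() -> int:
    """Read the frame cap from an environment variable, falling back to 30.

    SGLANG_MAX_NUM_FRAMES is an assumed name for illustration; it is not
    defined by this PR. A malformed value falls back to the default
    rather than crashing the server at startup.
    """
    raw = os.environ.get("SGLANG_MAX_NUM_FRAMES", "")
    try:
        return int(raw) if raw else DEFAULT_MAX_NUM_FRAMES
    except ValueError:
        return DEFAULT_MAX_NUM_FRAMES
```

This keeps the safe default while giving users who hit an OOM (or who can afford more frames) a way to tune the limit without a code change.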

@yizhang2077 (Collaborator) commented

Could we paste the MMMU benchmark result here?

@yizhang2077 (Collaborator) commented

> Janus-Pro supports both multimodal understanding and multimodal generation. Does this code currently support only multimodal understanding?

Hi @BaiStone2017, I think this commit only supports multimodal understanding.

@merrymercy merrymercy requested a review from HaiShaw as a code owner March 3, 2025 08:12
@mickqian (Contributor, Author) commented Mar 3, 2025

MMMU results (Janus doesn't provide a transformers implementation; I managed to test it locally):

| Model | sglang | hf |
| --- | --- | --- |
| Janus-Pro-1B | 0.337 | 0.345 |
| Janus-Pro-7B | 0.356 | 0.346 |

@yizhang2077 (Collaborator) commented Mar 3, 2025

It seems to change too many files. There is some refactoring, and some of it changes the structure under the managers folder. Do you think this change is OK? @merrymercy

  1. adds python/sglang/srt/managers/multi_modality_padding.py for common image-padding methods
  2. splits the image processor into several files and puts them in an image_processors folder

5 participants