Unable to start tgi-gaudi server with meta-llama/Llama-3.2-11B-Vision-Instruct #270

Open · dmsuehir opened this issue Feb 6, 2025 · 2 comments

System Info

Docker image: ghcr.io/huggingface/tgi-gaudi:2.3.1
OS: Ubuntu 22.04
HL-SMI Version: hl-1.18.0-fw-53.1.1.1
Driver Version: 1.18.0-ee698fb

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I'm trying to run tgi-gaudi with meta-llama/Llama-3.2-11B-Vision-Instruct. I'm starting the tgi-gaudi server using the 2.3.1 Docker image as follows:

docker run -d \
    --name tgi-gaudi-llama-vision \
    -p 8399:80 \
    --env http_proxy=$http_proxy \
    --env https_proxy=$https_proxy \
    --env HF_HUB_DISABLE_PROGRESS_BARS=1 \
    --env HF_HUB_ENABLE_HF_TRANSFER=0 \
    --env HABANA_VISIBLE_DEVICES=0 \
    --env OMPI_MCA_btl_vader_single_copy_mechanism=none \
    --env PREFILL_BATCH_BUCKET_SIZE=1 \
    --env BATCH_BUCKET_SIZE=1 \
    --env MAX_BATCH_TOTAL_TOKENS=4096 \
    --env ENABLE_HPU_GRAPH=true \
    --env LIMIT_HPU_GRAPH=true \
    --env USE_FLASH_ATTENTION=true \
    --env FLASH_ATTENTION_RECOMPUTE=true \
    --env HF_TOKEN=${HF_TOKEN} \
    --runtime=habana \
    --cap-add=sys_nice \
    --ipc=host \
    ghcr.io/huggingface/tgi-gaudi:2.3.1 \
    --model-id meta-llama/Llama-3.2-11B-Vision-Instruct \
    --max-input-tokens 3048 \
    --max-total-tokens 4096
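
Since the container is launched detached (-d), I pull its output with docker logs, e.g.:

# container name matches the --name used in the run command above
docker logs -f tgi-gaudi-llama-vision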

I'm seeing the following error in the logs:

Traceback (most recent call last):
  File "/usr/local/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 170, in serve
    server.serve(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 259, in serve
    asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 213, in serve_inner
    model = get_model_with_lora_adapters(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 227, in get_model_with_lora_adapters
    model = get_model(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 200, in get_model
    return CausalLM(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 695, in __init__
    raise ValueError(f"Model type {model.config.model_type} is not supported!")
ValueError: Model type mllama_text_model is not supported!

Expected behavior

I would expect the server to start successfully, since meta-llama/Llama-3.2-11B-Vision-Instruct is listed on the supported models page as "Mllama".
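
For reference, once the server is up, the first sanity check I would run is a plain text-only request against TGI's /generate endpoint on the mapped port (a minimal sketch; the prompt and parameters are placeholders):

# sketch: text-only request to TGI's /generate endpoint
# (port 8399 is mapped to the container's port 80 in the docker run above)
curl -s http://localhost:8399/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}'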

yuanwu2017 (Collaborator) commented:

It will be enabled in the next release.

dmsuehir (Author) commented:

> It will be enabled in the next release.

Good to hear, thanks.
