warmup error when MAX_TOTAL_TOKENS and max_input_length are not powers of 2 #256

rbrugaro opened this issue Dec 17, 2024 · 0 comments

System Info

Hi,
With the model and configuration below, the server hits an ERROR during warmup. I also tested a smaller BATCH_BUCKET_SIZE and get the same error.

The same model works fine when MAX_TOTAL_TOKENS and max_input_length are powers of 2, like the values used in the repo README.

Model = https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct

services:
  tgi-gaudi-service:
    image: ghcr.io/huggingface/tgi-gaudi:2.0.6
    container_name: tgi-gaudi-server
    ports:
      - "6005:80"
    volumes:
      - "PATH_TO_YOUR_LOCAL_MODEL_CACHE/hub:/data"
    environment:
      no_proxy: ${no_proxy}
      NO_PROXY: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HF_TOKEN: ${HF_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      ENABLE_HPU_GRAPH: true
      LIMIT_HPU_GRAPH: true
      USE_FLASH_ATTENTION: true
      FLASH_ATTENTION_RECOMPUTE: true
      PT_HPU_ENABLE_LAZY_COLLECTIVES: true
      TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN: false
      MAX_TOTAL_TOKENS: 10000
      BATCH_BUCKET_SIZE: 32
      PREFILL_BATCH_BUCKET_SIZE: 2
      PAD_SEQUENCE_TO_MULTIPLE_OF: 64
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    command: >
      --model-id ${LLM_MODEL_ID} --sharded true --num-shard 8
      --max-input-length 8488 --max-total-tokens 10000
      --max-batch-prefill-tokens 16976 --max-batch-total-tokens 320000
      --max-waiting-tokens 7 --waiting-served-ratio 1.2
      --max-concurrent-requests 512
networks:
  default:
    driver: bridge
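
As a data point, neither of the lengths I pass is a multiple of PAD_SEQUENCE_TO_MULTIPLE_OF (64). Assuming warmup pads sequence lengths up to the next multiple of that value (my assumption based on the variable name, not confirmed from the code), they would round as follows:

    # assumption: sequence lengths get padded up to the next multiple of
    # PAD_SEQUENCE_TO_MULTIPLE_OF during warmup
    PAD=64
    for n in 8488 10000; do
      echo "$n is padded to $(( (n + PAD - 1) / PAD * PAD ))"
    done
    # 8488 -> 8512, 10000 -> 10048; neither input is already on a 64 boundary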
(Screenshot of the warmup error, taken 2024-12-15.)

cc: @yu

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. docker compose -f compose.yaml up -d
    It fails during warmup at initialization (the error appears in the container logs; see below).
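
To capture the warmup error when reproducing, the container logs can be followed (the container name comes from the compose file above):

    docker logs -f tgi-gaudi-server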

Expected behavior

The service launches correctly after warmup completes.
