
FastAPI LLM Inference Async Stream Implementation

This repository demonstrates LLM inference with asynchronous streaming on FastAPI, using the APIs from OpenAI, NVIDIA NIM, or NAVER HyperCLOVA.

Prerequisites

  1. Prepare the API keys for the LLM platforms (NGC, OpenAI, CLOVA Studio).

  2. Assign the keys in the key_config.env file and source it (a sketch of this file follows this list).

    $ source key_config.env
    • Then check that the keys are properly assigned as environment variables by executing the env command.
    $ env 
    SHELL=/bin/bash
    NGC_CLI_API_KEY=xxxxx
    ...
    NVIDIA_API_KEY=nvapi-....
    ...
       
  3. Deploy NVIDIA NIM on your server for the self-hosted API.

  4. Install the Python dependencies.

    $ pip3 install -r requirements.txt
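
A minimal sketch of what key_config.env might contain is shown below. Only NGC_CLI_API_KEY and NVIDIA_API_KEY appear in the sample env output above; the variable names for the OpenAI and CLOVA Studio keys are assumptions, not confirmed by this repository.

    # key_config.env: one exported key per platform.
    # NGC_CLI_API_KEY and NVIDIA_API_KEY match the env output above;
    # the remaining names are assumed placeholders.
    export NGC_CLI_API_KEY="xxxxx"
    export NVIDIA_API_KEY="nvapi-...."
    export OPENAI_API_KEY="sk-..."           # assumed variable name
    export CLOVA_STUDIO_API_KEY="..."        # assumed variable name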

Quick start

  1. Launch the FastAPI server.

    python3 launch.py
    
  2. Run inference.

    • Inference runs as an asynchronous stream. (Hypothetical sketches of the server and client sides follow the sample output below.)
    python3 client.py -p "nim" -q "who is the president of Korea?"
    
    • Sample output:
    [PLATFORM]: NIM
    [QUERY]: who is the president of Korea?
    [STATUS CODE]: 200
    [STREAMING RESPONSES]
    {"status": "processing", "data": " Moon"}
    {"status": "processing", "data": " J"}
    {"status": "processing", "data": "ae"}
    {"status": "processing", "data": "-"}
    {"status": "processing", "data": "in"}
    {"status": "processing", "data": " is"}
    {"status": "processing", "data": " the"}
    {"status": "processing", "data": " current"}
    {"status": "processing", "data": " president"}
    {"status": "processing", "data": " of"}
    {"status": "processing", "data": " South"}
    {"status": "processing", "data": " Korea"}
    {"status": "processing", "data": "."}
    {"status": "complete", "data": "Stream finished"}
    
    [FULL RESPONSE]
    Moon Jae-in is the current president of South Korea.
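
Neither launch.py nor client.py is reproduced in this README, so the two sketches below illustrate how such a setup is commonly built; they are not the repository's actual code. The first is a minimal FastAPI endpoint that streams NDJSON lines in the {"status": ..., "data": ...} format shown above, reusing the AsyncOpenAI client against NIM's OpenAI-compatible API. The /generate route, the request model, and the model id are all assumptions.

    # server_sketch.py: a hypothetical minimal version of launch.py.
    import json
    import os

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    from openai import AsyncOpenAI
    from pydantic import BaseModel

    app = FastAPI()

    # NIM exposes an OpenAI-compatible API, so the OpenAI client can be
    # pointed at it by overriding base_url.
    client = AsyncOpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key=os.environ["NVIDIA_API_KEY"],
    )

    class Query(BaseModel):
        query: str

    @app.post("/generate")  # assumed route name
    async def generate(body: Query):
        async def token_stream():
            # Each token is flushed to the client as one JSON line,
            # matching the {"status": ..., "data": ...} format above.
            stream = await client.chat.completions.create(
                model="meta/llama3-8b-instruct",  # assumed model id
                messages=[{"role": "user", "content": body.query}],
                stream=True,
            )
            async for chunk in stream:
                if chunk.choices and chunk.choices[0].delta.content:
                    yield json.dumps(
                        {"status": "processing",
                         "data": chunk.choices[0].delta.content}
                    ) + "\n"
            yield json.dumps(
                {"status": "complete", "data": "Stream finished"}
            ) + "\n"

        return StreamingResponse(token_stream(),
                                 media_type="application/x-ndjson")

A matching client can consume the stream line by line and print each token as it arrives, which is what makes the output above appear incrementally. The sketch below uses httpx; the route and request body mirror the assumptions in the server sketch.

    # client_sketch.py: a hypothetical minimal version of client.py.
    import asyncio
    import json

    import httpx

    async def main():
        tokens = []
        async with httpx.AsyncClient(timeout=None) as http:
            async with http.stream(
                "POST", "http://localhost:8000/generate",
                json={"query": "who is the president of Korea?"},
            ) as resp:
                print(f"[STATUS CODE]: {resp.status_code}")
                # aiter_lines yields each JSON line as soon as it arrives,
                # so tokens print while the model is still generating.
                async for line in resp.aiter_lines():
                    if not line:
                        continue
                    print(line)
                    event = json.loads(line)
                    if event["status"] == "processing":
                        tokens.append(event["data"])
        print("[FULL RESPONSE]")
        print("".join(tokens))

    if __name__ == "__main__":
        asyncio.run(main())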