This repository demonstrates LLM inference as an asynchronous stream on FastAPI, using the APIs from OpenAI, NVIDIA NIM, or NAVER HyperCLOVA.
- Prepare the keys for the LLM platforms (NGC, OpenAI, CLOVA Studio).
  - Assign the keys within the `key_config.env` file, then source it:

    ```
    $ source key_config.env
    ```

  - Then check that those keys are properly assigned as environment variables by executing the `env` command:

    ```
    $ env
    SHELL=/bin/bash
    NGC_CLI_API_KEY=xxxxx
    ...
    NVIDIA_API_KEY=nvapi-....
    ...
    ```
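As a quick sanity check before launching the server, a small script along these lines can confirm the keys are visible to Python. Only `NGC_CLI_API_KEY` and `NVIDIA_API_KEY` appear in the output above; the OpenAI and CLOVA Studio variable names below are assumptions for illustration.

```python
import os

# NGC_CLI_API_KEY and NVIDIA_API_KEY appear in the env output above;
# OPENAI_API_KEY and CLOVASTUDIO_API_KEY are assumed names for the
# remaining platforms and may differ in key_config.env.
REQUIRED_KEYS = [
    "NGC_CLI_API_KEY",
    "NVIDIA_API_KEY",
    "OPENAI_API_KEY",
    "CLOVASTUDIO_API_KEY",
]

missing = [name for name in REQUIRED_KEYS if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
print("All API keys are set.")
```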
- Deploy NVIDIA NIM on your server for the self-hosted API.
  - Local deployment of the NIM service requires an NVAIE (NVIDIA AI Enterprise) license.
  - For example, how to deploy NIM for `mistralai/mistral-7b-instruct-v0.3` on your host is described in NGC.
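Once the NIM container is running, it exposes an OpenAI-compatible API, so the self-hosted endpoint can be queried directly. Below is a minimal sketch assuming the default local port 8000; the base URL and placeholder API key are assumptions, so adjust them to your deployment.

```python
from openai import OpenAI

# Assumption: NIM is serving its OpenAI-compatible API on localhost:8000.
# A self-hosted NIM typically does not validate the key, so a placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="mistralai/mistral-7b-instruct-v0.3",
    messages=[{"role": "user", "content": "who is the president of Korea?"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental token delta.
    print(chunk.choices[0].delta.content or "", end="")
```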
- Install the Python dependencies:

  ```
  $ pip3 install -r requirements.txt
  ```
- Launch the FastAPI server:

  ```
  $ python3 launch.py
  ```
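For reference, a streaming endpoint like the one behind `launch.py` can be built on FastAPI's `StreamingResponse` over an async generator. The sketch below mirrors the `{"status": ..., "data": ...}` chunk format from the sample output in the next step; the `/generate` route and the placeholder token generator are assumptions, not the actual implementation.

```python
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_stream(query: str):
    # Placeholder generator (assumption): a real implementation would
    # forward chunks from the selected LLM platform's streaming API.
    for token in ["Hello", ",", " world", "."]:
        yield json.dumps({"status": "processing", "data": token}) + "\n"
    yield json.dumps({"status": "complete", "data": "Stream finished"}) + "\n"

@app.get("/generate")
async def generate(q: str):
    # Stream newline-delimited JSON chunks as they are produced.
    return StreamingResponse(token_stream(q), media_type="application/x-ndjson")
```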
- Inference.
  - The inference works as an asynchronous stream:

    ```
    $ python3 client.py -p "nim" -q "who is the president of Korea?"
    ```

  - Sample output:

    ```
    [PLATFORM]: NIM
    [QUERY]: who is the president of Korea?
    [STATUS CODE]: 200
    [STREAMING RESPONSES]
    {"status": "processing", "data": " Moon"}
    {"status": "processing", "data": " J"}
    {"status": "processing", "data": "ae"}
    {"status": "processing", "data": "-"}
    {"status": "processing", "data": "in"}
    {"status": "processing", "data": " is"}
    {"status": "processing", "data": " the"}
    {"status": "processing", "data": " current"}
    {"status": "processing", "data": " president"}
    {"status": "processing", "data": " of"}
    {"status": "processing", "data": " South"}
    {"status": "processing", "data": " Korea"}
    {"status": "processing", "data": "."}
    {"status": "complete", "data": "Stream finished"}
    [FULL RESPONSE]
    Moon Jae-in is the current president of South Korea.
    ```
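For illustration, a client could consume that stream asynchronously with `httpx`, as sketched below. The `/generate` route, host, and port carry over from the hypothetical server sketch above; see `client.py` for the actual implementation.

```python
import asyncio
import json

import httpx

async def stream_query(query: str) -> None:
    # Assumption: the server from the sketch above is listening here.
    url = "http://localhost:8000/generate"
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("GET", url, params={"q": query}) as response:
            print(f"[STATUS CODE]: {response.status_code}")
            # Print each newline-delimited JSON chunk as it arrives.
            async for line in response.aiter_lines():
                if not line:
                    continue
                chunk = json.loads(line)
                print(chunk)
                if chunk.get("status") == "complete":
                    break

asyncio.run(stream_query("who is the president of Korea?"))
```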