The streaming responses feature enhances QnABot by streaming responses from Large Language Models (LLMs) into the chat interface in real time. Instead of waiting for the complete response to be generated, chat users can see the answer being constructed as it is produced, providing a more interactive and engaging experience. Currently, this feature leverages the Amazon Bedrock ConverseStream and RetrieveAndGenerateStream APIs to establish a real-time connection between the LLM and the QnABot chat interface, ensuring efficient delivery of responses as they're generated.
- Real-time streaming of LLM responses through Amazon Bedrock
- Progressive text generation visible in chat interface
- Seamless integration with custom Lambda hooks
- Optional deployment of streaming resources through a nested stack with the `EnableStreaming` flag
- Reduced perceived latency for RAG flows
- More natural conversation flow
- Near-immediate visibility of response generation
- Enhanced user engagement
- When a user submits a question, the chat client establishes a connection to QnABot using the WebSocket endpoint that QnABot creates.
- QnABot connects to the configured LLM through Amazon Bedrock
- As the LLM generates the response, each text chunk is immediately streamed to the chat client.
- Users see the response being built incrementally, similar to human typing. The streaming continues until the complete response is delivered.
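For illustration only, a minimal Python sketch of such a chat client follows. The endpoint URL and message payloads are assumptions, and the IAM request signing that QnABot's WebSocket API requires (described later in this section) is omitted for brevity:

```python
# Minimal client sketch (hypothetical payloads; IAM request signing omitted).
import asyncio
import websockets

ENDPOINT = "wss://abc123.execute-api.us-east-1.amazonaws.com/prod"  # placeholder StreamingWebSocketEndpoint

async def ask(question: str) -> None:
    async with websockets.connect(ENDPOINT) as ws:
        await ws.send(question)               # submit the user's question
        async for chunk in ws:                # chunks arrive as the LLM generates them
            print(chunk, end="", flush=True)  # render incrementally, like typing

asyncio.run(ask("What is QnABot?"))
```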
- The QnABot admin needs to enable the streaming option in the CloudFormation template using the `EnableStreaming` parameter.
- When using an external chat client such as Lex Web UI, the admin will need to configure Lex Web UI with the `StreamingWebSocketEndpoint` output from the QnABot stack.
- User visits the chat client with streaming enabled
- The chat client establishes a WebSocket connection
- QnABot accepts the WebSocket connection
- A bi-directional communication channel is created between the chat client and QnABot
- User sends a question
- The backend LLM begins generating the response
- Each text segment is immediately streamed to the client as it's generated
- The streaming continues until the LLM completes the response
- The fulfillment Lambda returns the final complete response
- The streamed content is replaced with the final formatted response
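On the server side of this sequence, a streaming Lambda can push each text segment to the connected client through the API Gateway V2 Management API. The sketch below is a simplified illustration; the endpoint URL is a placeholder and error handling is omitted:

```python
# Hedged sketch: relay LLM text segments over an open WebSocket connection.
import boto3

apigw = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://abc123.execute-api.us-east-1.amazonaws.com/prod",  # placeholder
)

def relay(connection_id: str, chunks) -> None:
    for chunk in chunks:                      # each text segment as the LLM emits it
        apigw.post_to_connection(
            ConnectionId=connection_id,
            Data=chunk.encode("utf-8"),
        )
```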
In the example below, QnABot on AWS answers a question by streaming the response from a Bedrock Knowledge Base.
- Uses API Gateway V2 for the WebSocket connection to support bi-directional, real-time communication.
- Uses the encrypted WebSocket protocol `wss://` (WebSocket Secure)
- Secures access to the WebSocket API with IAM authorization and signed requests
- Default API Gateway V2 quotas apply for configuring and running a WebSocket API.
- Configures a ping route to support one-way pings. To prevent the 10-minute idle connection timeout while a user session is active, the chat client sends a ping every 9 minutes until the 2-hour connection limit is reached (see the keepalive sketch after this list).
- Implements logging for API Gateway V2 and the Streaming Lambda; logs are accessible in Amazon CloudWatch.
- Uses the ConverseStream API for streaming from a Bedrock LLM
- Uses the RetrieveAndGenerateStream API for streaming from a Bedrock Knowledge Base (see the sketch after this list)
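To make the two streaming calls above concrete, here is a minimal boto3 sketch. The model ARN and knowledge base ID are placeholders, and only text events are handled:

```python
# Hedged sketch of the two Bedrock streaming APIs named above.
import boto3

MODEL = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"  # placeholder

# ConverseStream: stream tokens directly from an LLM.
bedrock = boto3.client("bedrock-runtime")
resp = bedrock.converse_stream(
    modelId=MODEL,
    messages=[{"role": "user", "content": [{"text": "What is QnABot?"}]}],
)
for event in resp["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)

# RetrieveAndGenerateStream: stream a RAG answer from a knowledge base.
agent = boto3.client("bedrock-agent-runtime")
resp = agent.retrieve_and_generate_stream(
    input={"text": "What is QnABot?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID12345",  # placeholder
            "modelArn": MODEL,
        },
    },
)
for event in resp["stream"]:
    if "output" in event:
        print(event["output"]["text"], end="", flush=True)
```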
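The keepalive behavior described in the list above could look roughly like the following on the client. The 9-minute interval and 2-hour cap come from the list; the `{"action": "ping"}` payload is an assumed route selection key, not a documented QnABot message format:

```python
# Hedged keepalive sketch; the {"action": "ping"} payload is an assumption.
import asyncio
import json

async def keep_alive(ws) -> None:
    elapsed = 0
    while elapsed < 2 * 60 * 60:      # stop at the 2-hour connection limit
        await asyncio.sleep(9 * 60)   # fire before the 10-minute idle timeout
        await ws.send(json.dumps({"action": "ping"}))
        elapsed += 9 * 60
```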
To turn on streaming support for QnABot:
- Set the `EnableStreaming` CloudFormation parameter to `TRUE` and deploy the solution. This will create a nested stack that deploys the following resources:
  - Amazon API Gateway V2
  - Amazon DynamoDB Table
  - AWS Lambda
- Once the stack update is complete, go to Stack > `Outputs` and copy the value for the `StreamingWebSocketEndpoint` output.
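The same value can also be read programmatically from the stack outputs, for example (the stack name is a placeholder):

```python
# Hedged sketch: fetch the StreamingWebSocketEndpoint output via boto3.
import boto3

cfn = boto3.client("cloudformation")
stack = cfn.describe_stacks(StackName="QnABot")["Stacks"][0]  # placeholder stack name
endpoint = next(
    o["OutputValue"]
    for o in stack["Outputs"]
    if o["OutputKey"] == "StreamingWebSocketEndpoint"
)
print(endpoint)
```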
To turn on streaming support for Lex Web UI:
- Set the `AllowStreamingResponses` CloudFormation parameter to `true` and deploy the solution.
- Copy the `StreamingWebSocketEndpoint` value from the QnABot stack `Outputs` and enter it as the `StreamingWebSocketEndpoint` parameter when deploying the AWS Lex Web UI chat client CloudFormation template, as shown in the screenshot below.