
Voice-Powered AI Chatbot

Video Demo:

Description:

This project is a Voice-Powered AI Chatbot that lets users talk to an AI assistant. The chatbot records the user's speech, transcribes it to text, sends the text to a large language model (LLM) via the Groq API, and then speaks the response aloud using Text-to-Speech (TTS).


Features

  • Real-time Audio Recording: Captures user's voice input using PyAudio (or an alternative like SoundDevice if necessary).
  • Speech-to-Text (ASR): Uses a pre-trained Wav2Vec2 model from Hugging Face to transcribe the recorded speech into text.
  • AI Response Generation: Sends the transcribed text to Groq API (LLM) for generating a meaningful response.
  • Text-to-Speech (TTS): Converts the AI-generated response into speech using pyttsx3.
  • Interruptible Response: Users can stop the chatbot from speaking by pressing Enter.
  • Multi-threading: Ensures smooth execution of tasks like recording, processing, and speech output.
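The end-to-end flow behind these features can be sketched as a simple record → transcribe → respond → speak loop. The function bodies below are hypothetical stand-ins for the real PyAudio/sounddevice, Wav2Vec2, Groq, and pyttsx3 calls; only the control flow is meant to match the project.

```python
# Control-flow sketch of one chatbot turn. Each stage is a placeholder
# for the real library call (PyAudio/sounddevice, Wav2Vec2, Groq, pyttsx3).

def record_audio() -> bytes:
    # Placeholder: would capture microphone input and return raw samples.
    return b"fake-audio"

def transcribe(audio: bytes) -> str:
    # Placeholder: would run the Wav2Vec2 speech-to-text model.
    return "hello assistant"

def generate_response(prompt: str) -> str:
    # Placeholder: would send the prompt to the Groq API.
    return f"You said: {prompt}"

def speak(text: str) -> None:
    # Placeholder: would hand the text to pyttsx3 for playback.
    print(text)

def chat_once() -> str:
    """One record -> transcribe -> respond -> speak cycle."""
    audio = record_audio()
    text = transcribe(audio)
    reply = generate_response(text)
    speak(reply)
    return reply
```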

Project Structure

├── project/
│   ├── main.py                 # Main script for running the chatbot
│   ├── requirements.txt        # List of required Python dependencies
│   ├── .env                    # Stores API keys and environment variables
│   ├── output.wav              # Recorded audio file
│   ├── transcription.txt       # Transcribed text file
│   ├── README.md               # Project documentation

main.py

  • Handles audio recording using either PyAudio or SoundDevice.
  • Processes recorded audio and saves it in WAV format.
  • Uses a speech-to-text model to transcribe audio into text.
  • Interacts with the Groq API to generate AI-based responses.
  • Uses TTS to read the response aloud to the user.
  • Allows interruption via a keyboard event to stop speech playback.
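Saving the recording in WAV format needs no third-party package: Python's standard-library wave module can write 16-bit mono PCM directly. The 16 kHz sample rate below matches what Wav2Vec2 models typically expect; the synthetic tone stands in for real microphone input.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # Hz; Wav2Vec2 models typically expect 16 kHz mono audio

def save_wav(path: str, samples: list, rate: int = SAMPLE_RATE) -> None:
    """Write 16-bit mono PCM samples to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Illustrative buffer: 0.1 s of a 440 Hz tone instead of microphone input.
tone = [int(32767 * 0.3 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
        for t in range(SAMPLE_RATE // 10)]
save_wav("output.wav", tone)
```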

requirements.txt

Contains all dependencies required for the project, such as:

pyaudio
sounddevice
scipy
keyboard
pyttsx3
python-dotenv
transformers
groq

(Note: the wave module used for writing audio files ships with Python's standard library and does not need to be installed, and the dotenv import is provided by the python-dotenv package.)

.env

Stores API keys and other environment variables.

GROQ_API_KEY=your_api_key_here
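In the project, python-dotenv loads this file before the key is read with os.getenv. As a rough illustration of what load_dotenv() does under the hood, a minimal loader might look like:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    parse KEY=VALUE lines into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# The real project would then read the key with:
#   api_key = os.getenv("GROQ_API_KEY")
```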

output.wav

The recorded audio file that is transcribed into text.

transcription.txt

Stores the transcribed text from the recorded audio.


Installation and Setup

1. Clone the Repository

git clone https://github.com/your-repo/voice-chatbot.git
cd voice-chatbot

2. Install Dependencies

pip install -r requirements.txt

3. Set Up Environment Variables

Create a .env file in the project directory and add your Groq API Key:

GROQ_API_KEY=your_api_key_here

4. Run the Chatbot

python main.py

Challenges and Design Decisions

  • PyAudio vs SoundDevice: PyAudio initially failed to install in CS50.dev, so sounddevice was adopted as a fallback; where PyAudio was available, however, it provided more stable recording.
  • Interruptible Speech: A separate thread was created to allow users to stop the chatbot’s speech using Enter.
  • Groq API Integration: The chatbot requests a longer response from the AI model for more detailed answers.
  • Transcription Accuracy: Wav2Vec2 was chosen for its high accuracy in recognizing speech.
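The interruptible-speech design above can be sketched with a threading.Event shared between the listener thread and the speaking loop. Here the "speech" is simulated word by word instead of calling pyttsx3, and the Enter-key listener is shown only as a comment; the synchronization pattern is the point.

```python
import threading
import time

stop_speaking = threading.Event()

def speak_interruptibly(text: str, chunk_delay: float = 0.01) -> int:
    """Simulate TTS word by word, checking between chunks whether the
    user interrupted (modeled by stop_speaking being set).
    Returns the number of words actually 'spoken'."""
    spoken = 0
    for word in text.split():
        if stop_speaking.is_set():
            break  # user pressed Enter; abandon the rest of the sentence
        # A real implementation would hand `word` to pyttsx3 here.
        time.sleep(chunk_delay)
        spoken += 1
    return spoken

# In main.py, a daemon listener thread would set the event on Enter, e.g.:
#   threading.Thread(target=lambda: (input(), stop_speaking.set()),
#                    daemon=True).start()
```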

Future Improvements

  • Improve UI/UX with a web-based interface using Flask or React.
  • Add support for multiple languages in speech-to-text and text-to-speech.
  • Use Whisper AI for better transcription accuracy.
  • Optimize real-time response processing.

Conclusion

This project demonstrates how voice interaction can enhance AI-powered chatbots. By combining Speech Recognition, LLM-based text generation, and Text-to-Speech, the chatbot provides a seamless and intuitive user experience.
