This project is a Voice-Powered AI Chatbot that allows users to interact with an AI assistant through voice input. The chatbot records the user's speech, transcribes it into text, sends it to a large language model (LLM) via the Groq API, and then speaks the response aloud using Text-to-Speech (TTS).
- Real-time Audio Recording: Captures the user's voice input using PyAudio (or SoundDevice as an alternative when PyAudio is unavailable).
- Speech-to-Text (ASR): Uses a pre-trained Wav2Vec2 model from Hugging Face to transcribe the recorded speech into text.
- AI Response Generation: Sends the transcribed text to the Groq API (LLM) to generate a meaningful response.
- Text-to-Speech (TTS): Converts the AI-generated response into speech using pyttsx3.
- Interruptible Response: Users can stop the chatbot mid-speech by pressing Enter.
- Multi-threading: Keeps recording, processing, and speech output running smoothly in parallel.
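The features above form a single record → transcribe → respond → speak loop. A minimal sketch of one conversational turn, with the four stages passed in as callables (in the real script these are wired to PyAudio/SoundDevice, Wav2Vec2, the Groq client, and pyttsx3; the function and parameter names here are illustrative):

```python
def chatbot_turn(record, transcribe, generate, speak):
    """Run one voice-chat turn: capture audio, transcribe it,
    query the LLM, and speak the reply."""
    audio = record()          # raw audio (e.g. a WAV path or sample buffer)
    text = transcribe(audio)  # ASR step (Wav2Vec2 in this project)
    reply = generate(text)    # LLM step (Groq API in this project)
    speak(reply)              # TTS step (pyttsx3 in this project)
    return text, reply

# Example with stubbed stages:
if __name__ == "__main__":
    text, reply = chatbot_turn(
        record=lambda: b"fake-audio",
        transcribe=lambda audio: "hello",
        generate=lambda text: f"You said: {text}",
        speak=lambda reply: print(reply),
    )
```

Factoring the loop this way also makes each stage swappable, e.g. replacing Wav2Vec2 with another ASR backend later.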
```
project/
├── main.py            # Main script for running the chatbot
├── requirements.txt   # Required Python dependencies
├── .env               # API keys and environment variables
├── output.wav         # Recorded audio file
├── transcription.txt  # Transcribed text file
└── README.md          # Project documentation
```
- Handles audio recording using either PyAudio or SoundDevice.
- Processes recorded audio and saves it in WAV format.
- Uses a speech-to-text model to transcribe audio into text.
- Interacts with the Groq API to generate AI-based responses.
- Uses TTS to read the response aloud to the user.
- Allows interruption via a keyboard event to stop speech playback.
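The WAV-saving step can be sketched with the standard-library `wave` module. This example writes a buffer of 16-bit mono samples at 16 kHz (the rate Wav2Vec2 expects); a synthetic sine tone stands in for live microphone input, and the helper name is illustrative:

```python
import math
import struct
import wave

def save_wav(path, samples, rate=16000):
    """Write 16-bit mono PCM samples (ints in [-32768, 32767]) to a WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))

# A 0.5 s, 440 Hz tone in place of recorded speech:
rate = 16000
tone = [int(20000 * math.sin(2 * math.pi * 440 * n / rate))
        for n in range(rate // 2)]
save_wav("output.wav", tone, rate)
```

In the actual script the sample buffer would come from PyAudio or SoundDevice rather than being generated.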
Contains all dependencies required for the project, such as:

```
pyaudio
sounddevice
scipy
keyboard
pyttsx3
python-dotenv
transformers
groq
```

Note that the `wave` module is part of the Python standard library and does not need to be listed, and the `dotenv` module is installed via the `python-dotenv` package.
Stores API keys and other environment variables.
```
GROQ_API_KEY=your_api_key_here
```
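In the script itself, the key is loaded with `python-dotenv`'s `load_dotenv()` and read via `os.getenv`. The minimal stand-in parser below shows what that loading amounts to (illustrative only; the real project should simply call `load_dotenv()`):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: KEY=value lines, '#' comments,
    already-set environment variables take precedence."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

if os.path.exists(".env"):
    load_env()
api_key = os.getenv("GROQ_API_KEY")
```

Keeping the key in `.env` (and out of version control) avoids hard-coding credentials in `main.py`.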
The recorded audio file that is transcribed into text.
Stores the transcribed text from the recorded audio.
```bash
git clone https://github.com/your-repo/voice-chatbot.git
cd voice-chatbot
pip install -r requirements.txt
```

Create a `.env` file in the project directory and add your Groq API key:

```
GROQ_API_KEY=your_api_key_here
```

Run the chatbot:

```bash
python main.py
```
- PyAudio vs SoundDevice: Initially, PyAudio had installation issues in CS50.dev, so SoundDevice was used as an alternative. However, PyAudio provided better recording stability.
- Interruptible Speech: A separate thread was created to allow users to stop the chatbot’s speech using Enter.
- Groq API Integration: The chatbot requests a longer response from the AI model for more detailed answers.
- Transcription Accuracy: Wav2Vec2 was chosen for its high accuracy in recognizing speech.
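The interruptible-speech design can be sketched with `threading.Event`: the speaking thread checks a stop flag between chunks of output, and the main thread sets the flag when Enter is pressed. Here the pyttsx3 and keyboard parts are replaced by a plain callback so the sketch is self-contained (the names are illustrative):

```python
import threading

def speak_interruptibly(chunks, emit, stop_event):
    """Emit chunks of a response one at a time, stopping early if
    stop_event is set (in main.py, pressing Enter sets the event
    and pyttsx3 does the emitting)."""
    for chunk in chunks:
        if stop_event.is_set():
            break
        emit(chunk)

stop = threading.Event()
spoken = []
t = threading.Thread(
    target=speak_interruptibly,
    args=(["Hello,", "this", "is", "the", "reply."], spoken.append, stop),
)
t.start()
t.join()  # the real script's main thread instead blocks waiting for Enter
```

An `Event` is used rather than killing the thread because Python threads cannot be forcibly terminated safely; cooperative checking between chunks is the idiomatic pattern.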
- Improve UI/UX with a web-based interface using Flask or React.
- Add support for multiple languages in speech-to-text and text-to-speech.
- Use OpenAI's Whisper model for better transcription accuracy.
- Optimize real-time response processing.
This project demonstrates how voice interaction can enhance AI-powered chatbots. By combining Speech Recognition, LLM-based text generation, and Text-to-Speech, the chatbot provides a seamless and intuitive user experience.