Description
A system that indexes articles from the "1300 Towards Data Science Medium Articles" dataset available on Kaggle, applying Retrieval Augmented Generation (RAG) for the effective retrieval of article fragments in response to queries.
Features
- Retrieves predefined chunks of information based on user queries.
- Integrates with OpenAI's GPT-3.5 Turbo model to generate responses beyond basic retrieval.
- Chat-like interface for interacting with the system.
- Allows users to activate the RAG system by providing their OpenAI API key.
- Provides a button to gracefully shut down the system, closing the database connection and terminating the session.
Requirements
- Python 3.10 or higher
-
Clone the repository:
git clone https://github.com/bisd98/Article-Retrieval-System.git
-
Navigate to the project directory:
cd Article-Retrieval-System
-
Install dependencies using poetry:
poetry install --no-root
To run the system, first activate the virtual environment with command in the project directory:
poetry shell
Then use the following command:
streamlit run user_interface.py
After running the system using the command mentioned above, the following processes will be initiated:
- Chunking and Vectorization:
- The system will perform chunking and vectorization of data, breaking down input into manageable "chunks" and converting them into vector representations.
- Vector Database Initialization:
- Initialization of the vector database will take place, preparing it to store and retrieve vectorized data efficiently.
- Data Loading into Vector Database:
- Data will be loaded into the vector database, allowing for quick access during user interactions.
During the system setup, the user interface will display information about data loading. Once these processes are completed, interaction with the system will become available. The progress of each stage in setting up the system will be shown in the terminal output.
Ensure that you monitor the terminal output for the sequence and completion of these setup steps before interacting with the system through the user interface.
The interface resembles a chat with the system. The primary functionality involves retrieving chunks via the retrieval system.
Additionally, users can leverage the RAG system based on GPT-3.5 Turbo. To activate RAG, enter your OpenAI API key in the sidebar panel.
To deactivate system, click the 'Shut Down' button in the sidebar menu. This action closes the connection to the vector database and terminates the Streamlit session.
Notes
- Ensure you have a stable internet connection for using the RAG system, as it relies on the OpenAI API.
- For further assistance or inquiries, please refer to the project documentation or contact the developer.
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Acknowledgments
This project was created by bisd98.