Is it possible to automate Pareto-optimal data workflows? Possibly. Here's an attempt!
Welcome to the autonomous Machine Learning Wizard. This is the first iteration on the problem, a working proof of concept if you will. The current UI is a Streamlit application designed to guide you through defining, assessing, and solving machine learning problems.
- Problem Definition: Clarify your machine learning problem with our Problem Definition Agent.
- Data Assessment: Evaluate your dataset's quality and suitability, and receive suggestions for preprocessing steps.
- Model Recommendation: Get expert recommendations on the most suitable machine learning models for your problem.
- Starter Code Generation: Get generated Python code for data handling, model definition, and a basic training loop.
- Define Your Machine Learning Problem: Describe the machine learning problem you are aiming to solve.
- Upload Your Dataset (Optional): Provide a sample of your data in .csv format for a more tailored assessment and recommendation.
- Receive Customized Guidance and Code: Based on your input and data, our AI agents will offer specific advice and generate starter Python code to kickstart your ML project.
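As a rough illustration of what the Data Assessment step might look at, here is a minimal pandas sketch that summarizes an uploaded .csv sample. The function name `assess_dataset` and the report fields are hypothetical and are not taken from this repository's code; the actual assessment is done by the AI agents.

```python
import io

import pandas as pd


def assess_dataset(csv_source) -> dict:
    """Summarize a CSV sample: shape, dtypes, missing values, duplicate rows.

    Hypothetical helper for illustration only; not part of this repository.
    `csv_source` can be a path or any file-like object pandas can read.
    """
    df = pd.read_csv(csv_source)
    missing = df.isna().sum()
    return {
        "n_rows": int(df.shape[0]),
        "n_columns": int(df.shape[1]),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        # Only report columns that actually contain missing values.
        "missing_per_column": {col: int(n) for col, n in missing.items() if n > 0},
        "n_duplicate_rows": int(df.duplicated().sum()),
    }


# Tiny in-memory sample standing in for an uploaded file.
sample = io.StringIO("age,income,label\n34,52000,1\n,48000,0\n34,52000,1\n")
report = assess_dataset(sample)
print(report)
```

A summary like this could be handed to an agent as context, so its preprocessing suggestions are grounded in the actual columns rather than in the problem description alone.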
Clone this repository to your local machine:
git clone <repository-url>
Navigate to the project directory:
cd <project-directory>
Install dependencies (feel free to use your favorite virtual environment manager; I'm using conda for now, but I prefer Poetry):
pip install -r requirements.txt
Run the Streamlit application:
streamlit run main.py
To use the full capabilities of the AI-Powered Machine Learning Wizard, you'll need to provide an API key for the Langchain Groq models. Set up a secrets.toml file under the .streamlit folder with your Groq API key as follows:
# .streamlit/secrets.toml example
GROQ_API_KEY = "your_api_key_here"
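For reference, here is a minimal sketch of how application code could look up that key. The helper `resolve_groq_api_key` is hypothetical, and the fallback to an environment variable is an assumption added for illustration, not something this project is known to do. It accepts any mapping, so in a Streamlit app `st.secrets` could be passed in as `secrets`.

```python
import os
from typing import Mapping, Optional


def resolve_groq_api_key(
    secrets: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the Groq API key from `secrets`, else from the environment.

    Hypothetical helper for illustration only; the environment fallback
    is an assumption, not documented behavior of this project.
    """
    env = os.environ if env is None else env
    key = secrets.get("GROQ_API_KEY") or env.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it to .streamlit/secrets.toml"
        )
    return key


# A plain dict stands in for st.secrets here.
print(resolve_groq_api_key({"GROQ_API_KEY": "your_api_key_here"}, env={}))
```

Failing early with a clear message tends to be friendlier than letting the first model call fail with an authentication error.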
Upon launching the application, follow the on-screen instructions to navigate through the different stages of machine learning project development.
- Streamlit
- Pandas
- CrewAI: A framework built on top of Langchain for orchestrating AI agents to perform complex tasks. Production use might require custom code, but this project isn't there yet.
- Langchain Groq: Groq's very fast inference makes it a great fit for agentic workflows and for chain-of-thought (CoT) or tree-of-thoughts (ToT) tasks!