Refactor

neulab · Dec 6, 2024 · 3729878
1 parent d3a3133

Showing 366 changed files with 3,282 additions and 1,380 deletions.
diff --git a/Makefile b/Makefile
diff --git a/README.md b/README.md
@@ -34,32 +34,47 @@
 We made an analogy between data generators and teachers, where different generators teach student models using synthetic data in AgoraBench!
 
 
-# 🔧 Installation
+## 🔧 Installation
 
 Installation with pip:
 
 ```shell
 pip install data-agora
 ```
 
-# Project Structure 📁
+## Project Structure 📁
 
+### Root Directory
 ```
-agora/
-├── core/                   # Core framework components
-│   ├── llms/               # LLM implementations
-│   │   ├── base.py         # Abstract LLM interface
-│   │   ├── litellm.py      # LiteLLM integration
-│   │   ├── openai.py       # OpenAI API integration
-│   │   ├── test.py         # Test LLM implementation
-│   │   └── vllm.py         # vLLM integration (to be implemented)
-│   ├── parsers.py          # Parsing teacher model's output into instruction-response pairs
-│   ├── prompt_loaders.py   # Prompt preparation
-│   └── validators.py       # Validating the instruction-response pairs
-└── agora.py                # Main class orchestrating the pipeline
+.
+├── agora_scripts/          # Scripts for converting and handling data formats
+│   ├── prompts/            # Various prompt templates
+│   └── run.py              # Main execution script
+├── assets/                 # Project images and visual assets
+├── libs/                   # Core libraries
+│   └── data-agora/         # Main data processing library
+│       ├── data_agora/     # Core data agora implementation
+│       │   ├── core/       # Core functionality (LLMs, parsers, validators)
+├── train/                  # Training-related code (based on llama-recipes)
+└── LICENSE
 ```
 
-# Usage Guide 🚀
+#### data-agora Library (`libs/data-agora/`)
+- Core implementation for data processing and handling
+- Includes LLM integrations (OpenAI, vLLM, etc.)
+- Parsers and validators for data processing
+- Serving capabilities for deployment
+
+#### Agora Scripts (`agora_scripts/`)
+- Tools for data format conversion
+- Collection of prompt templates for different use cases
+- Main execution script for running the pipeline
+
+#### Training (`train/`)
+- Based on Meta's [llama-recipes](https://github.com/meta-llama/llama-recipes/tree/main) repository
+- Contains training configurations and utilities
+
+## Usage Guide 🚀
 
 Our library serves two types of users:
 1. **Testing an LM's Data Generation Capability with AgoraBench**: Using the pre-built pipeline, you can easily measure the data generation capabilities of different LLMs.
@@ -241,7 +256,7 @@ sampling_params = {
     "stop": placeholder_formats["stop_phrase"]
 }
 
-alchemy = Alchemy(
+agora = Agora(
     llm=llm,
     placeholder_formats=placeholder_formats,
     prompt_loader=prompt_loader,
@@ -251,7 +266,7 @@ alchemy = Alchemy(
 )
 
 # Use cache_file to resume from previous results: the Agora class automatically creates a cache file (e.g., "final_result.jsonl")
-result = alchemy.run(num_instances=10000, num_threads=16, output_file="./results/final_result.json")
+result = agora.run(num_instances=10000, num_threads=16, output_file="./results/final_result.json")
 print(result[0])
 ```
 

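The README's comment on `cache_file` says results are cached to a JSONL file so that an interrupted run can resume. Below is a minimal sketch of that resume pattern using only the standard library. The helper names `load_cached_results`, `append_result`, and `remaining_instances` are hypothetical and are not part of the data-agora API; the actual caching logic inside `Agora.run` may differ.

```python
import json
from pathlib import Path

# Hypothetical helpers illustrating JSONL-based resume; NOT the data-agora API.


def load_cached_results(cache_path: str) -> list:
    """Return every record already written to the JSONL cache (one JSON object per line)."""
    path = Path(cache_path)
    if not path.exists():
        return []
    results = []
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                results.append(json.loads(line))
            except json.JSONDecodeError:
                # Skip a truncated final line left behind by an interrupted run.
                continue
    return results


def append_result(cache_path: str, record: dict) -> None:
    """Append one record right away so that progress survives a crash."""
    with Path(cache_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def remaining_instances(num_instances: int, cached: list) -> int:
    """How many more instances are needed to reach the target count."""
    return max(num_instances - len(cached), 0)
```

On restart, a run following this pattern would call `load_cached_results` first, generate only `remaining_instances(...)` new items, and append each one with `append_result`. Writing one line per record means a crash loses at most the record that was being written.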
diff --git a/libs/data-agora/.gitignore b/libs/data-agora/.gitignore
@@ -0,0 +1 @@
+uv.lock
diff --git a/libs/data-agora/Makefile b/libs/data-agora/Makefile
@@ -0,0 +1,11 @@
+.PHONY: style unittest
+
+check_dirs := .
+
+style:
+	uv run isort $(check_dirs)
+	uv run black --line-length 119 --target-version py310 $(check_dirs)
+
+
+unittest:
+	uv run pytest -v tests/
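The `unittest` target runs `pytest -v tests/`. The example below sketches the kind of test module that target would collect. It checks a toy parser and validator for instruction-response pairs, which are the roles the README assigns to `parsers.py` and `validators.py`. The file path, the `[Instruction]`/`[Response]` markers, and the `parse_pair`/`is_valid` functions are all hypothetical. They do not reflect data-agora's actual placeholder formats or API.

```python
# tests/test_parser_sketch.py  (hypothetical path; pytest collects files named test_*.py)
import re
from typing import Optional

# Hypothetical placeholder markers; data-agora's real placeholder_formats may differ.
INSTRUCTION_TAG = "[Instruction]"
RESPONSE_TAG = "[Response]"


def parse_pair(text: str) -> Optional[dict]:
    """Split teacher output into an instruction-response pair, or return None if malformed."""
    pattern = re.escape(INSTRUCTION_TAG) + r"(.*?)" + re.escape(RESPONSE_TAG) + r"(.*)"
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    return {"instruction": match.group(1).strip(), "response": match.group(2).strip()}


def is_valid(pair: Optional[dict]) -> bool:
    """Reject missing pairs and pairs with an empty instruction or response."""
    return pair is not None and bool(pair["instruction"]) and bool(pair["response"])


def test_parse_well_formed_output():
    pair = parse_pair("[Instruction] Add 2 and 3. [Response] 5")
    assert pair == {"instruction": "Add 2 and 3.", "response": "5"}
    assert is_valid(pair)


def test_reject_missing_response():
    pair = parse_pair("[Instruction] Add 2 and 3.")
    assert pair is None
    assert not is_valid(pair)
```

Keeping parsing and validation as separate functions mirrors the parser/validator split described in the README, so each stage can be tested on its own.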