Refactor
scottsuk0306 committed Dec 6, 2024
1 parent d3a3133 commit 3729878
Showing 366 changed files with 3,282 additions and 1,380 deletions.
16 changes: 0 additions & 16 deletions Makefile

This file was deleted.

49 changes: 32 additions & 17 deletions README.md
@@ -34,32 +34,47 @@
We made an analogy between data generators and teachers, where different generators teach student models using synthetic data in AgoraBench!


# 🔧 Installation
## 🔧 Installation

Installation with pip:

```shell
pip install data-agora
```
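
To confirm the install succeeded, one option is to query the installed distribution's metadata from Python. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper written for illustration, not part of data-agora, and the distribution name `data-agora` is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("data-agora"))
```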

# Project Structure 📁
## Project Structure 📁

### Root Directory
```
agora/
├── core/                     # Core framework components
│   ├── llms/                 # LLM implementations
│   │   ├── base.py           # Abstract LLM interface
│   │   ├── litellm.py        # LiteLLM integration
│   │   ├── openai.py         # OpenAI API integration
│   │   ├── test.py           # Test LLM implementation
│   │   └── vllm.py           # vLLM integration (to be implemented)
│   ├── parsers.py            # Parsing teacher model's output into instruction-response pairs
│   ├── prompt_loaders.py     # Prompt preparation
│   └── validators.py         # Validating the instruction-response pairs
└── agora.py                  # Main class orchestrating the pipeline
.
├── agora_scripts/            # Scripts for converting and handling data formats
│   ├── prompts/              # Various prompt templates
│   └── run.py                # Main execution script
├── assets/                   # Project images and visual assets
├── libs/                     # Core libraries
│   └── data-agora/           # Main data processing library
│       ├── data_agora/       # Core data agora implementation
│       │   ├── core/         # Core functionality (LLMs, parsers, validators)
├── train/                    # Training related code (based on llama-recipes)
└── LICENSE
```

# Usage Guide 🚀
#### data-agora Library (`libs/data-agora/`)
- Core implementation for data processing and handling
- Includes LLM integrations (OpenAI, vLLM, etc.)
- Parsers and validators for data processing
- Serving capabilities for deployment

#### Agora Scripts (`agora_scripts/`)
- Tools for data format conversion
- Collection of prompt templates for different use cases
- Main execution script for running the pipeline

#### Training (`train/`)
- Based on Meta's [llama-recipes](https://github.com/meta-llama/llama-recipes/tree/main) repository
- Contains training configurations and utilities

## Usage Guide 🚀

Our library is convenient for two types of audiences:
1. **Testing an LM's Data Generation Capability with AgoraBench**: Using the pre-built pipeline, you can easily measure the data generation capabilities of different LLMs.
@@ -241,7 +256,7 @@ sampling_params = {
"stop": placeholder_formats["stop_phrase"]
}

alchemy = Alchemy(
agora = Agora(
llm=llm,
placeholder_formats=placeholder_formats,
prompt_loader=prompt_loader,
@@ -251,7 +266,7 @@ alchemy = Alchemy(
)

# Use cache_file to resume from previous results: The Alchemy class will automatically make a cache file "final_result.jsonl" for example
result = alchemy.run(num_instances=10000, num_threads=16, output_file="./results/final_result.json")
result = agora.run(num_instances=10000, num_threads=16, output_file="./results/final_result.json")
print(result[0])
```
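
The comment in the snippet above mentions resuming from a cache file. How the library implements this is not shown in this diff, so the following is only a standalone sketch of the general idea, assuming a JSONL cache with one generated instance per line. `load_cached_results` and `remaining_instances` are hypothetical helpers, not data-agora APIs.

```python
import json
from pathlib import Path


def load_cached_results(cache_path: Path) -> list:
    """Read previously generated instances from a JSONL file (assumed format)."""
    if not cache_path.exists():
        return []
    records = []
    with cache_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # Skip lines that are not valid JSON, e.g. a partially
                # written last line left behind by an interrupted run.
                continue
    return records


def remaining_instances(num_instances: int, cached: list) -> int:
    """Number of instances still needed to reach the requested total."""
    return max(0, num_instances - len(cached))
```

Under these assumptions, a resumed run would generate only `remaining_instances(num_instances, cached)` new items and append them to the same file.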

1 change: 1 addition & 0 deletions libs/data-agora/.gitignore
@@ -0,0 +1 @@
uv.lock
11 changes: 11 additions & 0 deletions libs/data-agora/Makefile
@@ -0,0 +1,11 @@
.PHONY: style

check_dirs := .

style:
	uv run isort $(check_dirs)
	uv run black --line-length 119 --target-version py310 $(check_dirs)


unittest:
	uv run pytest -v tests/