Commit: Add UI
aorwall committed Jan 26, 2025
1 parent c1d85b9 commit 98c589a
Showing 61 changed files with 5,816 additions and 267 deletions.
11 changes: 6 additions & 5 deletions .env.example
@@ -1,9 +1,10 @@
MOATLESS_DIR=/tmp/moatless

REPO_DIR=/tmp/repos
INDEX_STORE_DIR=/tmp/moatless/index-store

DEFAULT_MODEL=gpt-4o-2024-05-13
CHEAP_MODEL=gpt-4o-mini-2024-07-18
VOYAGE_API_KEY=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=

INDEX_STORE_DIR=/tmp/moatless/index-store
INDEX_STORE_URL="https://stmoatless.blob.core.windows.net/indexstore/20240522-voyage-code-2"
TESTBED_API_KEY=
#TESTBED_BASE_URL=https://testbeds.moatless.ai
109 changes: 76 additions & 33 deletions README.md
@@ -6,51 +6,54 @@ _For the implementation used in the paper [SWE-Search: Enhancing Software Agents
## SWE-Bench
I use the [SWE-bench benchmark](https://www.swebench.com/) as a way to verify my ideas.

### Version 0.0.4: Deepseek V3
With version 0.0.4 I get a 30.7% solve rate (92 instances) using the open-source Deepseek V3 model. Most notable is the extremely low cost: the entire evaluation run costs less than $4 ($0.0127 per instance), achieving **24 resolved instances per dollar spent**.
* [Claude 3.5 Sonnet v20241022 evaluation results](https://experiments.moatless.ai/evaluations/20250113_claude_3_5_sonnet_20241022_temp_0_0_iter_20_fmt_tool_call_hist_messages_lite) - 39% solve rate, 2.7 resolved instances per dollar
* [Deepseek V3](https://experiments.moatless.ai/evaluations/20250111_deepseek_chat_v3_temp_0_0_iter_20_fmt_react_hist_react) - 30.7% solve rate, 24 resolved instances per dollar

* [Deepseek V3 evaluation results](https://experiments.moatless.ai/evaluations/20250111_deepseek_chat_v3_temp_0_0_iter_20_fmt_react_hist_react)
* [Claude 3.5 Sonnet v20241022 evaluation results](https://experiments.moatless.ai/evaluations/20250113_claude_3_5_sonnet_20241022_temp_0_0_iter_20_fmt_tool_call_hist_messages_lite)
# Try it out

### Version 0.0.3: Claude 3.5 Sonnet v20241022
With version 0.0.3 I get a 38.3% solve rate with Claude 3.5 Sonnet v20241022. The average cost per instance is $0.30.
## Environment Setup

The three main reasons I've been able to go from 27% to 38% solved instances in this version:
You can install Moatless Tools either from PyPI or from source:

- **Claude 3.5 Sonnet and Computer Use**
The solution has been adjusted to use the `text_editor_20241022` tool introduced in the new version of Claude 3.5 Sonnet. This provides more stable results when editing existing code.
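For reference, the built-in editor tool is enabled by declaring it in the request's `tools` list. A minimal sketch of that declaration (the type and name strings follow Anthropic's published identifiers for this model release; treat the exact request shape as an assumption, not this project's code):

```python
# Sketch: request body enabling Claude's built-in text editor tool.
# "text_editor_20241022" / "str_replace_editor" are Anthropic's documented
# identifiers for the 2024-10-22 Sonnet release; verify against current docs.
text_editor_tool = {
    "type": "text_editor_20241022",
    "name": "str_replace_editor",
}

request_body = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "tools": [text_editor_tool],
    "messages": [{"role": "user", "content": "Fix the off-by-one error in main.py"}],
}
```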
### Install from PyPI

- **[moatless-testbeds](https://github.com/aorwall/moatless-testbeds)**
I set up a Kubernetes-based solution to run tests and provide feedback on test results to the agent. It's worth noting that the agent has to independently identify the tests and can't rely on the `PASS_TO_PASS` or `FAIL_TO_PASS` data for each instance.
```bash
# Install base package only
pip install moatless

- **More flexible model**
In the earlier version of Moatless Tools, the agent followed a rigid flow where it first retrieved content and then edited the code. Now, it can dynamically choose between actions for code retrieval or editing, depending on the situation.
# Install with streamlit visualization tools
pip install "moatless[streamlit]"

[Try the Claude 3.5 Sonnet v20241022 evaluation set up on Google Colab](https://colab.research.google.com/drive/1yOCXhTujvX4QIGJuO73UIVVqAqgwlhmC?usp=sharing)
# Install with API server
pip install "moatless[api]"

# Install everything (including dev dependencies)
pip install "moatless[all]"
```

### Version 0.0.2: Claude 3.5 Sonnet
With version 0.0.2 I get a 26.7% solve rate with Claude 3.5 Sonnet, at a slightly higher cost of $0.17 per instance.
### Install from source

[Try the Claude 3.5 evaluation set up on Google Colab](https://colab.research.google.com/drive/1pKecc3pumsrOGzTOOCEqjRKzeCWLWQpj?usp=sharing)
Clone the repository and install using Poetry:

### Version 0.0.1: GPT-4o
Moatless Tools 0.0.1 has a solve rate of 24%, with each benchmark instance costing an average of $0.13 to solve with GPT-4o. Running the full SWE-bench Lite dataset of 300 instances costs approximately $40.
```bash
# Clone the repository
git clone https://github.com/aorwall/moatless-tools.git
cd moatless-tools

[Try it out in Google Colab](https://colab.research.google.com/drive/15RpSjdprf9lcaP0oqKsuYfZl1c3kVB_t?usp=sharing)
# Using Poetry:

# Install base package only
poetry install

# Try it out
I have focused on testing my ideas, so the project is currently a bit messy; my plan is to organize it in the coming period. In the meantime, feel free to clone the repo and try running this notebook:
# Install with streamlit visualization tools
poetry install --with streamlit

1. [Run Moatless Tools on any repository](notebooks/00_index_and_run.ipynb)
# Install with API server
poetry install --with api

## Environment Setup
# Alternative: Install all optional components at once
poetry install --all-extras

Install dependencies:
```bash
poetry install
```

## Environment Variables

@@ -64,8 +67,14 @@ You can configure these settings by either:
1. Create a `.env` file in the project root (copy from `.env.example`):
```bash
# Using Poetry:
cp .env.example .env
# Edit .env with your values
# Using pip:
curl -O https://raw.githubusercontent.com/aorwall/moatless-tools/main/.env.example
mv .env.example .env
# Edit .env with your values
```
2. Or export the variables directly:
@@ -74,7 +83,7 @@ cp .env.example .env
# Directory for storing vector index store files
export INDEX_STORE_DIR="/tmp/index_store"
# Directory for storing clonedrepositories
# Directory for storing cloned repositories
export REPO_DIR="/tmp/repos"
# Required: At least one LLM provider API key
@@ -125,7 +134,7 @@ Before running the full evaluation, you can verify your setup using the integration test
```bash
# Run a single model test
poetry run python -m moatless.validation.validate_simple_code_flow --model claude-3-5-sonnet-20241022
python -m moatless.validation.validate_simple_code_flow --model claude-3-5-sonnet-20241022
```
The script will run the model against a sample SWE-Bench instance.
@@ -138,7 +147,7 @@ Results are saved in `test_results/integration_test_<timestamp>/`.
The evaluation script supports various configuration options through command line arguments:
```bash
poetry run python -m moatless.benchmark.run_evaluation [OPTIONS]
python -m moatless.benchmark.run_evaluation [OPTIONS]
```
Required arguments:
@@ -179,18 +188,52 @@ Available dataset splits that can be specified with the `--split` argument:
Example usage:
```bash
# Run evaluation with Claude 3.5 Sonnet using the ReACT format
poetry run python -m moatless.benchmark.run_evaluation \
python -m moatless.benchmark.run_evaluation \
--model claude-3-5-sonnet-20241022 \
--response-format react \
--message-history react \
--num-workers 10
# Run specific instances with GPT-4o
poetry run python -m moatless.benchmark.run_evaluation \
python -m moatless.benchmark.run_evaluation \
--model gpt-4o-2024-11-20 \
--instance-ids "django__django-16527"
```
# Running the UI and API
The project includes a web UI for visualizing saved trajectory files, built with SvelteKit.
First, make sure you have the required components installed:
```bash
# Install from PyPI:
pip install "moatless[api]"
# Or if installing from source:
# Using Poetry:
poetry install --with api
```
### Start the API Server
```bash
# If installed from PyPI or using pip:
python -m moatless.api
# If using Poetry:
poetry run moatless-api
```
This will start the FastAPI server on http://localhost:8000.
### Start the UI Development Server
```bash
# From the ui directory
cd ui
pnpm install
pnpm run dev
```
The UI will be available at http://localhost:5173. Currently, it provides a view for exploring saved trajectory files.
# Code Examples
Basic setup using the `AgenticLoop` to solve a SWE-Bench instance.
2 changes: 1 addition & 1 deletion moatless/actions/claude_text_editor.py
@@ -115,7 +115,7 @@ def create_args(self):
file_text=self.file_text,
thoughts=self.thoughts,
)

return None

@model_validator(mode="after")
11 changes: 6 additions & 5 deletions moatless/actions/string_replace.py
@@ -82,13 +82,14 @@ def remove_line_numbers(text: str) -> str:
self.new_str = remove_line_numbers(self.new_str.rstrip("\n"))

return self
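The `remove_line_numbers` helper's body is not part of this hunk; a plausible standalone sketch of what such a helper might do (an illustrative assumption, not the project's implementation):

```python
import re

def remove_line_numbers(text: str) -> str:
    # Strip leading "12\t" or "12: " style line-number prefixes from each
    # line (illustrative; the real helper may differ).
    return "\n".join(
        re.sub(r"^\s*\d+[:\t] ?", "", line) for line in text.splitlines()
    )

print(remove_line_numbers("1\tfoo\n2\tbar"))  # prints "foo" and "bar" without prefixes
```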

@field_validator("new_str")
@classmethod
def validate_new_str(cls, v):
if v is None:
raise ValueError("Parameter `new_str` cannot be null. Return an empty string if your intention was to remove old_str."
)
raise ValueError(
"Parameter `new_str` cannot be null. Return an empty string if your intention was to remove old_str."
)
return v

def format_args_for_llm(self) -> str:
@@ -665,10 +666,10 @@ def find_exact_matches(old_str: str, file_content: str) -> list[dict]:
line_end = file_content.find("\n", start_pos)
if line_end == -1: # Handle last line
line_end = len(file_content)

# Get the full line from the file
full_line = file_content[line_start:line_end]

# Skip if old_str is only a part of a larger line
if full_line != old_str:
start_pos += 1
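The skip-partial-line logic in this hunk can be sketched as a self-contained function (a simplified sketch for single-line `old_str` values, not the project's full implementation):

```python
def find_full_line_matches(old_str: str, file_content: str) -> list[int]:
    """Return start offsets where old_str occupies a complete line (sketch)."""
    matches = []
    start_pos = 0
    while True:
        start_pos = file_content.find(old_str, start_pos)
        if start_pos == -1:
            break
        # Expand the hit to full line boundaries
        line_start = file_content.rfind("\n", 0, start_pos) + 1
        line_end = file_content.find("\n", start_pos)
        if line_end == -1:  # handle last line without trailing newline
            line_end = len(file_content)
        # Skip if old_str is only part of a larger line
        if file_content[line_start:line_end] != old_str:
            start_pos += 1
            continue
        matches.append(line_start)
        start_pos = line_end
    return matches

print(find_full_line_matches("foo", "foo\nbar\nfoobar\nfoo"))  # [0, 15]
```

The `foobar` line is skipped because the hit does not span the whole line, mirroring the `full_line != old_str` check above.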
4 changes: 3 additions & 1 deletion moatless/agent/agent.py
@@ -155,7 +155,9 @@ def _execute(self, node: Node, action_step: ActionStep):
raise RuntimeError(f"Action {type(node.action)} not found in action map.")

try:
action_step.observation = action.execute(action_step.action, file_context=node.file_context, workspace=node.workspace)
action_step.observation = action.execute(
action_step.action, file_context=node.file_context, workspace=node.workspace
)
if not action_step.observation:
logger.warning(f"Node{node.node_id}: Action {action_step.action.name} returned no observation")
else:
16 changes: 8 additions & 8 deletions moatless/agent/code_agent.py
@@ -61,27 +61,27 @@ def create(
if completion_model is None:
if model is None:
raise ValueError("Either completion_model or model name must be provided")

# Get default config for the model from model_config
model_config = get_model_config(model)

# Set instance variables from model config if not explicitly provided
if thoughts_in_action is None:
thoughts_in_action = model_config.get('thoughts_in_action', False)
thoughts_in_action = model_config.get("thoughts_in_action", False)
if disable_thoughts is None:
disable_thoughts = model_config.get('disable_thoughts', False)
disable_thoughts = model_config.get("disable_thoughts", False)
if few_shot_examples is None:
few_shot_examples = model_config.get('few_shot_examples', True)
few_shot_examples = model_config.get("few_shot_examples", True)

# Override with any provided kwargs
model_config.update(kwargs)

# Create completion model
completion_model = BaseCompletionModel.create(**model_config)
else:
# Clone the completion model to ensure we have our own instance
completion_model = completion_model.clone()

# Set instance variables from completion model if not explicitly provided
if thoughts_in_action is None:
thoughts_in_action = completion_model.thoughts_in_action
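The fallback pattern in this hunk (an explicitly passed flag wins, otherwise the model config's entry, otherwise a hard default) can be sketched as follows (illustrative helper name, not from the codebase):

```python
def resolve_flag(explicit, model_config: dict, key: str, default: bool = False) -> bool:
    # Explicit argument wins; otherwise fall back to the model config entry.
    if explicit is not None:
        return explicit
    return model_config.get(key, default)

config = {"thoughts_in_action": True}
print(resolve_flag(None, config, "thoughts_in_action"))   # True, from config
print(resolve_flag(False, config, "thoughts_in_action"))  # False, explicit override
print(resolve_flag(None, config, "disable_thoughts"))     # False, hard default
```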
23 changes: 23 additions & 0 deletions moatless/api/__init__.py
@@ -0,0 +1,23 @@
import logging
from dotenv import load_dotenv
import moatless.api.api
import uvicorn


def run_api():
"""Run the Moatless API server"""
# Load environment variables from .env file
load_dotenv()

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Create and run API without workspace
api = moatless.api.api.create_api()
logger.info("Starting API server")
uvicorn.run(api, host="0.0.0.0", port=8000)


if __name__ == "__main__":
run_api()
78 changes: 78 additions & 0 deletions moatless/api/api.py
@@ -0,0 +1,78 @@
from fastapi import FastAPI, HTTPException, UploadFile
from typing import List, Dict, Any
from fastapi.middleware.cors import CORSMiddleware
from dotenv import load_dotenv
import json
import logging
from moatless.workspace import Workspace
from moatless.artifacts.artifact import ArtifactListItem
from moatless.api.schema import TrajectoryDTO
from moatless.api.trajectory_utils import load_trajectory_from_file, create_trajectory_dto


logger = logging.getLogger(__name__)


def create_api(workspace: Workspace | None = None) -> FastAPI:
"""Create and initialize the API with an optional workspace"""
api = FastAPI(title="Moatless API")

# Add CORS middleware with proper configuration
origins = [
"http://localhost:5173", # SvelteKit dev server
"http://127.0.0.1:5173", # Alternative local dev URL (IPv4)
"http://[::1]:5173", # Alternative local dev URL (IPv6)
]

api.add_middleware(
CORSMiddleware,
allow_origins=origins,
allow_credentials=True,
allow_methods=["GET", "POST", "PUT", "DELETE", "OPTIONS"],
allow_headers=["*"],
max_age=3600, # Cache preflight requests for 1 hour
)

if workspace is not None:

@api.get("/artifacts", response_model=List[ArtifactListItem])
async def list_all_artifacts():
"""Get all artifacts across all types"""
return workspace.get_all_artifacts()

@api.get("/artifacts/{type}", response_model=List[ArtifactListItem])
async def list_artifacts(type: str):
try:
return workspace.get_artifacts_by_type(type)
except ValueError as e:
raise HTTPException(status_code=404, detail=str(e))

@api.get("/artifacts/{type}/{id}", response_model=Dict[str, Any])
async def get_artifact(type: str, id: str):
try:
artifact = workspace.get_artifact(type, id)
if not artifact:
raise HTTPException(status_code=404, detail=f"Artifact {id} not found")
return artifact.to_ui_representation()
except ValueError as e:
raise HTTPException(status_code=404, detail=str(e))

@api.get("/trajectory", response_model=TrajectoryDTO)
async def get_trajectory(file_path: str):
"""Get trajectory data from a file path"""
try:
return load_trajectory_from_file(file_path)
except ValueError as e:
raise HTTPException(status_code=404, detail=str(e))

@api.post("/trajectory/upload", response_model=TrajectoryDTO)
async def upload_trajectory(file: UploadFile):
"""Upload and process a trajectory file"""
try:
content = await file.read()
trajectory_data = json.loads(content.decode())
return create_trajectory_dto(trajectory_data)
except Exception as e:
raise HTTPException(status_code=400, detail=f"Invalid trajectory file: {str(e)}")

return api