Commit: Add UI
aorwall committed Jan 26, 2025
1 parent c1d85b9 commit 98c589a
Showing 61 changed files with 5,816 additions and 267 deletions.
11 changes: 6 additions & 5 deletions .env.example
@@ -1,9 +1,10 @@
MOATLESS_DIR=/tmp/moatless

REPO_DIR=/tmp/repos
INDEX_STORE_DIR=/tmp/moatless/index-store

DEFAULT_MODEL=gpt-4o-2024-05-13
CHEAP_MODEL=gpt-4o-mini-2024-07-18
VOYAGE_API_KEY=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=

INDEX_STORE_DIR=/tmp/moatless/index-store
INDEX_STORE_URL="https://stmoatless.blob.core.windows.net/indexstore/20240522-voyage-code-2"
TESTBED_API_KEY=
#TESTBED_BASE_URL=https://testbeds.moatless.ai
109 changes: 76 additions & 33 deletions README.md
@@ -6,51 +6,54 @@ _For the implementation used in the paper [SWE-Search: Enhancing Software Agents
## SWE-Bench
I use the [SWE-bench benchmark](https://www.swebench.com/) as a way to verify my ideas.

### Version 0.0.4: Deepseek V3
With version 0.0.4 I get a 30.7% solve rate (92 instances) using the open-source Deepseek V3 model. Most notable is the extremely low cost: the entire evaluation run costs less than $4 ($0.0127 per instance), achieving **24 resolved instances per dollar spent**.
* [Claude 3.5 Sonnet v20241022 evaluation results](https://experiments.moatless.ai/evaluations/20250113_claude_3_5_sonnet_20241022_temp_0_0_iter_20_fmt_tool_call_hist_messages_lite) - 39% solve rate, 2.7 resolved instances per dollar
* [Deepseek V3](https://experiments.moatless.ai/evaluations/20250111_deepseek_chat_v3_temp_0_0_iter_20_fmt_react_hist_react) - 30.7% solve rate, 24 resolved instances per dollar

* [Deepseek V3 evaluation results](https://experiments.moatless.ai/evaluations/20250111_deepseek_chat_v3_temp_0_0_iter_20_fmt_react_hist_react)
* [Claude 3.5 Sonnet v20241022 evaluation results](https://experiments.moatless.ai/evaluations/20250113_claude_3_5_sonnet_20241022_temp_0_0_iter_20_fmt_tool_call_hist_messages_lite)
# Try it out

### Version 0.0.3: Claude 3.5 Sonnet v20241022
With version 0.0.3 I get a 38.3% solve rate with Claude 3.5 Sonnet v20241022. The average cost per instance is $0.30.
## Environment Setup

The three main reasons I've been able to go from 27% to 38% solved instances in this version:
You can install Moatless Tools either from PyPI or from source:

- **Claude 3.5 Sonnet and Computer Use**
The solution has been adjusted to use the `text_editor_20241022` tool introduced in the new version of Claude 3.5 Sonnet. This provides more stable results when editing existing code.
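For reference, the built-in editor tool is enabled by declaring it in the request's `tools` list. A minimal sketch of that declaration (the type and name strings follow Anthropic's published identifiers for this model release; treat the exact request shape as an assumption, not this project's code):

```python
# Sketch: request body enabling Claude's built-in text editor tool.
# "text_editor_20241022" / "str_replace_editor" are Anthropic's documented
# identifiers for the 2024-10-22 Sonnet release; verify against current docs.
text_editor_tool = {
    "type": "text_editor_20241022",
    "name": "str_replace_editor",
}

request_body = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "tools": [text_editor_tool],
    "messages": [{"role": "user", "content": "Fix the off-by-one error in main.py"}],
}
```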
### Install from PyPI

- **[moatless-testbeds](https://github.com/aorwall/moatless-testbeds)**
I set up a Kubernetes-based solution to run tests and provide feedback on test results to the agent. It's worth noting that the agent has to independently identify the tests and can't rely on the `PASS_TO_PASS` or `FAIL_TO_PASS` data for each instance.
```bash
# Install base package only
pip install moatless

- **More flexible model**
In the earlier version of Moatless Tools, the agent followed a rigid flow where it first retrieved content and then edited the code. Now, it can dynamically choose between actions for code retrieval or editing, depending on the situation.
# Install with streamlit visualization tools
pip install "moatless[streamlit]"

[Try the Claude 3.5 Sonnet v20241022 evaluation set up on Google Colab](https://colab.research.google.com/drive/1yOCXhTujvX4QIGJuO73UIVVqAqgwlhmC?usp=sharing)
# Install with API server
pip install "moatless[api]"

# Install everything (including dev dependencies)
pip install "moatless[all]"
```

### Version 0.0.2: Claude 3.5 Sonnet
With version 0.0.2 I get a 26.7% solve rate with Claude 3.5 Sonnet, at a slightly higher cost of $0.17 per instance.
### Install from source

[Try the Claude 3.5 evaluation set up on Google Colab](https://colab.research.google.com/drive/1pKecc3pumsrOGzTOOCEqjRKzeCWLWQpj?usp=sharing)
Clone the repository and install using Poetry:

### Version 0.0.1: GPT-4o
Moatless Tools 0.0.1 has a solve rate of 24%, with each benchmark instance costing an average of $0.13 to solve with GPT-4o. Running the full SWE-bench Lite dataset of 300 instances costs approximately $40.
```bash
# Clone the repository
git clone https://github.com/aorwall/moatless-tools.git
cd moatless-tools

[Try it out in Google Colab](https://colab.research.google.com/drive/15RpSjdprf9lcaP0oqKsuYfZl1c3kVB_t?usp=sharing)
# Using Poetry:

# Install base package only
poetry install

# Try it out
I have focused on testing my ideas, so the project is currently a bit messy; my plan is to organize it in the coming period. In the meantime, feel free to clone the repo and try running this notebook:
# Install with streamlit visualization tools
poetry install --with streamlit

1. [Run Moatless Tools on any repository](notebooks/00_index_and_run.ipynb)
# Install with API server
poetry install --with api

## Environment Setup
# Alternative: Install all optional components at once
poetry install --all-extras

Install dependencies:
```bash
poetry install
```

## Environment Variables

@@ -64,8 +67,14 @@ You can configure these settings by either:
1. Create a `.env` file in the project root (copy from `.env.example`):
```bash
# Using Poetry:
cp .env.example .env
# Edit .env with your values
# Using pip:
curl -O https://raw.githubusercontent.com/aorwall/moatless-tools/main/.env.example
mv .env.example .env
# Edit .env with your values
```
2. Or export the variables directly:
@@ -74,7 +83,7 @@ cp .env.example .env
# Directory for storing vector index store files
export INDEX_STORE_DIR="/tmp/index_store"
# Directory for storing clonedrepositories
# Directory for storing cloned repositories
export REPO_DIR="/tmp/repos"
# Required: At least one LLM provider API key
@@ -125,7 +134,7 @@ Before running the full evaluation, you can verify your setup using the integration test
```bash
# Run a single model test
poetry run python -m moatless.validation.validate_simple_code_flow --model claude-3-5-sonnet-20241022
python -m moatless.validation.validate_simple_code_flow --model claude-3-5-sonnet-20241022
```
The script will run the model against a sample SWE-Bench instance.
@@ -138,7 +147,7 @@ Results are saved in `test_results/integration_test_<timestamp>/`.
The evaluation script supports various configuration options through command line arguments:
```bash
poetry run python -m moatless.benchmark.run_evaluation [OPTIONS]
python -m moatless.benchmark.run_evaluation [OPTIONS]
```
Required arguments:
@@ -179,18 +188,52 @@ Available dataset splits that can be specified with the `--split` argument:
Example usage:
```bash
# Run evaluation with Claude 3.5 Sonnet using the ReACT format
poetry run python -m moatless.benchmark.run_evaluation \
python -m moatless.benchmark.run_evaluation \
--model claude-3-5-sonnet-20241022 \
--response-format react \
--message-history react \
--num-workers 10
# Run specific instances with GPT-4o
poetry run python -m moatless.benchmark.run_evaluation \
python -m moatless.benchmark.run_evaluation \
--model gpt-4o-2024-11-20 \
--instance-ids "django__django-16527"
```
# Running the UI and API
The project includes a web UI for visualizing saved trajectory files, built with SvelteKit.
First, make sure you have the required components installed:
```bash
# Install from PyPI:
pip install "moatless[api]"
# Or if installing from source:
# Using Poetry:
poetry install --with api
```
### Start the API Server
```bash
# If installed from PyPI or using pip:
python -m moatless.api
# If using Poetry:
poetry run moatless-api
```
This will start the FastAPI server on http://localhost:8000.
### Start the UI Development Server
```bash
# From the ui directory
cd ui
pnpm install
pnpm run dev
```
The UI will be available at http://localhost:5173. Currently, it provides a view for exploring saved trajectory files.
# Code Examples
Basic setup using the `AgenticLoop` to solve a SWE-Bench instance.
2 changes: 1 addition & 1 deletion moatless/actions/claude_text_editor.py
@@ -115,7 +115,7 @@ def create_args(self):
file_text=self.file_text,
thoughts=self.thoughts,
)

return None

@model_validator(mode="after")
11 changes: 6 additions & 5 deletions moatless/actions/string_replace.py
@@ -82,13 +82,14 @@ def remove_line_numbers(text: str) -> str:
self.new_str = remove_line_numbers(self.new_str.rstrip("\n"))

return self
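The `remove_line_numbers` helper's body is not part of this hunk; a plausible standalone sketch of what such a helper might do (an illustrative assumption, not the project's implementation):

```python
import re

def remove_line_numbers(text: str) -> str:
    # Strip leading "12\t" or "12: " style line-number prefixes from each
    # line (illustrative; the real helper may differ).
    return "\n".join(
        re.sub(r"^\s*\d+[:\t] ?", "", line) for line in text.splitlines()
    )

print(remove_line_numbers("1\tfoo\n2\tbar"))  # prints "foo" and "bar" without prefixes
```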

@field_validator("new_str")
@classmethod
def validate_new_str(cls, v):
if v is None:
raise ValueError("Parameter `new_str` cannot be null. Return an empty string if your intention was to remove old_str."
)
raise ValueError(
"Parameter `new_str` cannot be null. Return an empty string if your intention was to remove old_str."
)
return v

def format_args_for_llm(self) -> str:
@@ -665,10 +666,10 @@ def find_exact_matches(old_str: str, file_content: str) -> list[dict]:
line_end = file_content.find("\n", start_pos)
if line_end == -1: # Handle last line
line_end = len(file_content)

# Get the full line from the file
full_line = file_content[line_start:line_end]

# Skip if old_str is only a part of a larger line
if full_line != old_str:
start_pos += 1
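The skip-partial-line logic in this hunk can be sketched as a self-contained function (a simplified sketch for single-line `old_str` values, not the project's full implementation):

```python
def find_full_line_matches(old_str: str, file_content: str) -> list[int]:
    """Return start offsets where old_str occupies a complete line (sketch)."""
    matches = []
    start_pos = 0
    while True:
        start_pos = file_content.find(old_str, start_pos)
        if start_pos == -1:
            break
        # Expand the hit to full line boundaries
        line_start = file_content.rfind("\n", 0, start_pos) + 1
        line_end = file_content.find("\n", start_pos)
        if line_end == -1:  # handle last line without trailing newline
            line_end = len(file_content)
        # Skip if old_str is only part of a larger line
        if file_content[line_start:line_end] != old_str:
            start_pos += 1
            continue
        matches.append(line_start)
        start_pos = line_end
    return matches

print(find_full_line_matches("foo", "foo\nbar\nfoobar\nfoo"))  # [0, 15]
```

The `foobar` line is skipped because the hit does not span the whole line, mirroring the `full_line != old_str` check above.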
4 changes: 3 additions & 1 deletion moatless/agent/agent.py
@@ -155,7 +155,9 @@ def _execute(self, node: Node, action_step: ActionStep):
raise RuntimeError(f"Action {type(node.action)} not found in action map.")

try:
action_step.observation = action.execute(action_step.action, file_context=node.file_context, workspace=node.workspace)
action_step.observation = action.execute(
action_step.action, file_context=node.file_context, workspace=node.workspace
)
if not action_step.observation:
logger.warning(f"Node{node.node_id}: Action {action_step.action.name} returned no observation")
else:
16 changes: 8 additions & 8 deletions moatless/agent/code_agent.py
@@ -61,27 +61,27 @@ def create(
if completion_model is None:
if model is None:
raise ValueError("Either completion_model or model name must be provided")

# Get default config for the model from model_config
model_config = get_model_config(model)

# Set instance variables from model config if not explicitly provided
if thoughts_in_action is None:
thoughts_in_action = model_config.get('thoughts_in_action', False)
thoughts_in_action = model_config.get("thoughts_in_action", False)
if disable_thoughts is None:
disable_thoughts = model_config.get('disable_thoughts', False)
disable_thoughts = model_config.get("disable_thoughts", False)
if few_shot_examples is None:
few_shot_examples = model_config.get('few_shot_examples', True)
few_shot_examples = model_config.get("few_shot_examples", True)

# Override with any provided kwargs
model_config.update(kwargs)

# Create completion model
completion_model = BaseCompletionModel.create(**model_config)
else:
# Clone the completion model to ensure we have our own instance
completion_model = completion_model.clone()

# Set instance variables from completion model if not explicitly provided
if thoughts_in_action is None:
thoughts_in_action = completion_model.thoughts_in_action
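The fallback pattern in this hunk (an explicitly passed flag wins, otherwise the model config's entry, otherwise a hard default) can be sketched as follows (illustrative helper name, not from the codebase):

```python
def resolve_flag(explicit, model_config: dict, key: str, default: bool = False) -> bool:
    # Explicit argument wins; otherwise fall back to the model config entry.
    if explicit is not None:
        return explicit
    return model_config.get(key, default)

config = {"thoughts_in_action": True}
print(resolve_flag(None, config, "thoughts_in_action"))   # True, from config
print(resolve_flag(False, config, "thoughts_in_action"))  # False, explicit override
print(resolve_flag(None, config, "disable_thoughts"))     # False, hard default
```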
23 changes: 23 additions & 0 deletions moatless/api/__init__.py
@@ -0,0 +1,23 @@
import logging
from dotenv import load_dotenv
import moatless.api.api
import uvicorn


def run_api():
"""Run the Moatless API server"""
# Load environment variables from .env file
load_dotenv()

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)

# Create and run API without workspace
api = moatless.api.api.create_api()
logger.info("Starting API server")
uvicorn.run(api, host="0.0.0.0", port=8000)


if __name__ == "__main__":
run_api()
78 changes: 78 additions & 0 deletions moatless/api/api.py
@@ -0,0 +1,78 @@
from fastapi import FastAPI, HTTPException, UploadFile
from typing import List, Dict, Any
from fastapi.middleware.cors import CORSMiddleware
from dotenv import load_dotenv
import json
import logging
from moatless.workspace import Workspace
from moatless.artifacts.artifact import ArtifactListItem
from moatless.api.schema import TrajectoryDTO
from moatless.api.trajectory_utils import load_trajectory_from_file, create_trajectory_dto


logger = logging.getLogger(__name__)


def create_api(workspace: Workspace | None = None) -> FastAPI:
"""Create and initialize the API with an optional workspace"""
api = FastAPI(title="Moatless API")

# Add CORS middleware with proper configuration
origins = [
"http://localhost:5173", # SvelteKit dev server
"http://127.0.0.1:5173", # Alternative local dev URL (IPv4)
"http://[::1]:5173", # Alternative local dev URL (IPv6)
]

api.add_middleware(
CORSMiddleware,
allow_origins=origins,
allow_credentials=True,
allow_methods=["GET", "POST", "PUT", "DELETE", "OPTIONS"],
allow_headers=["*"],
max_age=3600, # Cache preflight requests for 1 hour
)

if workspace is not None:

@api.get("/artifacts", response_model=List[ArtifactListItem])
async def list_all_artifacts():
"""Get all artifacts across all types"""
return workspace.get_all_artifacts()

@api.get("/artifacts/{type}", response_model=List[ArtifactListItem])
async def list_artifacts(type: str):
try:
return workspace.get_artifacts_by_type(type)
except ValueError as e:
raise HTTPException(status_code=404, detail=str(e))

@api.get("/artifacts/{type}/{id}", response_model=Dict[str, Any])
async def get_artifact(type: str, id: str):
try:
artifact = workspace.get_artifact(type, id)
if not artifact:
raise HTTPException(status_code=404, detail=f"Artifact {id} not found")
return artifact.to_ui_representation()
except ValueError as e:
raise HTTPException(status_code=404, detail=str(e))

@api.get("/trajectory", response_model=TrajectoryDTO)
async def get_trajectory(file_path: str):
"""Get trajectory data from a file path"""
try:
return load_trajectory_from_file(file_path)
except ValueError as e:
raise HTTPException(status_code=404, detail=str(e))

@api.post("/trajectory/upload", response_model=TrajectoryDTO)
async def upload_trajectory(file: UploadFile):
"""Upload and process a trajectory file"""
try:
content = await file.read()
trajectory_data = json.loads(content.decode())
return create_trajectory_dto(trajectory_data)
except Exception as e:
raise HTTPException(status_code=400, detail=f"Invalid trajectory file: {str(e)}")

return api