Merge branch 'main' into harrison/tracing-tutorial
agola11 authored May 3, 2024
2 parents 78667e1 + b0c5a95 commit 66af49b
Showing 100 changed files with 3,148 additions and 160 deletions.
4 changes: 4 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,4 @@
{
"editor.trimAutoWhitespace": false,
"files.trimTrailingWhitespaceInRegexAndStrings": false
}
2 changes: 1 addition & 1 deletion docs/evaluation/faq/custom-evaluators.mdx
@@ -126,7 +126,7 @@ With function calling, it has become easier than ever to generate feedback metri
Below is an example (in this case using OpenAI's tool calling functionality) to evaluate RAG app faithfulness.

````python
-iimport json
+import json
from typing import List

import openai
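The snippet is truncated in the diff above. For context, a minimal sketch of what an OpenAI tool-calling faithfulness evaluator along these lines can look like; the tool name, prompt wording, model choice, and score scale here are illustrative assumptions, not the exact code from the docs page:

```python
import json
from typing import List

import openai


def evaluate_faithfulness(answer: str, contexts: List[str]) -> dict:
    """Score how faithful `answer` is to the retrieved `contexts` (sketch).

    Forces the model to call a hypothetical `faithfulness` tool whose
    argument schema yields an integer score.
    """
    client = openai.OpenAI()
    tools = [
        {
            "type": "function",
            "function": {
                "name": "faithfulness",
                "description": "Score how faithful the answer is to the context.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "score": {
                            "type": "integer",
                            "description": "1 (unfaithful) to 5 (fully grounded).",
                        }
                    },
                    "required": ["score"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "user",
                "content": (
                    "Context:\n" + "\n".join(contexts)
                    + f"\n\nAnswer:\n{answer}\n\n"
                    "Rate the answer's faithfulness to the context."
                ),
            }
        ],
        tools=tools,
        # Force the tool call so the arguments always contain a score.
        tool_choice={"type": "function", "function": {"name": "faithfulness"}},
    )
    args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
    # Return the feedback in the {"key": ..., "score": ...} shape that
    # LangSmith custom evaluators conventionally produce.
    return {"key": "faithfulness", "score": args["score"]}
```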
2 changes: 1 addition & 1 deletion docs/evaluation/faq/experiments-app.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Run Experiments in Browser (no code)
-sidebar_position: 7
+sidebar_position: 8
---

# How to run experiments in the prompt playground (no code)
2 changes: 1 addition & 1 deletion docs/evaluation/faq/manage-datasets.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Manage Datasets
-sidebar_position: 4
+sidebar_position: 5
---

import {
40 changes: 40 additions & 0 deletions docs/evaluation/faq/regression-testing.mdx
@@ -0,0 +1,40 @@
---
sidebar_label: Regression Testing
sidebar_position: 3
---

# Regression Testing

When evaluating LLM applications, it is important to track how your system performs over time. In this guide, we will show you how to use LangSmith's comparison view to track regressions in your application and drill down into the specific runs that improved or regressed.

## Overview

In the LangSmith comparison view, runs that _regressed_ on your specified feedback key against your baseline experiment will be highlighted in red, while runs that _improved_ will be highlighted in green. At the top of each column, you can see how many runs in that experiment did better and how many did worse than your baseline experiment.

![Regressions](../static/regression_view.png)

## Baseline Experiment

To track regressions, you need a baseline experiment to compare against. By default, the baseline is the first experiment in your comparison, but you can change it from the dropdown at the top of the page.

![Baseline](../static/select_baseline.png)

## Select Feedback Key

You will also want to select the feedback key on which you would like to focus. This can be selected via another dropdown at the top. Again, one is assigned by default, but you can adjust it as needed.

![Feedback](../static/select_feedback.png)

## Filter to Regressions or Improvements

Click the regressions or improvements button at the top of each column to filter to the runs that regressed or improved in that specific experiment.

![Regressions Filter](../static/filter_to_regressions.png)

## Try it out

To get started with regression testing, try [running a no-code experiment in our prompt playground](experiments-app), or check out the [Evaluation Quick Start Guide](/evaluation/quickstart) to set up experiments with the SDK.
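For a sense of what an SDK-driven workflow can look like, here is a minimal sketch that runs two experiments over the same dataset, giving the comparison view a baseline and a candidate. The dataset name, target functions, and evaluator are illustrative assumptions, not part of the guide above:

```python
from langsmith.evaluation import evaluate

# A minimal sketch, assuming a LangSmith dataset named "my-dataset" exists
# with inputs like {"question": ...} and reference outputs like {"answer": ...}.
# `my_app_v1` / `my_app_v2` stand in for two versions of your application.


def my_app_v1(inputs: dict) -> dict:
    # Baseline version of the app (placeholder logic).
    return {"answer": inputs["question"].strip()}


def my_app_v2(inputs: dict) -> dict:
    # Candidate version whose runs you want to compare against the baseline.
    return {"answer": inputs["question"].strip().lower()}


def exact_match(run, example) -> dict:
    # Toy feedback function; the returned "key" is the feedback key you
    # would then select in the comparison view.
    score = int(run.outputs["answer"] == example.outputs["answer"])
    return {"key": "correctness", "score": score}


# Run one experiment per app version. Both experiments will then appear in
# the comparison view, where the first one selected becomes the default baseline.
for target in (my_app_v1, my_app_v2):
    evaluate(
        target,
        data="my-dataset",
        evaluators=[exact_match],
        experiment_prefix=target.__name__,
    )
```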
2 changes: 1 addition & 1 deletion docs/evaluation/faq/synthetic-data.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Synthetic Data for Evaluation
-sidebar_position: 8
+sidebar_position: 9
---

# Synthetic Data for Evaluation
2 changes: 1 addition & 1 deletion docs/evaluation/faq/unit-testing.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Unit Test
-sidebar_position: 3
+sidebar_position: 4
---

# Unit Tests
2 changes: 1 addition & 1 deletion docs/evaluation/faq/version-datasets.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Version Datasets
-sidebar_position: 5
+sidebar_position: 6
---

# How to version datasets
Binary file added docs/evaluation/static/filter_to_regressions.png
Binary file added docs/evaluation/static/regression_view.png
Binary file added docs/evaluation/static/select_baseline.png
Binary file added docs/evaluation/static/select_feedback.png
Binary file added docs/monitoring/static/class-optimization-neg.png
Binary file added docs/monitoring/static/class-optimization-pos.png