Merge branch 'main' into harrison/tracing-tutorial
agola11 authored May 3, 2024
2 parents 78667e1 + b0c5a95 commit 66af49b
Showing 100 changed files with 3,148 additions and 160 deletions.
4 changes: 4 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,4 @@
{
"editor.trimAutoWhitespace": false,
"files.trimTrailingWhitespaceInRegexAndStrings": false
}
2 changes: 1 addition & 1 deletion docs/evaluation/faq/custom-evaluators.mdx
@@ -126,7 +126,7 @@ With function calling, it has become easier than ever to generate feedback metri
Below is an example (in this case using OpenAI's tool calling functionality) to evaluate RAG app faithfulness.

````python
-iimport json
+import json
from typing import List

import openai
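The snippet is truncated in the diff above. For context, a minimal sketch of what an OpenAI tool-calling faithfulness evaluator along these lines can look like; the tool name, prompt wording, model choice, and score scale here are illustrative assumptions, not the exact code from the docs page:

```python
import json
from typing import List

import openai


def evaluate_faithfulness(answer: str, contexts: List[str]) -> dict:
    """Score how faithful `answer` is to the retrieved `contexts` (sketch).

    Forces the model to call a hypothetical `faithfulness` tool whose
    argument schema yields an integer score.
    """
    client = openai.OpenAI()
    tools = [
        {
            "type": "function",
            "function": {
                "name": "faithfulness",
                "description": "Score how faithful the answer is to the context.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "score": {
                            "type": "integer",
                            "description": "1 (unfaithful) to 5 (fully grounded).",
                        }
                    },
                    "required": ["score"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "user",
                "content": (
                    "Context:\n" + "\n".join(contexts)
                    + f"\n\nAnswer:\n{answer}\n\n"
                    "Rate the answer's faithfulness to the context."
                ),
            }
        ],
        tools=tools,
        # Force the tool call so the arguments always contain a score.
        tool_choice={"type": "function", "function": {"name": "faithfulness"}},
    )
    args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
    # Return the feedback in the {"key": ..., "score": ...} shape that
    # LangSmith custom evaluators conventionally produce.
    return {"key": "faithfulness", "score": args["score"]}
```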
2 changes: 1 addition & 1 deletion docs/evaluation/faq/experiments-app.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Run Experiments in Browser (no code)
-sidebar_position: 7
+sidebar_position: 8
---

# How to run experiments in the prompt playground (no code)
2 changes: 1 addition & 1 deletion docs/evaluation/faq/manage-datasets.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Manage Datasets
-sidebar_position: 4
+sidebar_position: 5
---

import {
40 changes: 40 additions & 0 deletions docs/evaluation/faq/regression-testing.mdx
@@ -0,0 +1,40 @@
---
sidebar_label: Regression Testing
sidebar_position: 3
---

# Regression Testing

When evaluating LLM applications, it is important to track how your system performs over time. In this guide, we will show you how to use LangSmith's comparison view to track regressions in your application and drill down into the specific runs that improved or regressed.

## Overview

In the LangSmith comparison view, runs that _regressed_ on your specified feedback key against your baseline experiment will be highlighted in red, while runs that _improved_ will be highlighted in green. At the top of each column, you can see how many runs in that experiment did better and how many did worse than your baseline experiment.

![Regressions](../static/regression_view.png)

## Baseline Experiment

To track regressions, you need a baseline experiment to compare against. By default, the baseline is the first experiment in your comparison, but you can change it from the dropdown at the top of the page.

![Baseline](../static/select_baseline.png)

## Select Feedback Key

You will also want to select the feedback key on which you would like to focus. This can be selected via another dropdown at the top. Again, one is assigned by default, but you can adjust it as needed.

![Feedback](../static/select_feedback.png)

## Filter to Regressions or Improvements

Click the regressions or improvements button at the top of each column to filter to the runs that regressed or improved in that specific experiment.

![Regressions Filter](../static/filter_to_regressions.png)

## Try it out

To get started with regression testing, try [running a no-code experiment in our prompt playground](experiments-app), or check out the [Evaluation Quick Start Guide](/evaluation/quickstart) to set up experiments with the SDK.
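For a sense of what an SDK-driven workflow can look like, here is a minimal sketch that runs two experiments over the same dataset, giving the comparison view a baseline and a candidate. The dataset name, target functions, and evaluator are illustrative assumptions, not part of the guide above:

```python
from langsmith.evaluation import evaluate

# A minimal sketch, assuming a LangSmith dataset named "my-dataset" exists
# with inputs like {"question": ...} and reference outputs like {"answer": ...}.
# `my_app_v1` / `my_app_v2` stand in for two versions of your application.


def my_app_v1(inputs: dict) -> dict:
    # Baseline version of the app (placeholder logic).
    return {"answer": inputs["question"].strip()}


def my_app_v2(inputs: dict) -> dict:
    # Candidate version whose runs you want to compare against the baseline.
    return {"answer": inputs["question"].strip().lower()}


def exact_match(run, example) -> dict:
    # Toy feedback function; the returned "key" is the feedback key you
    # would then select in the comparison view.
    score = int(run.outputs["answer"] == example.outputs["answer"])
    return {"key": "correctness", "score": score}


# Run one experiment per app version. Both experiments will then appear in
# the comparison view, where the first one selected becomes the default baseline.
for target in (my_app_v1, my_app_v2):
    evaluate(
        target,
        data="my-dataset",
        evaluators=[exact_match],
        experiment_prefix=target.__name__,
    )
```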
2 changes: 1 addition & 1 deletion docs/evaluation/faq/synthetic-data.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Synthetic Data for Evaluation
-sidebar_position: 8
+sidebar_position: 9
---

# Synthetic Data for Evaluation
2 changes: 1 addition & 1 deletion docs/evaluation/faq/unit-testing.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Unit Test
-sidebar_position: 3
+sidebar_position: 4
---

# Unit Tests
2 changes: 1 addition & 1 deletion docs/evaluation/faq/version-datasets.mdx
@@ -1,6 +1,6 @@
---
sidebar_label: Version Datasets
-sidebar_position: 5
+sidebar_position: 6
---

# How to version datasets
Binary file added docs/evaluation/static/filter_to_regressions.png
Binary file added docs/evaluation/static/regression_view.png
Binary file added docs/evaluation/static/select_baseline.png
Binary file added docs/evaluation/static/select_feedback.png
Binary file added docs/monitoring/static/class-optimization-neg.png
Binary file added docs/monitoring/static/class-optimization-pos.png