remove unnecessary url nesting #650

Open · wants to merge 9 commits into base: main
@@ -144,7 +144,7 @@ If the header is not present, operations will default to the workspace the API k
## Security Settings

:::note
"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/prompts/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/tracing/share_trace), and [shared datasets](../../../evaluation/how_to_guides/share_dataset.mdx).
"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/share_trace), and [shared datasets](../../../evaluation/how_to_guides/share_dataset.mdx).
:::

- <RegionalUrl
4 changes: 2 additions & 2 deletions docs/administration/tutorials/manage_spend.mdx
@@ -129,7 +129,7 @@ of LangSmith's built in ability to do server side sampling for extended data ret
Choosing the right percentage of runs to sample depends on your use case. We will arbitrarily pick 10% of runs here, but will
leave it to the user to find the right value that balances collecting rare events and cost constraints.

-LangSmith automatically upgrades the data retention for any trace that matches a run rule in our automations product (see our [run rules docs](../../observability/how_to_guides/monitoring/rules)). On the
+LangSmith automatically upgrades the data retention for any trace that matches a run rule in our automations product (see our [run rules docs](../../observability/how_to_guides/rules)). On the
projects page, click `Rules -> Add Rule`, and configure the rule as follows:

![](./static/P2SampleTraces.png)
@@ -140,7 +140,7 @@ be thought of as a tree of runs making up an API call. When a run rule matches a
upgrades to be retained for 400 days.

Therefore, to make sure we have the proper sampling rate on traces, we take advantage of the
-[filtering](../../observability/how_to_guides/monitoring/rules#step-2-define-the-filter) functionality of run rules.
+[filtering](../../observability/how_to_guides/rules#step-2-define-the-filter) functionality of run rules.

We add a filter condition to only match the "root" run in the run tree. This is distinct per trace, so our 10% sampling
will upgrade 10% of traces, rather than 10% of runs, which could correspond to more than 10% of traces. If desired, we can optionally add
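For context, here is a minimal Python sketch of previewing which runs a root-only rule would consider, using the SDK's `list_runs`; the project name is a placeholder, and this is an illustration rather than part of the rule setup itself:

```python
from langsmith import Client

client = Client()

# Preview the runs a root-only sampling rule would consider:
# is_root=True restricts results to the top-level run of each trace,
# mirroring the "root run" filter condition described above.
root_runs = client.list_runs(
    project_name="my-project",  # placeholder project name
    is_root=True,
)
print(sum(1 for _ in root_runs))
```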
4 changes: 2 additions & 2 deletions docs/evaluation/concepts/index.mdx
@@ -98,7 +98,7 @@ There are a number of ways to define and run evaluators:
- **Custom code**: Define [custom evaluators](/evaluation/how_to_guides/custom_evaluator) as Python or TypeScript functions and run them client-side using the SDKs or server-side via the UI.
- **Built-in evaluators**: LangSmith has a number of built-in evaluators that you can configure and run via the UI.

-You can run evaluators using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)), via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground), or by configuring [Rules](../../observability/how_to_guides/monitoring/rules) to automatically run them on particular tracing projects or datasets.
+You can run evaluators using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)), via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground), or by configuring [Rules](../../observability/how_to_guides/rules) to automatically run them on particular tracing projects or datasets.

#### Evaluation techniques

@@ -162,7 +162,7 @@ It is offline because we're evaluating on a pre-compiled set of data.
An online evaluation, on the other hand, is one in which we evaluate a deployed application's outputs on real traffic, in near real time.
Offline evaluations are used for testing one or more versions of your application pre-deployment.

-You can run offline evaluations client-side using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)). You can run them server-side via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground) or by configuring [automations](/observability/how_to_guides/monitoring/rules) to run certain evaluators on every new experiment against a specific dataset.
+You can run offline evaluations client-side using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)). You can run them server-side via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground) or by configuring [automations](/observability/how_to_guides/rules) to run certain evaluators on every new experiment against a specific dataset.

![Offline](./static/offline.png)
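As a rough illustration of the client-side path, here is a minimal Python sketch using the SDK's `evaluate` helper with a custom evaluator; the dataset name, target function, and output keys are placeholders, and import paths can vary slightly between SDK versions:

```python
from langsmith import evaluate  # older SDKs: from langsmith.evaluation import evaluate

def my_app(question: str) -> str:
    return "42"  # stand-in for your real application

# Custom evaluator: compares the run output against the reference output.
def exact_match(run, example):
    return {
        "key": "exact_match",
        "score": int(run.outputs["answer"] == example.outputs["answer"]),
    }

results = evaluate(
    lambda inputs: {"answer": my_app(inputs["question"])},  # target wrapped for the dataset schema
    data="my-dataset",            # placeholder dataset name
    evaluators=[exact_match],
    experiment_prefix="offline-eval",
)
```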

2 changes: 1 addition & 1 deletion docs/evaluation/how_to_guides/annotation_queues.mdx
@@ -71,7 +71,7 @@ To assign runs to an annotation queue, either:
2. Select multiple runs in the runs table then click **Add to Annotation Queue** at the bottom of the page.
![](./static/multi_select_annotation_queue.png)

-3. [Set up an automation rule](../../../observability/how_to_guides/monitoring/rules) that automatically assigns runs which pass a certain filter and sampling condition to an annotation queue.
+3. [Set up an automation rule](../../../observability/how_to_guides/rules) that automatically assigns runs which pass a certain filter and sampling condition to an annotation queue.

4. Select one or multiple experiments from the dataset page and click **Annotate**. From the resulting popup, you may either create a new queue or add the runs to an existing one:
![](./static/annotate_experiment.png)
4 changes: 2 additions & 2 deletions docs/evaluation/how_to_guides/attach_user_feedback.mdx
@@ -20,10 +20,10 @@ Before diving into this content, it might be helpful to read the following:

In many applications, but even more so for LLM applications, it is important to collect user feedback to understand how your application is performing in real-world scenarios.
The ability to observe user feedback alongside trace data is a powerful way to drill down into the most interesting datapoints, then send those datapoints for further review, automatic evaluation, or even into datasets.
-To learn more about how to filter traces based on various attributes, including user feedback, see [this guide](../../../observability/how_to_guides/monitoring/filter_traces_in_application)
+To learn more about how to filter traces based on various attributes, including user feedback, see [this guide](../../../observability/how_to_guides/filter_traces_in_application)

LangSmith makes it easy to attach user feedback to traces.
-It's often helpful to expose a simple mechanism (such as a thumbs-up, thumbs-down button) to collect user feedback for your application responses. You can then use the LangSmith SDK or API to send feedback for a trace. To get the `run_id` of a logged run, see [this guide](../../../observability/how_to_guides/tracing/access_current_span).
+It's often helpful to expose a simple mechanism (such as a thumbs-up, thumbs-down button) to collect user feedback for your application responses. You can then use the LangSmith SDK or API to send feedback for a trace. To get the `run_id` of a logged run, see [this guide](../../../observability/how_to_guides/access_current_span).
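For illustration, a minimal Python sketch of both halves of this workflow: sending feedback for a run, then filtering runs by that feedback (the run ID, project name, and feedback key are placeholders):

```python
from langsmith import Client

client = Client()

run_id = "..."  # placeholder: the run_id of the traced call (see the guide linked above)

# Attach a thumbs-up from the user to that run.
client.create_feedback(run_id, key="user-score", score=1, comment="thumbs up")

# Later, pull up runs that received negative feedback for review.
flagged = client.list_runs(
    project_name="my-project",  # placeholder project name
    filter='and(eq(feedback_key, "user-score"), eq(feedback_score, 0))',
)
```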

:::note

@@ -7,7 +7,7 @@ sidebar_position: 2
While you can specify evaluators to grade the results of your experiments programmatically (see [this guide](./evaluate_llm_application) for more information), you can also bind evaluators to a dataset in the UI.
This allows you to configure automatic evaluators that grade your experiment results. We support both LLM-based evaluators and custom Python code evaluators.

-The process for configuring this is very similar to the process for configuring an [online evaluator](../../../observability/how_to_guides/monitoring/online_evaluations) for traces.
+The process for configuring this is very similar to the process for configuring an [online evaluator](../../../observability/how_to_guides/online_evaluations) for traces.

:::note Only affects subsequent experiment runs
When you configure an evaluator for a dataset, it will only affect the experiment runs that are created after the evaluator is configured. It will not affect the evaluation of experiment runs that were created before the evaluator was configured.
4 changes: 2 additions & 2 deletions docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx
@@ -11,7 +11,7 @@ you to automatically collect human corrections on evaluator prompts, which are t
:::tip Recommended Reading
Before learning how to create few-shot evaluators, it might be helpful to learn how to set up automations (both online and offline) and how to leave corrections on evaluator scores:

-- [Set up online evaluations](../../../observability/how_to_guides/monitoring/online_evaluations)
+- [Set up online evaluations](../../../observability/how_to_guides/online_evaluations)
- [Bind an evaluator to a dataset in the UI (offline evaluation)](./bind_evaluator_to_dataset)
- [Audit evaluator scores](./audit_evaluator_scores)

@@ -24,7 +24,7 @@ The default maximum few-shot examples to use in the prompt is 5. Examples are pu

:::

-When creating an [online](../../../observability/how_to_guides/monitoring/online_evaluations) or [offline](./bind_evaluator_to_dataset) evaluator - from a tracing project or a dataset, respectively - you will see the option to use corrections as few-shot examples. Note that these types of evaluators
+When creating an [online](../../../observability/how_to_guides/online_evaluations) or [offline](./bind_evaluator_to_dataset) evaluator - from a tracing project or a dataset, respectively - you will see the option to use corrections as few-shot examples. Note that these types of evaluators
are only supported when using mustache prompts - you will not be able to click this option if your prompt uses f-string formatting. When you select this,
we will auto-create a few-shot prompt for you. Each individual few-shot example will be formatted according to this prompt, and inserted into your main prompt in place of the `{{Few-shot examples}}`
template variable which will be auto-added above. Your few-shot prompt should contain the same variables as your main prompt, plus a `few_shot_explanation` and a score variable which should have the same name
2 changes: 1 addition & 1 deletion docs/evaluation/how_to_guides/evaluate_llm_application.mdx
@@ -92,7 +92,7 @@ First we need an application to evaluate. Let's create a simple toxicity classif
/>

We've optionally enabled tracing to capture the inputs and outputs of each step in the pipeline.
-To understand how to annotate your code for tracing, please refer to [this guide](../../../observability/how_to_guides/tracing/annotate_code).
+To understand how to annotate your code for tracing, please refer to [this guide](../../../observability/how_to_guides/annotate_code).

## Create or select a dataset

2 changes: 1 addition & 1 deletion docs/evaluation/how_to_guides/evaluate_pairwise.mdx
@@ -88,7 +88,7 @@ which asks the LLM to decide which is better between two AI assistant responses.

:::info Optional LangChain Usage

-In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](../../../prompt_engineering/how_to_guides/prompts/langchain_hub) and using it with a LangChain chat model wrapper.
+In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](../../../prompt_engineering/how_to_guides/langchain_hub) and using it with a LangChain chat model wrapper.

**Usage of LangChain is totally optional.** To illustrate this point, the TypeScript example uses the OpenAI SDK directly.
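A minimal Python sketch of that optional LangChain path, assuming `langchain` and `langchain-openai` are installed and Hub access is configured; the model name is a placeholder:

```python
from langchain import hub
from langchain_openai import ChatOpenAI

# Pull the structured pairwise-evaluation prompt referenced above from the Hub.
prompt = hub.pull("langchain-ai/pairwise-evaluation-2")

# Bind it to any LangChain chat model wrapper.
model = ChatOpenAI(model="gpt-4o")  # placeholder model name
chain = prompt | model
```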

@@ -7,7 +7,7 @@ import {

# Run an evaluation with large file inputs

-In addition to supporting [file attachments with traces](../../../observability/how_to_guides/tracing/upload_files_with_traces), LangSmith supports arbitrary file attachments with your examples, which you can consume when you run experiments.
+In addition to supporting [file attachments with traces](../../../observability/how_to_guides/upload_files_with_traces), LangSmith supports arbitrary file attachments with your examples, which you can consume when you run experiments.

This is particularly useful when working with LLM applications that require multimodal inputs or outputs.

6 changes: 3 additions & 3 deletions docs/evaluation/how_to_guides/index.md
@@ -42,7 +42,7 @@ Evaluate and improve your application before deploying it.

- [Evaluate with repetitions](./how_to_guides/repetition)
- [Handle model rate limits](./how_to_guides/rate_limiting)
-- [Print detailed logs (Python only)](../../observability/how_to_guides/tracing/output_detailed_logs)
+- [Print detailed logs (Python only)](../../observability/how_to_guides/output_detailed_logs)
- [Run an evaluation locally (beta, Python only)](./how_to_guides/local)

## Testing integrations
@@ -56,8 +56,8 @@ Run evals using your favorite testing tools:

Evaluate and monitor your system's live performance on production data.

-- [Set up an LLM-as-judge online evaluator](../../observability/how_to_guides/monitoring/online_evaluations#configure-llm-as-judge-evaluators)
-- [Set up a custom code online evaluator](../../observability/how_to_guides/monitoring/online_evaluations#configure-custom-code-evaluators)
+- [Set up an LLM-as-judge online evaluator](../../observability/how_to_guides/online_evaluations#configure-llm-as-judge-evaluators)
+- [Set up a custom code online evaluator](../../observability/how_to_guides/online_evaluations#configure-custom-code-evaluators)
- [Create a few-shot evaluator](./how_to_guides/create_few_shot_evaluators)

## Automatic evaluation
@@ -41,7 +41,7 @@ For the full list of available transformations, see [our reference](/reference/e
:::note
If you plan to collect production traces in your dataset from LangChain
[ChatModels](https://python.langchain.com/docs/concepts/chat_models/)
-or from OpenAI calls using the [LangSmith OpenAI wrapper](/observability/how_to_guides/tracing/annotate_code#wrap-the-openai-client), we offer a prebuilt Chat Model schema that converts messages and tools into industry standard openai formats that can be used downstream with any model for testing. You can also customize the template settings to match your use case.
+or from OpenAI calls using the [LangSmith OpenAI wrapper](/observability/how_to_guides/annotate_code#wrap-the-openai-client), we offer a prebuilt Chat Model schema that converts messages and tools into industry standard openai formats that can be used downstream with any model for testing. You can also customize the template settings to match your use case.

Please see the [dataset transformations reference](/reference/evaluation/dataset_transformations) for more information.
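For reference, a minimal Python sketch of the OpenAI wrapper mentioned above; it assumes the `openai` package is installed and the LangSmith environment variables are already configured, and the model name is a placeholder:

```python
from openai import OpenAI
from langsmith.wrappers import wrap_openai

# Wrapped client: calls are traced to LangSmith and can later be added to a dataset.
openai_client = wrap_openai(OpenAI())

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
```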
:::
Expand All @@ -57,7 +57,7 @@ through our tracing projects to find the runs we want to add to the dataset. The

:::tip
An extremely powerful technique to build datasets is to drill down into the most interesting traces, such as traces that were tagged with poor user feedback, and add them to a dataset.
-For tips on how to filter traces, see the [filtering traces](../../../observability/how_to_guides/monitoring/filter_traces_in_application) guide.
+For tips on how to filter traces, see the [filtering traces](../../../observability/how_to_guides/filter_traces_in_application) guide.
:::

There are two ways to add data from tracing projects to datasets.
Expand All @@ -78,7 +78,7 @@ the run before adding it to the dataset.

### Automatically add runs to a dataset

-You can use [run rules](../../../observability/how_to_guides/monitoring/rules) to automatically add traces to a dataset based on certain conditions. For example, you could add all traces that have a certain tag to a dataset.
+You can use [run rules](../../../observability/how_to_guides/rules) to automatically add traces to a dataset based on certain conditions. For example, you could add all traces that have a certain tag to a dataset.

### Add runs from an annotation queue

@@ -86,7 +86,7 @@ await client.createExamples({
### Create a dataset from traces

To create datasets from the runs (spans) of your traces, you can use the same approach.
-For **many** more examples of how to fetch and filter runs, see the [export traces](../../../observability/how_to_guides/tracing/export_traces) guide.
+For **many** more examples of how to fetch and filter runs, see the [export traces](../../../observability/how_to_guides/export_traces) guide.
Below is an example:

<CodeTabs
@@ -367,7 +367,7 @@ For example, if you have an example with metadata `{"foo": "bar", "baz": "qux"}`

### List examples by structured filter

-Similar to how you can use the structured filter query language to [fetch runs](../../../observability/how_to_guides/tracing/export_traces#use-filter-query-language), you can use it to fetch examples.
+Similar to how you can use the structured filter query language to [fetch runs](../../../observability/how_to_guides/export_traces#use-filter-query-language), you can use it to fetch examples.
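A minimal Python sketch of fetching examples with a structured filter; the dataset name and the filter expression are illustrative assumptions, so check the filter query language reference for the exact attributes supported on examples:

```python
from langsmith import Client

client = Client()

examples = client.list_examples(
    dataset_name="my-dataset",         # placeholder dataset name
    filter='exists(metadata, "foo")',  # hypothetical filter expression
)
for example in examples:
    print(example.id, example.metadata)
```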

:::note

2 changes: 1 addition & 1 deletion docs/evaluation/tutorials/evaluation.mdx
@@ -320,7 +320,7 @@ There are many types of evaluators you may wish to explore.
For information on this, check out the [how-to guides](../../evaluation/how_to_guides).

Additionally, there are other ways to evaluate data beyond this "offline" manner (e.g. you can evaluate production data).
-For more information on online evaluation, check out [this guide](../../observability/how_to_guides/monitoring/online_evaluations).
+For more information on online evaluation, check out [this guide](../../observability/how_to_guides/online_evaluations).

## Reference code

4 changes: 2 additions & 2 deletions docs/index.mdx
@@ -33,7 +33,7 @@ It allows you to closely monitor and evaluate your application, so you can ship

LangSmith integrates seamlessly with LangChain's open source frameworks [`langchain`](https://python.langchain.com) and [`langgraph`](https://langchain-ai.github.io/langgraph/), with no extra instrumentation needed.

-If you're already using either of these, see the how-to guide for [setting up LangSmith with LangChain](./observability/how_to_guides/tracing/trace_with_langchain) or [setting up LangSmith with LangGraph](https://docs.smith.langchain.com/observability/how_to_guides/tracing/trace_with_langgraph).
+If you're already using either of these, see the how-to guide for [setting up LangSmith with LangChain](./observability/how_to_guides/trace_with_langchain) or [setting up LangSmith with LangGraph](https://docs.smith.langchain.com/observability/how_to_guides/tracing/trace_with_langgraph).
:::

## Observability
@@ -43,7 +43,7 @@ Observability is important for any software application, but especially so for L
This is where LangSmith can help! LangSmith has LLM-native observability, allowing you to get meaningful insights from your application. LangSmith’s observability features have you covered throughout all stages of application development - from prototyping, to beta testing, to production.

- Get started by [adding tracing](./observability) to your application.
-- [Create dashboards](./observability/how_to_guides/monitoring/dashboards) to view key metrics like RPS, error rates and costs.
+- [Create dashboards](./observability/how_to_guides/dashboards) to view key metrics like RPS, error rates and costs.
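As a quick illustration of the first step above (adding tracing), a minimal Python sketch using the SDK's `@traceable` decorator; it assumes the LangSmith API key and tracing environment variables are already set:

```python
from langsmith import traceable

# Assumes LANGSMITH_TRACING=true and LANGSMITH_API_KEY are set
# (older SDK versions read LANGCHAIN_TRACING_V2 / LANGCHAIN_API_KEY).
@traceable
def answer(question: str) -> str:
    return "42"  # stand-in for your model call

answer("What does LangSmith trace?")  # this call now appears as a trace
```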

## Evals
