remove unnecessary url nesting #650

Open · wants to merge 9 commits into base: main
@@ -144,7 +144,7 @@ If the header is not present, operations will default to the workspace the API k
## Security Settings

:::note
"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/prompts/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/tracing/share_trace), and [shared datasets](../../../evaluation/how_to_guides/share_dataset.mdx).
"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/share_trace), and [shared datasets](../../../evaluation/how_to_guides/share_dataset.mdx).
:::

- <RegionalUrl
4 changes: 2 additions & 2 deletions docs/administration/tutorials/manage_spend.mdx
@@ -129,7 +129,7 @@ of LangSmith's built in ability to do server side sampling for extended data ret
Choosing the right percentage of runs to sample depends on your use case. We will arbitrarily pick 10% of runs here, but will
leave it to the user to find the right value that balances collecting rare events and cost constraints.

-LangSmith automatically upgrades the data retention for any trace that matches a run rule in our automations product (see our [run rules docs](../../observability/how_to_guides/monitoring/rules)). On the
+LangSmith automatically upgrades the data retention for any trace that matches a run rule in our automations product (see our [run rules docs](../../observability/how_to_guides/rules)). On the
projects page, click `Rules -> Add Rule`, and configure the rule as follows:

![](./static/P2SampleTraces.png)
@@ -140,7 +140,7 @@ be thought of as a tree of runs making up an API call. When a run rule matches a
upgrades to be retained for 400 days.

Therefore, to make sure we have the proper sampling rate on traces, we take advantage of the
-[filtering](../../observability/how_to_guides/monitoring/rules#step-2-define-the-filter) functionality of run rules.
+[filtering](../../observability/how_to_guides/rules#step-2-define-the-filter) functionality of run rules.

We add a filter condition to only match the "root" run in the run tree. This is distinct per trace, so our 10% sampling
will upgrade 10% of traces, rather than 10% of runs, which could correspond to more than 10% of traces. If desired, we can optionally add
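For context, here is a minimal Python sketch of previewing which runs a root-only rule would consider, using the SDK's `list_runs`; the project name is a placeholder, and this is an illustration rather than part of the rule setup itself:

```python
from langsmith import Client

client = Client()

# Preview the runs a root-only sampling rule would consider:
# is_root=True restricts results to the top-level run of each trace,
# mirroring the "root run" filter condition described above.
root_runs = client.list_runs(
    project_name="my-project",  # placeholder project name
    is_root=True,
)
print(sum(1 for _ in root_runs))
```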
4 changes: 2 additions & 2 deletions docs/evaluation/concepts/index.mdx
@@ -98,7 +98,7 @@ There are a number of ways to define and run evaluators:
- **Custom code**: Define [custom evaluators](/evaluation/how_to_guides/custom_evaluator) as Python or TypeScript functions and run them client-side using the SDKs or server-side via the UI.
- **Built-in evaluators**: LangSmith has a number of built-in evaluators that you can configure and run via the UI.

-You can run evaluators using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)), via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground), or by configuring [Rules](../../observability/how_to_guides/monitoring/rules) to automatically run them on particular tracing projects or datasets.
+You can run evaluators using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)), via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground), or by configuring [Rules](../../observability/how_to_guides/rules) to automatically run them on particular tracing projects or datasets.

#### Evaluation techniques

@@ -162,7 +162,7 @@ It is offline because we're evaluating on a pre-compiled set of data.
An online evaluation, on the other hand, is one in which we evaluate a deployed application's outputs on real traffic, in near real time.
Offline evaluations are used for testing one or more versions of your application pre-deployment.

-You can run offline evaluations client-side using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)). You can run them server-side via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground) or by configuring [automations](/observability/how_to_guides/monitoring/rules) to run certain evaluators on every new experiment against a specific dataset.
+You can run offline evaluations client-side using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)). You can run them server-side via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground) or by configuring [automations](/observability/how_to_guides/rules) to run certain evaluators on every new experiment against a specific dataset.

![Offline](./static/offline.png)
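As a rough illustration of the client-side path, here is a minimal Python sketch using the SDK's `evaluate` helper with a custom evaluator; the dataset name, target function, and output keys are placeholders, and import paths can vary slightly between SDK versions:

```python
from langsmith import evaluate  # older SDKs: from langsmith.evaluation import evaluate

def my_app(question: str) -> str:
    return "42"  # stand-in for your real application

# Custom evaluator: compares the run output against the reference output.
def exact_match(run, example):
    return {
        "key": "exact_match",
        "score": int(run.outputs["answer"] == example.outputs["answer"]),
    }

results = evaluate(
    lambda inputs: {"answer": my_app(inputs["question"])},  # target wrapped for the dataset schema
    data="my-dataset",            # placeholder dataset name
    evaluators=[exact_match],
    experiment_prefix="offline-eval",
)
```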

2 changes: 1 addition & 1 deletion docs/evaluation/how_to_guides/annotation_queues.mdx
@@ -71,7 +71,7 @@ To assign runs to an annotation queue, either:
2. Select multiple runs in the runs table then click **Add to Annotation Queue** at the bottom of the page.
![](./static/multi_select_annotation_queue.png)

-3. [Set up an automation rule](../../../observability/how_to_guides/monitoring/rules) that automatically assigns runs which pass a certain filter and sampling condition to an annotation queue.
+3. [Set up an automation rule](../../../observability/how_to_guides/rules) that automatically assigns runs which pass a certain filter and sampling condition to an annotation queue.

4. Select one or multiple experiments from the dataset page and click **Annotate**. From the resulting popup, you may either create a new queue or add the runs to an existing one:
![](./static/annotate_experiment.png)
4 changes: 2 additions & 2 deletions docs/evaluation/how_to_guides/attach_user_feedback.mdx
@@ -20,10 +20,10 @@ Before diving into this content, it might be helpful to read the following:

In many applications, but even more so for LLM applications, it is important to collect user feedback to understand how your application is performing in real-world scenarios.
The ability to observe user feedback alongside trace data is a powerful way to drill down into the most interesting datapoints, then send those datapoints for further review, automatic evaluation, or even into datasets.
-To learn more about how to filter traces based on various attributes, including user feedback, see [this guide](../../../observability/how_to_guides/monitoring/filter_traces_in_application)
+To learn more about how to filter traces based on various attributes, including user feedback, see [this guide](../../../observability/how_to_guides/filter_traces_in_application)

LangSmith makes it easy to attach user feedback to traces.
-It's often helpful to expose a simple mechanism (such as a thumbs-up, thumbs-down button) to collect user feedback for your application responses. You can then use the LangSmith SDK or API to send feedback for a trace. To get the `run_id` of a logged run, see [this guide](../../../observability/how_to_guides/tracing/access_current_span).
+It's often helpful to expose a simple mechanism (such as a thumbs-up, thumbs-down button) to collect user feedback for your application responses. You can then use the LangSmith SDK or API to send feedback for a trace. To get the `run_id` of a logged run, see [this guide](../../../observability/how_to_guides/access_current_span).
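For illustration, a minimal Python sketch of both halves of this workflow: sending feedback for a run, then filtering runs by that feedback (the run ID, project name, and feedback key are placeholders):

```python
from langsmith import Client

client = Client()

run_id = "..."  # placeholder: the run_id of the traced call (see the guide linked above)

# Attach a thumbs-up from the user to that run.
client.create_feedback(run_id, key="user-score", score=1, comment="thumbs up")

# Later, pull up runs that received negative feedback for review.
flagged = client.list_runs(
    project_name="my-project",  # placeholder project name
    filter='and(eq(feedback_key, "user-score"), eq(feedback_score, 0))',
)
```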

:::note

@@ -7,7 +7,7 @@ sidebar_position: 2
While you can specify evaluators to grade the results of your experiments programmatically (see [this guide](./evaluate_llm_application) for more information), you can also bind evaluators to a dataset in the UI.
This allows you to configure automatic evaluators that grade your experiment results. We support both LLM-based evaluators and custom Python code evaluators.

-The process for configuring this is very similar to the process for configuring an [online evaluator](../../../observability/how_to_guides/monitoring/online_evaluations) for traces.
+The process for configuring this is very similar to the process for configuring an [online evaluator](../../../observability/how_to_guides/online_evaluations) for traces.

:::note Only affects subsequent experiment runs
When you configure an evaluator for a dataset, it will only affect the experiment runs that are created after the evaluator is configured. It will not affect the evaluation of experiment runs that were created before the evaluator was configured.
4 changes: 2 additions & 2 deletions docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx
@@ -11,7 +11,7 @@ you to automatically collect human corrections on evaluator prompts, which are t
:::tip Recommended Reading
Before learning how to create few-shot evaluators, it might be helpful to learn how to set up automations (both online and offline) and how to leave corrections on evaluator scores:

-- [Set up online evaluations](../../../observability/how_to_guides/monitoring/online_evaluations)
+- [Set up online evaluations](../../../observability/how_to_guides/online_evaluations)
- [Bind an evaluator to a dataset in the UI (offline evaluation)](./bind_evaluator_to_dataset)
- [Audit evaluator scores](./audit_evaluator_scores)

@@ -24,7 +24,7 @@ The default maximum few-shot examples to use in the prompt is 5. Examples are pu

:::

-When creating an [online](../../../observability/how_to_guides/monitoring/online_evaluations) or [offline](./bind_evaluator_to_dataset) evaluator - from a tracing project or a dataset, respectively - you will see the option to use corrections as few-shot examples. Note that these types of evaluators
+When creating an [online](../../../observability/how_to_guides/online_evaluations) or [offline](./bind_evaluator_to_dataset) evaluator - from a tracing project or a dataset, respectively - you will see the option to use corrections as few-shot examples. Note that these types of evaluators
are only supported when using mustache prompts - you will not be able to click this option if your prompt uses f-string formatting. When you select this,
we will auto-create a few-shot prompt for you. Each individual few-shot example will be formatted according to this prompt, and inserted into your main prompt in place of the `{{Few-shot examples}}`
template variable which will be auto-added above. Your few-shot prompt should contain the same variables as your main prompt, plus a `few_shot_explanation` and a score variable which should have the same name
2 changes: 1 addition & 1 deletion docs/evaluation/how_to_guides/evaluate_llm_application.mdx
@@ -92,7 +92,7 @@ First we need an application to evaluate. Let's create a simple toxicity classif
/>

We've optionally enabled tracing to capture the inputs and outputs of each step in the pipeline.
-To understand how to annotate your code for tracing, please refer to [this guide](../../../observability/how_to_guides/tracing/annotate_code).
+To understand how to annotate your code for tracing, please refer to [this guide](../../../observability/how_to_guides/annotate_code).

## Create or select a dataset

2 changes: 1 addition & 1 deletion docs/evaluation/how_to_guides/evaluate_pairwise.mdx
@@ -88,7 +88,7 @@ which asks the LLM to decide which is better between two AI assistant responses.

:::info Optional LangChain Usage

-In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](../../../prompt_engineering/how_to_guides/prompts/langchain_hub) and using it with a LangChain chat model wrapper.
+In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](../../../prompt_engineering/how_to_guides/langchain_hub) and using it with a LangChain chat model wrapper.

**Usage of LangChain is totally optional.** To illustrate this point, the TypeScript example uses the OpenAI SDK directly.
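A minimal Python sketch of that optional LangChain path, assuming `langchain` and `langchain-openai` are installed and Hub access is configured; the model name is a placeholder:

```python
from langchain import hub
from langchain_openai import ChatOpenAI

# Pull the structured pairwise-evaluation prompt referenced above from the Hub.
prompt = hub.pull("langchain-ai/pairwise-evaluation-2")

# Bind it to any LangChain chat model wrapper.
model = ChatOpenAI(model="gpt-4o")  # placeholder model name
chain = prompt | model
```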

@@ -7,7 +7,7 @@ import {

# Run an evaluation with large file inputs

-In addition to supporting [file attachments with traces](../../../observability/how_to_guides/tracing/upload_files_with_traces), LangSmith supports arbitrary file attachments with your examples, which you can consume when you run experiments.
+In addition to supporting [file attachments with traces](../../../observability/how_to_guides/upload_files_with_traces), LangSmith supports arbitrary file attachments with your examples, which you can consume when you run experiments.

This is particularly useful when working with LLM applications that require multimodal inputs or outputs.

6 changes: 3 additions & 3 deletions docs/evaluation/how_to_guides/index.md
@@ -42,7 +42,7 @@ Evaluate and improve your application before deploying it.

- [Evaluate with repetitions](./how_to_guides/repetition)
- [Handle model rate limits](./how_to_guides/rate_limiting)
-- [Print detailed logs (Python only)](../../observability/how_to_guides/tracing/output_detailed_logs)
+- [Print detailed logs (Python only)](../../observability/how_to_guides/output_detailed_logs)
- [Run an evaluation locally (beta, Python only)](./how_to_guides/local)

## Testing integrations
@@ -56,8 +56,8 @@ Run evals using your favorite testing tools:

Evaluate and monitor your system's live performance on production data.

-- [Set up an LLM-as-judge online evaluator](../../observability/how_to_guides/monitoring/online_evaluations#configure-llm-as-judge-evaluators)
-- [Set up a custom code online evaluator](../../observability/how_to_guides/monitoring/online_evaluations#configure-custom-code-evaluators)
+- [Set up an LLM-as-judge online evaluator](../../observability/how_to_guides/online_evaluations#configure-llm-as-judge-evaluators)
+- [Set up a custom code online evaluator](../../observability/how_to_guides/online_evaluations#configure-custom-code-evaluators)
- [Create a few-shot evaluator](./how_to_guides/create_few_shot_evaluators)

## Automatic evaluation
@@ -41,7 +41,7 @@ For the full list of available transformations, see [our reference](/reference/e
:::note
If you plan to collect production traces in your dataset from LangChain
[ChatModels](https://python.langchain.com/docs/concepts/chat_models/)
-or from OpenAI calls using the [LangSmith OpenAI wrapper](/observability/how_to_guides/tracing/annotate_code#wrap-the-openai-client), we offer a prebuilt Chat Model schema that converts messages and tools into industry standard openai formats that can be used downstream with any model for testing. You can also customize the template settings to match your use case.
+or from OpenAI calls using the [LangSmith OpenAI wrapper](/observability/how_to_guides/annotate_code#wrap-the-openai-client), we offer a prebuilt Chat Model schema that converts messages and tools into industry standard openai formats that can be used downstream with any model for testing. You can also customize the template settings to match your use case.

Please see the [dataset transformations reference](/reference/evaluation/dataset_transformations) for more information.
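For reference, a minimal Python sketch of the OpenAI wrapper mentioned above; it assumes the `openai` package is installed and the LangSmith environment variables are already configured, and the model name is a placeholder:

```python
from openai import OpenAI
from langsmith.wrappers import wrap_openai

# Wrapped client: calls are traced to LangSmith and can later be added to a dataset.
openai_client = wrap_openai(OpenAI())

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
```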
:::
Expand All @@ -57,7 +57,7 @@ through our tracing projects to find the runs we want to add to the dataset. The

:::tip
An extremely powerful technique to build datasets is to drill down into the most interesting traces, such as traces that were tagged with poor user feedback, and add them to a dataset.
-For tips on how to filter traces, see the [filtering traces](../../../observability/how_to_guides/monitoring/filter_traces_in_application) guide.
+For tips on how to filter traces, see the [filtering traces](../../../observability/how_to_guides/filter_traces_in_application) guide.
:::

There are two ways to add data from tracing projects to datasets.
Expand All @@ -78,7 +78,7 @@ the run before adding it to the dataset.

### Automatically add runs to a dataset

-You can use [run rules](../../../observability/how_to_guides/monitoring/rules) to automatically add traces to a dataset based on certain conditions. For example, you could add all traces that have a certain tag to a dataset.
+You can use [run rules](../../../observability/how_to_guides/rules) to automatically add traces to a dataset based on certain conditions. For example, you could add all traces that have a certain tag to a dataset.

### Add runs from an annotation queue

@@ -86,7 +86,7 @@ await client.createExamples({
### Create a dataset from traces

To create datasets from the runs (spans) of your traces, you can use the same approach.
-For **many** more examples of how to fetch and filter runs, see the [export traces](../../../observability/how_to_guides/tracing/export_traces) guide.
+For **many** more examples of how to fetch and filter runs, see the [export traces](../../../observability/how_to_guides/export_traces) guide.
Below is an example:

<CodeTabs
@@ -367,7 +367,7 @@ For example, if you have an example with metadata `{"foo": "bar", "baz": "qux"}`

### List examples by structured filter

-Similar to how you can use the structured filter query language to [fetch runs](../../../observability/how_to_guides/tracing/export_traces#use-filter-query-language), you can use it to fetch examples.
+Similar to how you can use the structured filter query language to [fetch runs](../../../observability/how_to_guides/export_traces#use-filter-query-language), you can use it to fetch examples.
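A minimal Python sketch of fetching examples with a structured filter; the dataset name and the filter expression are illustrative assumptions, so check the filter query language reference for the exact attributes supported on examples:

```python
from langsmith import Client

client = Client()

examples = client.list_examples(
    dataset_name="my-dataset",         # placeholder dataset name
    filter='exists(metadata, "foo")',  # hypothetical filter expression
)
for example in examples:
    print(example.id, example.metadata)
```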

:::note

2 changes: 1 addition & 1 deletion docs/evaluation/tutorials/evaluation.mdx
@@ -320,7 +320,7 @@ There are many types of evaluators you may wish to explore.
For information on this, check out the [how-to guides](../../evaluation/how_to_guides).

Additionally, there are other ways to evaluate data beyond this "offline" manner (e.g. you can evaluate production data).
-For more information on online evaluation, check out [this guide](../../observability/how_to_guides/monitoring/online_evaluations).
+For more information on online evaluation, check out [this guide](../../observability/how_to_guides/online_evaluations).

## Reference code

4 changes: 2 additions & 2 deletions docs/index.mdx
@@ -33,7 +33,7 @@ It allows you to closely monitor and evaluate your application, so you can ship

LangSmith integrates seamlessly with LangChain's open source frameworks [`langchain`](https://python.langchain.com) and [`langgraph`](https://langchain-ai.github.io/langgraph/), with no extra instrumentation needed.

-If you're already using either of these, see the how-to guide for [setting up LangSmith with LangChain](./observability/how_to_guides/tracing/trace_with_langchain) or [setting up LangSmith with LangGraph](https://docs.smith.langchain.com/observability/how_to_guides/tracing/trace_with_langgraph).
+If you're already using either of these, see the how-to guide for [setting up LangSmith with LangChain](./observability/how_to_guides/trace_with_langchain) or [setting up LangSmith with LangGraph](https://docs.smith.langchain.com/observability/how_to_guides/tracing/trace_with_langgraph).
:::

## Observability
@@ -43,7 +43,7 @@ Observability is important for any software application, but especially so for L
This is where LangSmith can help! LangSmith has LLM-native observability, allowing you to get meaningful insights from your application. LangSmith’s observability features have you covered throughout all stages of application development - from prototyping, to beta testing, to production.

- Get started by [adding tracing](./observability) to your application.
-- [Create dashboards](./observability/how_to_guides/monitoring/dashboards) to view key metrics like RPS, error rates and costs.
+- [Create dashboards](./observability/how_to_guides/dashboards) to view key metrics like RPS, error rates and costs.
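As a quick illustration of the first step above (adding tracing), a minimal Python sketch using the SDK's `@traceable` decorator; it assumes the LangSmith API key and tracing environment variables are already set:

```python
from langsmith import traceable

# Assumes LANGSMITH_TRACING=true and LANGSMITH_API_KEY are set
# (older SDK versions read LANGCHAIN_TRACING_V2 / LANGCHAIN_API_KEY).
@traceable
def answer(question: str) -> str:
    return "42"  # stand-in for your model call

answer("What does LangSmith trace?")  # this call now appears as a trace
```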

## Evals
