diff --git a/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx b/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx
index 6d099d79..7ce975ed 100644
--- a/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx
+++ b/docs/administration/how_to_guides/organization_management/manage_organization_by_api.mdx
@@ -144,7 +144,7 @@ If the header is not present, operations will default to the workspace the API k
 ## Security Settings
 :::note
-"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/prompts/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/tracing/share_trace), and [shared datasets](../../../evaluation/how_to_guides/share_dataset.mdx).
+"Shared resources" in this context refer to [public prompts](../../../prompt_engineering/how_to_guides/create_a_prompt#save-your-prompt), [shared runs](../../../observability/how_to_guides/share_trace), and [shared datasets](../../../evaluation/how_to_guides/share_dataset.mdx).
 :::
- Add Rule`, and configure the rule as follows: ![](./static/P2SampleTraces.png)
@@ -140,7 +140,7 @@ be thought of as a tree of runs making up an API call. When a run rule matches a
 upgrades to be retained for 400 days. Therefore, to make sure we have the proper sampling rate on traces, we take advantage of the
-[filtering](../../observability/how_to_guides/monitoring/rules#step-2-define-the-filter) functionality of run rules.
+[filtering](../../observability/how_to_guides/rules#step-2-define-the-filter) functionality of run rules.
 We add a filter condition to only match the "root" run in the run tree. This is distinct per trace, so our 10% sampling will upgrade 10% of traces, rather than 10% of runs, which could correspond to more than 10% of traces. If desired, we can optionally add
diff --git a/docs/evaluation/concepts/index.mdx b/docs/evaluation/concepts/index.mdx
index d15e01ba..d58f0da9 100644
--- a/docs/evaluation/concepts/index.mdx
+++ b/docs/evaluation/concepts/index.mdx
@@ -98,7 +98,7 @@ There are a number of ways to define and run evaluators:
 - **Custom code**: Define [custom evaluators](/evaluation/how_to_guides/custom_evaluator) as Python or TypeScript functions and run them client-side using the SDKs or server-side via the UI.
 - **Built-in evaluators**: LangSmith has a number of built-in evaluators that you can configure and run via the UI.
-You can run evaluators using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)), via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground), or by configuring [Rules](../../observability/how_to_guides/monitoring/rules) to automatically run them on particular tracing projects or datasets.
+You can run evaluators using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)), via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground), or by configuring [Rules](../../observability/how_to_guides/rules) to automatically run them on particular tracing projects or datasets.
 #### Evaluation techniques
@@ -162,7 +162,7 @@ It is offline because we're evaluating on a pre-compiled set of data.
 An online evaluation, on the other hand, is one in which we evaluate a deployed application's outputs on real traffic, in near realtime.
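The SDK path referenced in the hunk above ("run them client-side using the SDKs") is worth one concrete illustration. A minimal sketch of running a custom evaluator client-side, assuming a recent `langsmith` Python SDK where `Client.evaluate` accepts dict-style evaluator arguments; the dataset name and function names below are illustrative, not taken from the docs being edited:

```python
from langsmith import Client

client = Client()

# Illustrative target function: the application being evaluated.
def my_app(inputs: dict) -> dict:
    return {"answer": inputs["question"].strip().lower()}

# Illustrative custom evaluator: exact match against the reference output.
def exact_match(outputs: dict, reference_outputs: dict) -> bool:
    return outputs.get("answer") == reference_outputs.get("answer")

results = client.evaluate(
    my_app,
    data="my-dataset",                 # assumed dataset name
    evaluators=[exact_match],
    experiment_prefix="offline-eval-sketch",
)
```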
 Offline evaluations are used for testing a version(s) of your application pre-deployment.
-You can run offline evaluations client-side using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)). You can run them server-side via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground) or by configuring [automations](/observability/how_to_guides/monitoring/rules) to run certain evaluators on every new experiment against a specific dataset.
+You can run offline evaluations client-side using the LangSmith SDK ([Python](https://docs.smith.langchain.com/reference/python) and [TypeScript](https://docs.smith.langchain.com/reference/js)). You can run them server-side via the [Prompt Playground](../../prompt_engineering/concepts#prompt-playground) or by configuring [automations](/observability/how_to_guides/rules) to run certain evaluators on every new experiment against a specific dataset.
 ![Offline](./static/offline.png)
diff --git a/docs/evaluation/how_to_guides/annotation_queues.mdx b/docs/evaluation/how_to_guides/annotation_queues.mdx
index f42f70ba..c54069f7 100644
--- a/docs/evaluation/how_to_guides/annotation_queues.mdx
+++ b/docs/evaluation/how_to_guides/annotation_queues.mdx
@@ -71,7 +71,7 @@ To assign runs to an annotation queue, either:
 2. Select multiple runs in the runs table then click **Add to Annotation Queue** at the bottom of the page. ![](./static/multi_select_annotation_queue.png)
-3. [Set up an automation rule](../../../observability/how_to_guides/monitoring/rules) that automatically assigns runs which pass a certain filter and sampling condition to an annotation queue.
+3. [Set up an automation rule](../../../observability/how_to_guides/rules) that automatically assigns runs which pass a certain filter and sampling condition to an annotation queue.
 4. Select one or multiple experiments from the dataset page and click **Annotate**. From the resulting popup, you may either create a new queue or add the runs to an existing one: ![](./static/annotate_experiment.png)
diff --git a/docs/evaluation/how_to_guides/attach_user_feedback.mdx b/docs/evaluation/how_to_guides/attach_user_feedback.mdx
index 9a220bba..84798746 100644
--- a/docs/evaluation/how_to_guides/attach_user_feedback.mdx
+++ b/docs/evaluation/how_to_guides/attach_user_feedback.mdx
@@ -20,10 +20,10 @@ Before diving into this content, it might be helpful to read the following:
 In many applications, but even more so for LLM applications, it is important to collect user feedback to understand how your application is performing in real-world scenarios.
 The ability to observe user feedback along with trace data can be very powerful to drill down into the most interesting datapoints, then send those datapoints for further review, automatic evaluation, or even datasets.
-To learn more about how to filter traces based on various attributes, including user feedback, see [this guide](../../../observability/how_to_guides/monitoring/filter_traces_in_application)
+To learn more about how to filter traces based on various attributes, including user feedback, see [this guide](../../../observability/how_to_guides/filter_traces_in_application)
 LangSmith makes it easy to attach user feedback to traces.
-It's often helpful to expose a simple mechanism (such as a thumbs-up, thumbs-down button) to collect user feedback for your application responses. You can then use the LangSmith SDK or API to send feedback for a trace. To get the `run_id` of a logged run, see [this guide](../../../observability/how_to_guides/tracing/access_current_span).
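For the feedback hunk above: a minimal sketch of what "use the LangSmith SDK or API to send feedback for a trace" can look like with the Python SDK. The feedback key and helper name are illustrative assumptions, not the guide's own example:

```python
from langsmith import Client

client = Client()

def record_thumb(run_id: str, thumbs_up: bool) -> None:
    # run_id identifies the traced run the user is reacting to; the
    # access_current_span guide linked nearby shows how to capture it.
    client.create_feedback(
        run_id,
        key="user-score",                      # illustrative feedback key
        score=1.0 if thumbs_up else 0.0,
        comment="thumbs up" if thumbs_up else "thumbs down",
    )
```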
+It's often helpful to expose a simple mechanism (such as a thumbs-up, thumbs-down button) to collect user feedback for your application responses. You can then use the LangSmith SDK or API to send feedback for a trace. To get the `run_id` of a logged run, see [this guide](../../../observability/how_to_guides/access_current_span).
 :::note
diff --git a/docs/evaluation/how_to_guides/bind_evaluator_to_dataset.mdx b/docs/evaluation/how_to_guides/bind_evaluator_to_dataset.mdx
index d289071c..70fbd2c8 100644
--- a/docs/evaluation/how_to_guides/bind_evaluator_to_dataset.mdx
+++ b/docs/evaluation/how_to_guides/bind_evaluator_to_dataset.mdx
@@ -7,7 +7,7 @@ sidebar_position: 2
 While you can specify evaluators to grade the results of your experiments programmatically (see [this guide](./evaluate_llm_application) for more information), you can also bind evaluators to a dataset in the UI. This allows you to configure automatic evaluators that grade your experiment results. We have support for both LLM-based evaluators, and custom python code evaluators.
-The process for configuring this is very similar to the process for configuring an [online evaluator](../../../observability/how_to_guides/monitoring/online_evaluations) for traces.
+The process for configuring this is very similar to the process for configuring an [online evaluator](../../../observability/how_to_guides/online_evaluations) for traces.
 :::note Only affects subsequent experiment runs
 When you configure an evaluator for a dataset, it will only affect the experiment runs that are created after the evaluator is configured. It will not affect the evaluation of experiment runs that were created before the evaluator was configured.
diff --git a/docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx b/docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx
index ecdbbb78..e1cb664d 100644
--- a/docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx
+++ b/docs/evaluation/how_to_guides/create_few_shot_evaluators.mdx
@@ -11,7 +11,7 @@ you to automatically collect human corrections on evaluator prompts, which are t
 :::tip Recommended Reading
 Before learning how to create few-shot evaluators, it might be helpful to learn how to setup automations (both online and offline) and how to leave corrections on evaluator scores:
-- [Set up online evaluations](../../../observability/how_to_guides/monitoring/online_evaluations)
+- [Set up online evaluations](../../../observability/how_to_guides/online_evaluations)
 - [Bind an evaluator to a dataset in the UI (offline evaluation)](./bind_evaluator_to_dataset)
 - [Audit evaluator scores](./audit_evaluator_scores)
@@ -24,7 +24,7 @@ The default maximum few-shot examples to use in the prompt is 5. Examples are pu
 :::
-When creating an [online](../../../observability/how_to_guides/monitoring/online_evaluations) or [offline](./bind_evaluator_to_dataset) evaluator - from a tracing project or a dataset, respectively - you will see the option to use corrections as few-shot examples. Note that these types of evaluators
+When creating an [online](../../../observability/how_to_guides/online_evaluations) or [offline](./bind_evaluator_to_dataset) evaluator - from a tracing project or a dataset, respectively - you will see the option to use corrections as few-shot examples.
 Note that these types of evaluators are only supported when using mustache prompts - you will not be able to click this option if your prompt uses f-string formatting. When you select this, we will auto-create a few-shot prompt for you. Each individual few-shot example will be formatted according to this prompt, and inserted into your main prompt in place of the `{{Few-shot examples}}` template variable which will be auto-added above. Your few-shot prompt should contain the same variables as your main prompt, plus a `few_shot_explanation` and a score variable which should have the same name
diff --git a/docs/evaluation/how_to_guides/evaluate_llm_application.mdx b/docs/evaluation/how_to_guides/evaluate_llm_application.mdx
index e8ac1937..76dc3cc9 100644
--- a/docs/evaluation/how_to_guides/evaluate_llm_application.mdx
+++ b/docs/evaluation/how_to_guides/evaluate_llm_application.mdx
@@ -92,7 +92,7 @@ First we need an application to evaluate. Let's create a simple toxicity classif
 />
 We've optionally enabled tracing to capture the inputs and outputs of each step in the pipeline.
-To understand how to annotate your code for tracing, please refer to [this guide](../../../observability/how_to_guides/tracing/annotate_code).
+To understand how to annotate your code for tracing, please refer to [this guide](../../../observability/how_to_guides/annotate_code).
 ## Create or select a dataset
diff --git a/docs/evaluation/how_to_guides/evaluate_pairwise.mdx b/docs/evaluation/how_to_guides/evaluate_pairwise.mdx
index 375791c0..d3b46174 100644
--- a/docs/evaluation/how_to_guides/evaluate_pairwise.mdx
+++ b/docs/evaluation/how_to_guides/evaluate_pairwise.mdx
@@ -88,7 +88,7 @@ which asks the LLM to decide which is better between two AI assistant responses.
 :::info Optional LangChain Usage
-In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](../../../prompt_engineering/how_to_guides/prompts/langchain_hub) and using it with a LangChain chat model wrapper.
+In the Python example below, we are pulling [this structured prompt](https://smith.langchain.com/hub/langchain-ai/pairwise-evaluation-2) from the [LangChain Hub](../../../prompt_engineering/how_to_guides/langchain_hub) and using it with a LangChain chat model wrapper.
 **Usage of LangChain is totally optional.** To illustrate this point, the TypeScript example uses the OpenAI SDK directly.
diff --git a/docs/evaluation/how_to_guides/evaluate_with_attachments.mdx b/docs/evaluation/how_to_guides/evaluate_with_attachments.mdx
index 6581577f..96b9cbfd 100644
--- a/docs/evaluation/how_to_guides/evaluate_with_attachments.mdx
+++ b/docs/evaluation/how_to_guides/evaluate_with_attachments.mdx
@@ -7,7 +7,7 @@ import {
 # Run an evaluation with large file inputs
-In addition to supporting [file attachments with traces](../../../observability/how_to_guides/tracing/upload_files_with_traces), LangSmith supports arbitrary file attachments with your examples, which you can consume when you run experiments.
+In addition to supporting [file attachments with traces](../../../observability/how_to_guides/upload_files_with_traces), LangSmith supports arbitrary file attachments with your examples, which you can consume when you run experiments.
 This is particularly useful when working with LLM applications that require multimodal inputs or outputs.
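Several of the retargeted links above point at the code-annotation guide (`annotate_code`, including the "wrap the OpenAI client" anchor). As a rough sketch of what that annotation looks like with the Python SDK — the model name and prompt are placeholders, not the guide's own toxicity-classifier example:

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

# Wrapping the OpenAI client traces every chat.completions call it makes.
openai_client = wrap_openai(OpenAI())

@traceable  # each call to this function is logged as a run in LangSmith
def label_text(text: str) -> str:
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Label the text as Toxic or Not toxic."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```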
diff --git a/docs/evaluation/how_to_guides/index.md b/docs/evaluation/how_to_guides/index.md
index bfb8faee..df83996b 100644
--- a/docs/evaluation/how_to_guides/index.md
+++ b/docs/evaluation/how_to_guides/index.md
@@ -42,7 +42,7 @@ Evaluate and improve your application before deploying it.
 - [Evaluate with repetitions](./how_to_guides/repetition)
 - [Handle model rate limits](./how_to_guides/rate_limiting)
-- [Print detailed logs (Python only)](../../observability/how_to_guides/tracing/output_detailed_logs)
+- [Print detailed logs (Python only)](../../observability/how_to_guides/output_detailed_logs)
 - [Run an evaluation locally (beta, Python only)](./how_to_guides/local)
 ## Testing integrations
@@ -56,8 +56,8 @@ Run evals using your favorite testing tools:
 Evaluate and monitor your system's live performance on production data.
-- [Set up an LLM-as-judge online evaluator](../../observability/how_to_guides/monitoring/online_evaluations#configure-llm-as-judge-evaluators)
-- [Set up a custom code online evaluator](../../observability/how_to_guides/monitoring/online_evaluations#configure-custom-code-evaluators)
+- [Set up an LLM-as-judge online evaluator](../../observability/how_to_guides/online_evaluations#configure-llm-as-judge-evaluators)
+- [Set up a custom code online evaluator](../../observability/how_to_guides/online_evaluations#configure-custom-code-evaluators)
 - [Create a few-shot evaluator](./how_to_guides/create_few_shot_evaluators)
 ## Automatic evaluation
diff --git a/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx b/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx
index c99bf18a..ee53b095 100644
--- a/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx
+++ b/docs/evaluation/how_to_guides/manage_datasets_in_application.mdx
@@ -41,7 +41,7 @@ For the full list of available transformations, see [our reference](/reference/e
 :::note
 If you plan to collect production traces in your dataset from LangChain [ChatModels](https://python.langchain.com/docs/concepts/chat_models/)
-or from OpenAI calls using the [LangSmith OpenAI wrapper](/observability/how_to_guides/tracing/annotate_code#wrap-the-openai-client), we offer a prebuilt Chat Model schema that converts messages and tools into industry standard openai formats that can be used downstream with any model for testing. You can also customize the template settings to match your use case.
+or from OpenAI calls using the [LangSmith OpenAI wrapper](/observability/how_to_guides/annotate_code#wrap-the-openai-client), we offer a prebuilt Chat Model schema that converts messages and tools into industry standard openai formats that can be used downstream with any model for testing. You can also customize the template settings to match your use case.
 Please see the [dataset transformations reference](/reference/evaluation/dataset_transformations) for more information.
 :::
@@ -57,7 +57,7 @@ through our tracing projects to find the runs we want to add to the dataset. The
 :::tip
 An extremely powerful technique to build datasets is to drill-down into the most interesting traces, such as traces that were tagged with poor user feedback, and add them to a dataset.
-For tips on how to filter traces, see the [filtering traces](../../../observability/how_to_guides/monitoring/filter_traces_in_application) guide.
+For tips on how to filter traces, see the [filtering traces](../../../observability/how_to_guides/filter_traces_in_application) guide.
 :::
 There are two ways to add data from tracing projects to datasets.
@@ -78,7 +78,7 @@ the run before adding it to the dataset.
 ### Automatically add runs to a dataset
-You can use [run rules](../../../observability/how_to_guides/monitoring/rules) to automatically add traces to a dataset based on certain conditions. For example, you could add all traces that have a certain tag to a dataset.
+You can use [run rules](../../../observability/how_to_guides/rules) to automatically add traces to a dataset based on certain conditions. For example, you could add all traces that have a certain tag to a dataset.
 ### Add runs from an annotation queue
diff --git a/docs/evaluation/how_to_guides/manage_datasets_programmatically.mdx b/docs/evaluation/how_to_guides/manage_datasets_programmatically.mdx
index e56dfe2c..38b844cc 100644
--- a/docs/evaluation/how_to_guides/manage_datasets_programmatically.mdx
+++ b/docs/evaluation/how_to_guides/manage_datasets_programmatically.mdx
@@ -86,7 +86,7 @@ await client.createExamples({
 ### Create a dataset from traces
 To create datasets from the runs (spans) of your traces, you can use the same approach.
-For **many** more examples of how to fetch and filter runs, see the [export traces](../../../observability/how_to_guides/tracing/export_traces) guide.
+For **many** more examples of how to fetch and filter runs, see the [export traces](../../../observability/how_to_guides/export_traces) guide.
 Below is an example:
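A rough sketch of that pattern in Python — not the guide's exact example; the project name, dataset name, and filters are assumed for illustration:

```python
from langsmith import Client

client = Client()

# Fetch root runs from a tracing project (the export traces guide covers
# many more filter options; the ones below are illustrative).
runs = list(
    client.list_runs(
        project_name="my-project",
        is_root=True,
        error=False,
    )
)

# Turn the fetched runs into dataset examples.
dataset = client.create_dataset(dataset_name="runs-as-examples")
client.create_examples(
    inputs=[run.inputs for run in runs],
    outputs=[run.outputs for run in runs],
    dataset_id=dataset.id,
)
```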
Array[Message] | Converts any incoming data from LangChain's internal serialization format to OpenAI's standard message format using langchain's [convert_to_openai_messages](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.convert_to_openai_messages.html).<br/><br/>If the target field is marked as required, and no matching message is found upon entry, it will attempt to extract a message (or list of messages) from several well-known LangSmith tracing formats (e.g., any traced LangChain [BaseChatModel](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html) run or traced run from the [LangSmith OpenAI wrapper](/observability/how_to_guides/tracing/annotate_code#wrap-the-openai-client)), and remove the original key containing the message. |
-| convert_to_openai_tool | Array[Tool]<br/><br/>Only available on top level fields in the inputs dictionary. | Converts any incoming data into OpenAI standard tool formats here using langchain's [convert_to_openai_tool](https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html)<br/><br/>Will extract tool definitions from a run's invocation parameters if present / no tools are found at the specified key. This is useful because LangChain chat models trace tool definitions to the `extra.invocation_params` field of the run rather than inputs. |
-| remove_extra_fields | Object | Removes any field not defined in the schema for this target object. |
+| Transformation Type | Target Types | Functionality |
+| ------------------------- | ------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| remove_system_messages | Array[Message] | Filters a list of messages to remove any system messages. |
+| convert_to_openai_message | Message<br/><br/>Array[Message] | Converts any incoming data from LangChain's internal serialization format to OpenAI's standard message format using langchain's [convert_to_openai_messages](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.utils.convert_to_openai_messages.html).<br/><br/>If the target field is marked as required, and no matching message is found upon entry, it will attempt to extract a message (or list of messages) from several well-known LangSmith tracing formats (e.g., any traced LangChain [BaseChatModel](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html) run or traced run from the [LangSmith OpenAI wrapper](/observability/how_to_guides/annotate_code#wrap-the-openai-client)), and remove the original key containing the message. |
+| convert_to_openai_tool | Array[Tool]<br/><br/>Only available on top level fields in the inputs dictionary. | Converts any incoming data into OpenAI standard tool formats here using langchain's [convert_to_openai_tool](https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html)<br/><br/>Will extract tool definitions from a run's invocation parameters if present / no tools are found at the specified key. This is useful because LangChain chat models trace tool definitions to the `extra.invocation_params` field of the run rather than inputs. |
+| remove_extra_fields | Object | Removes any field not defined in the schema for this target object. |
 ## Chat Model prebuilt schema
@@ -34,7 +34,7 @@ input messages when using our Chat Model schema, which will prevent you from sav
 The LLM run collection schema is built to collect data from LangChain
 [BaseChatModel](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html)
-runs or traced runs from the [LangSmith OpenAI wrapper](/observability/how_to_guides/tracing/annotate_code#wrap-the-openai-client).
+runs or traced runs from the [LangSmith OpenAI wrapper](/observability/how_to_guides/annotate_code#wrap-the-openai-client).
 Please reach out to support@langchain.dev if you have an LLM run you are tracing that is not compatible and we can extend support.
diff --git a/docs/self_hosting/release_notes.mdx b/docs/self_hosting/release_notes.mdx
index adf7bfc2..54880659 100644
--- a/docs/self_hosting/release_notes.mdx
+++ b/docs/self_hosting/release_notes.mdx
@@ -81,7 +81,7 @@ LangSmith v0.6 improves run rules performance and reliability, adds support for
 - Workspaces in LangSmith for improved collaboration & organization. [Learn More...](https://blog.langchain.dev/week-of-6-10-langchain-release-notes/#workspaces)
 - Enter the playground from scratch instead of from a trace or a prompt. [Learn More...](https://blog.langchain.dev/week-of-6-10-langchain-release-notes/#playground-from-scratch)
 - Variable mapping for online evaluator prompts. [Learn More...](https://blog.langchain.dev/week-of-6-10-langchain-release-notes/#variable-mapping)
-- Custom Model support in Playground. [Learn More...](https://docs.smith.langchain.com/how_to_guides/playground/custom_endpoint)
+- Custom Model support in Playground. [Learn More...](https://docs.smith.langchain.com/how_to_guides/custom_endpoint)
 ### Performance and Reliability Changes
@@ -101,7 +101,7 @@ LangSmith v0.6 improves run rules performance and reliability, adds support for
 - Added support for Workspaces. See the [Admin concepts guide](/administration/concepts#workspaces) for more details.
 - Added global setting `orgCreationDisabled` to `values.yaml` to disable creation of new Organizations.
-- Added support for custom TLS certificates for the for the Azure OpenAI model provider. See the [how-to guide](../prompt_engineering/how_to_guides/playground/custom_tls_certificates) for more details.
+- Added support for custom TLS certificates for the Azure OpenAI model provider. See the [how-to guide](../prompt_engineering/how_to_guides/custom_tls_certificates) for more details.
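Returning to the dataset transformations table above: both converters it references live in `langchain-core` and can be exercised on their own. A hedged sketch, assuming a recent `langchain-core` release that exposes these utilities at the paths linked in the table:

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.messages.utils import convert_to_openai_messages
from langchain_core.utils.function_calling import convert_to_openai_tool
from pydantic import BaseModel, Field

# LangChain-format messages -> OpenAI-format dicts, which is what the
# convert_to_openai_message transformation applies to dataset examples.
messages = [
    SystemMessage("You are a helpful assistant."),
    HumanMessage("What is the capital of France?"),
    AIMessage("Paris."),
]
print(convert_to_openai_messages(messages))

# A tool definition -> an OpenAI tool schema, mirroring convert_to_openai_tool.
class GetWeather(BaseModel):
    """Look up the current weather for a city."""

    city: str = Field(description="City name")

print(convert_to_openai_tool(GetWeather))
```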
 ### Deprecation notices
diff --git a/vercel.json b/vercel.json
index 91455610..0f9b9445 100644
--- a/vercel.json
+++ b/vercel.json
@@ -221,6 +221,22 @@
     {
       "source": "/evaluation/how_to_guides/unit_testing(/?)",
       "destination": "/evauation/how_to_guides/pytest"
+    },
+    {
+      "source": "/observability/how_to_guides/tracing/:path*",
+      "destination": "/observability/how_to_guides/:path*"
+    },
+    {
+      "source": "/observability/how_to_guides/monitoring/:path*",
+      "destination": "/observability/how_to_guides/:path*"
+    },
+    {
+      "source": "/prompt_engineering/how_to_guides/prompts/:path*",
+      "destination": "/prompt_engineering/how_to_guides/:path*"
+    },
+    {
+      "source": "/prompt_engineering/how_to_guides/playground/:path*",
+      "destination": "/prompt_engineering/how_to_guides/:path*"
     }
   ],
   "builds": [
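The four new `vercel.json` rewrites are what keep the old `tracing/`, `monitoring/`, `prompts/`, and `playground/` URLs working after the link updates above. Vercel resolves these at the edge; as a quick mental model of the `:path*` wildcard only (a sketch, not how Vercel implements it):

```python
import re

# Each pair mirrors one of the new redirects: the wildcard segment is
# captured and re-emitted one directory level up.
REDIRECTS = [
    (r"^/observability/how_to_guides/(?:tracing|monitoring)/(.+)$",
     r"/observability/how_to_guides/\1"),
    (r"^/prompt_engineering/how_to_guides/(?:prompts|playground)/(.+)$",
     r"/prompt_engineering/how_to_guides/\1"),
]

def resolve(path: str) -> str:
    for pattern, replacement in REDIRECTS:
        new_path, count = re.subn(pattern, replacement, path)
        if count:
            return new_path
    return path

# Old deep links keep resolving, e.g.:
print(resolve("/observability/how_to_guides/tracing/annotate_code"))
# -> /observability/how_to_guides/annotate_code
```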