diff --git a/docs/index.rst b/docs/index.rst
index aaa46384490..b8067c25d80 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -57,6 +57,7 @@ The core features include:
    references/sampling_params.md
    references/hyperparameter_tuning.md
    references/benchmark_and_profiling.md
+   references/accuracy_evaluation.md
    references/custom_chat_template.md
    references/deepseek.md
    references/llama_405B.md
diff --git a/docs/references/accuracy.md b/docs/references/accuracy.md
index efe2b537449..053dd8369d1 100644
--- a/docs/references/accuracy.md
+++ b/docs/references/accuracy.md
@@ -2,9 +2,9 @@
 
 This guide shows how to evaluate model accuracy using SGLang's [built-in benchmarks](https://github.com/sgl-project/sglang/tree/b045841baeff37a5601fcde23fa98bd09d942c36/benchmark).
 
-## Evalutating model accuracy with SGLang
+## Benchmarking Model Accuracy
 
-This is a reference workflow for the [MMLU benchmark](). For more details or other benchmarks, please refer to the README in each specific benchmark folder under [sglang/benchmark](https://github.com/sgl-project/sglang/tree/b045841baeff37a5601fcde23fa98bd09d942c36/benchmark).
+This is a reference workflow for the [MMLU benchmark](https://github.com/sgl-project/sglang/tree/main/benchmark/mmlu). For more details, or for other benchmarks, see the README in each benchmark folder under [sglang/benchmark](https://github.com/sgl-project/sglang/tree/b045841baeff37a5601fcde23fa98bd09d942c36/benchmark).
 
 ```bash
 # Step 1: Download the dataset