Enhance cache service metrics #13
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Commits (force-pushed from f93fa67 to bc4f074):
- change cache metrics to maintain fixed-sized queue
- Adding metrics to llm engine stats and exposed through prometheus gauge
- clear metrics counters upon update
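The first commit mentions keeping the cache metrics in a fixed-size queue. A minimal sketch of that pattern, assuming the metrics are plain timing samples; the class and field names here are illustrative, not the PR's actual code:

    from collections import deque

    class CacheServiceMetrics:
        """Illustrative sketch: bounded buffers for cache-service timing samples."""

        def __init__(self, maxlen: int = 1000):
            # deque(maxlen=...) drops the oldest entry once full, so the
            # buffers stay fixed-size no matter how long the engine runs.
            self.time_query = deque(maxlen=maxlen)
            self.time_unload = deque(maxlen=maxlen)

        def add_time_query(self, seconds: float) -> None:
            self.time_query.append(seconds)

        def add_time_unload(self, seconds: float) -> None:
            self.time_unload.append(seconds)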
@happyandslow please help address the comments as well. We need to ack the reviewer's comments.
vllm/engine/metrics.py
Outdated
@@ -190,6 +190,89 @@ def __init__(self, labelnames: List[str], max_model_len: int):
            labelnames=labelnames,
            multiprocess_mode="sum",
        )

        self.gauge_cache_service_tokens_hit_rate = self._gauge_cls(
            name="vllm:gauge_cache_service_tokens_hit_rate",
The naming is kind of inconsistent here. Other metric strings don't include the metric type; for example, "vllm:gauge_cache_service_tokens_hit_rate" should be "vllm:cache_service_tokens_hit_rate". Can you update the name to be consistent?
Check the existing metric names -- all metric attributes start with the metric type they use (e.g., vllm-project#81: self.gauge_gpu_prefix_cache_hit_rate = self._gauge_cls(...)).
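For context, the convention being cited looks roughly like this (a sketch using prometheus_client directly; the real code goes through the self._gauge_cls wrapper, and the documentation string here is made up):

    from prometheus_client import Gauge

    # The Python attribute name carries the metric type as a prefix
    # ("gauge_..."), while the exported Prometheus string does not repeat it.
    gauge_cache_service_tokens_hit_rate = Gauge(
        name="vllm:cache_service_tokens_hit_rate",  # no "gauge_" in the string
        documentation="Token hit rate of the cache service.",
        labelnames=["model_name"],
    )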
@happyandslow Could you look at the review comments? It seems the new commit didn't address them.
addressing comments
Overall looks good to me. Let's merge it and run some testing.
vllm/engine/llm_engine.py
Outdated
        self.cache_service_metrics.time_load.clear()
        self.cache_service_metrics.time_reshape.clear()
        self.cache_service_metrics.time_unload.clear()
        self.cache_service_metrics.time_update.clear()
Can we simply swap the corresponding lists, e.g., cache_service_time_query (initially empty) with self.cache_service_metrics.time_query, instead of copying the whole list of elements?
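A self-contained sketch of the suggested swap, assuming the metrics fields are plain Python lists as in the diff above (CacheServiceMetrics here is a stand-in, not the PR's class):

    class CacheServiceMetrics:
        """Stand-in container, assuming plain list fields."""
        def __init__(self):
            self.time_query = []

    metrics = CacheServiceMetrics()
    metrics.time_query.extend([0.01, 0.02, 0.03])

    # Swap references instead of copying every element and then clearing:
    cache_service_time_query, metrics.time_query = metrics.time_query, []

    assert metrics.time_query == []                        # accumulates anew
    assert cache_service_time_query == [0.01, 0.02, 0.03]  # caller owns samples

The swap is O(1) regardless of how many samples have accumulated, and it makes the separate .clear() calls unnecessary.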
        duration = copy_start.elapsed_time(copy_end) / 1000.0
        self.metrics.add_time_unload(duration)

        torch.cuda.synchronize()
Is this torch.cuda.synchronize() being used to measure execution time accurately, or is it intended as a synchronization barrier between the copy_ and reshape_and_cache_flash?
I kind of feel this is redundant.
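For reference, the standard CUDA-event timing pattern the snippet appears to follow (a sketch, not the PR's code; y = x * 2.0 stands in for the measured op):

    import torch

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    x = torch.randn(1 << 20, device="cuda")
    start.record()      # enqueued on the current stream
    y = x * 2.0         # stand-in for the copy_ / reshape_and_cache_flash op
    end.record()

    # elapsed_time() requires both events to have completed, so synchronize
    # (or call end.synchronize()) before reading it; it returns milliseconds.
    torch.cuda.synchronize()
    duration = start.elapsed_time(end) / 1000.0  # seconds

Since ops issued on the same CUDA stream already execute in order, a synchronize() used purely as an ordering barrier between two same-stream kernels adds nothing, which is the redundancy being pointed out.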