[Core] Create metrics for external cache services #3
Conversation
Force-pushed from e3eee41 to e032a48 (Compare)
Based on another version of vllm: sighingnow@d347dab

- Cherry-pick from commit d347dab
  Signed-off-by: Tao He <[email protected]>
  (cherry picked from commit 1545f6bf7edcd667e305d3fbcadd913066f04747)
- Resolve vllm update diff; temporarily comment out torch.distributed for single-node env
- Add VineyardCacheConfig with https://github.com/v6d-io/v6d/blob/ebe8f077e3d3780a27d49238c501854b6b8e29df/modules/llm-cache/ds/kv_cache_block.cc#L163 commented out; cache_ops fix
- Remove CacheConfig from arguments (configure through ENV)
- v6d: fix integration w/ v1 APIs
  Signed-off-by: Haiyang Shi <[email protected]>
- Change model_runner to latest version; cherry-pick model_runner from d347dab (source: sighingnow@d347dab)
- Fix reshape_and_cache_flash argument
- Add cache prefetch/update to work_base; clean up
- Fix after rebase to 029c71d
- Remove tensor copy from cache managed address to pin memory; clean up
- Add fixes to address comments
- Add initial cache service metrics
- Update TTFT metrics
- Update prefix caching with max num seqs argument
Force-pushed from e032a48 to 306aa72 (Compare)
@@ -26,6 +26,8 @@
 from vllm.usage.usage_lib import UsageContext
 from vllm.utils import Counter, deprecate_kwargs, is_list_of

+import numpy as np
It's best not to import all names from a module, but only the names you actually use. In other words, prefer the from ... import ... form of imports.
Also, this import line should probably go with the other "system"/third-party imports, before the vllm-specific imports.
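As a minimal stdlib illustration of the style the reviewer is asking for (the module and names here are only examples, not from the PR):

```python
# Prefer importing only the names you use, rather than "from math import *":
# explicit names are greppable and friendlier to linters.
from math import pi, sqrt

radius = 2.0
area = pi * radius ** 2   # uses the explicitly imported name "pi"
side = sqrt(16)           # uses the explicitly imported name "sqrt"
print(area, side)
```

Third-party imports such as numpy would then be grouped together above any project-local (vllm) imports, per PEP 8's import-ordering convention.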
please take a look
@@ -734,10 +740,29 @@ def _run_engine(
                     f"est. speed input: {in_spd:.2f} toks/s, "
                     f"output: {out_spd:.2f} toks/s")
                 pbar.update(1)
+        if self.llm_engine.cache_service_metrics is not None:
The metrics calculation and logging logic should go inside the individual metrics class rather than being embedded in the vLLM engine's high-level event loop:
- the engine should only encode abstract operations, such as collecting one measurement per step, or finishing the measurements (and triggering all derived-metrics calculation) once a request completes;
- each individual metrics class handles its own logic for which measurements to track and which derived metrics to calculate and log.
This way, the engine code remains stable while we plug in different metrics implementations.
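A hypothetical sketch of this suggestion (all class and method names below are illustrative, not the PR's actual API): the engine loop only calls abstract hooks, and each metrics implementation owns its measurement and derivation logic.

```python
from abc import ABC, abstractmethod


class CacheServiceMetrics(ABC):
    """Abstract interface the engine event loop talks to."""

    @abstractmethod
    def record_step(self, measurement: float) -> None:
        """Collect one measurement per engine step."""

    @abstractmethod
    def finish_request(self) -> dict:
        """Finalize a request: compute and return derived metrics."""


class LatencyMetrics(CacheServiceMetrics):
    """One concrete implementation: tracks per-step latencies,
    derives mean/max when the request finishes."""

    def __init__(self) -> None:
        self._samples: list[float] = []

    def record_step(self, measurement: float) -> None:
        self._samples.append(measurement)

    def finish_request(self) -> dict:
        if not self._samples:
            return {"mean": 0.0, "max": 0.0}
        derived = {
            "mean": sum(self._samples) / len(self._samples),
            "max": max(self._samples),
        }
        self._samples.clear()  # reset for the next request
        return derived


# Engine-side code stays stable: it only knows the abstract hooks,
# so swapping in a different metrics class requires no engine changes.
metrics: CacheServiceMetrics = LatencyMetrics()
for latency in (0.10, 0.30, 0.20):  # one measurement per step
    metrics.record_step(latency)
print(metrics.finish_request())
```

The engine never branches on the concrete metrics type, which is what keeps the event loop free of metrics-specific calculation and logging code.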
It seems the PR comments are not addressed yet. Let's quickly resolve the easy ones and merge this PR.
@@ -26,6 +26,8 @@
 from vllm.usage.usage_lib import UsageContext
 from vllm.utils import Counter, deprecate_kwargs, is_list_of

+import numpy as np
please take a look
Optimize the KV transfer pipe implementation