
[Core] Create metrics for external cache services #3

Merged

Jeffwan merged 5 commits into feat/distributed-kv-cache from lexu/vineyard-metrics on Oct 25, 2024

Conversation

happyandslow (Collaborator)

No description provided.

happyandslow changed the title from "Create metrics for external cache services" to "[Core] Create metrics for external cache services" on Sep 19, 2024.
happyandslow force-pushed the lexu/vineyard-metrics branch from e3eee41 to e032a48 on October 2, 2024 at 01:13. The branch carried the following commit messages:
Based on another version of vllm: sighingnow@d347dab

Cherry-pick from commit d347dab

Signed-off-by: Tao He <[email protected]>
(cherry picked from commit 1545f6bf7edcd667e305d3fbcadd913066f04747)

resolving vllm update diff

temporarily comment out torch.distributed for single node env

add VineyardCacheConfig with https://github.com/v6d-io/v6d/blob/ebe8f077e3d3780a27d49238c501854b6b8e29df/modules/llm-cache/ds/kv_cache_block.cc#L163 commented out; cache_ops fix

remove CacheConfig from argument (configure through ENV)

v6d: fix integration w/ v1 APIs

Signed-off-by: Haiyang Shi <[email protected]>

Change model_runner to latest version

cherry-pick model_runner from source sighingnow@d347dab

fix reshape_and_cache_flash argument

add cache prefetch/update to worker_base

clean up

Fix after rebase to 029c71d

remove tensor copy from cache managed address to pin memory

clean up

Add fixes to address comments

adding cache service metrics initial

adding cache service metrics initial

update ttft metrics

update prefix caching with max num seqs argument
happyandslow force-pushed the lexu/vineyard-metrics branch from e032a48 to 306aa72 on October 2, 2024 at 02:09.
Resolved review threads: vllm/entrypoints/llm.py, vllm/worker/vineyard_llm_cache.py, vllm/engine/llm_engine.py, vllm/engine/metrics_types.py
@@ -26,6 +26,8 @@
 from vllm.usage.usage_lib import UsageContext
 from vllm.utils import Counter, deprecate_kwargs, is_list_of
 
+import numpy as np

It's best not to import all names from a module, but just those names that you actually use. In other words, you should try to always use the from ... import ... form of imports.

Also, this import line probably should go with other "system"/"thirdparty" imports before the vllm-specific imports.
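
For illustration, a minimal sketch of the grouping the reviewer is suggesting (the vllm imports are taken from the diff above; the section comments are only illustrative):

```python
# Third-party imports grouped together, ahead of the project's own imports.
import numpy as np

# vllm-specific imports follow the third-party block.
from vllm.usage.usage_lib import UsageContext
from vllm.utils import Counter, deprecate_kwargs, is_list_of
```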

please take a look

@@ -734,10 +740,29 @@ def _run_engine(
 f"est. speed input: {in_spd:.2f} toks/s, "
 f"output: {out_spd:.2f} toks/s")
 pbar.update(1)
+if self.llm_engine.cache_service_metrics is not None:


The metrics calculation and logging logic should go inside the individual metrics class instead of being embedded in the vLLM engine's high-level event loop:

  • the engine logic should encode abstract operations, such as collecting one measurement per step, or finishing the measurements (and triggering all derived-metric calculations) once a request is finished.
  • the individual metrics class should handle its own logic of which measurements to track and which derived metrics to calculate and log.

This way, the engine code remains stable while we plug in different metrics implementations.
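
As an illustration of that separation, here is a minimal sketch (class and method names are hypothetical, not taken from this PR):

```python
from abc import ABC, abstractmethod


class CacheServiceMetrics(ABC):
    """Each metrics class owns its own tracking and derived-metric logic."""

    @abstractmethod
    def record_step(self, measurement: float) -> None:
        """Collect one measurement per engine step."""

    @abstractmethod
    def finish_request(self) -> None:
        """Finalize measurements and compute/log derived metrics."""


class CacheHitRateMetrics(CacheServiceMetrics):
    """Example implementation: tracks per-step cache hit rates."""

    def __init__(self) -> None:
        self.samples: list[float] = []

    def record_step(self, measurement: float) -> None:
        self.samples.append(measurement)

    def finish_request(self) -> None:
        if self.samples:
            avg = sum(self.samples) / len(self.samples)
            print(f"avg cache hit rate over request: {avg:.2%}")
        self.samples.clear()
```

The engine loop would then call only record_step and finish_request, so a different metrics implementation can be plugged in without touching the engine code.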

Resolved review threads: vllm/worker/model_runner.py, vllm/worker/vineyard_llm_cache.py
Jeffwan left a comment:

It seems the PR comments have not been addressed yet. Let's quickly resolve the easy ones and merge this one.

Jeffwan merged commit a8ae12c into feat/distributed-kv-cache on Oct 25, 2024.

Jeffwan pushed a commit that referenced this pull request on Nov 17, 2024: "Optimize the KV transfer pipe implementation".