[Core] Create metrics for external cache services #3
Conversation
Force-pushed from e3eee41 to e032a48 (Compare)
Based on another version of vllm: sighingnow@d347dab

- Cherry-pick from commit d347dab
  Signed-off-by: Tao He <[email protected]>
  (cherry picked from commit 1545f6bf7edcd667e305d3fbcadd913066f04747)
- Resolve vllm update diff; temporarily comment out torch.distributed for single-node env
- Add VineyardCacheConfig with https://github.com/v6d-io/v6d/blob/ebe8f077e3d3780a27d49238c501854b6b8e29df/modules/llm-cache/ds/kv_cache_block.cc#L163 commented out; cache_ops fix
- Remove CacheConfig from arguments (configure through ENV)
- v6d: fix integration w/ v1 APIs
  Signed-off-by: Haiyang Shi <[email protected]>
- Change model_runner to latest version; cherry-pick model_runner from d347dab (source: sighingnow@d347dab)
- Fix reshape_and_cache_flash argument
- Add cache prefetch/update to work_base; clean up
- Fix after rebase to 029c71d
- Remove tensor copy from cache managed address to pin memory; clean up
- Add fixes to address comments
- Add initial cache service metrics
- Update TTFT metrics
- Update prefix caching with max num seqs argument
Force-pushed from e032a48 to 306aa72 (Compare)
@@ -26,6 +26,8 @@
 from vllm.usage.usage_lib import UsageContext
 from vllm.utils import Counter, deprecate_kwargs, is_list_of

+import numpy as np
It's best not to import all names from a module, but only the names you actually use. In other words, prefer the from ... import ... form of imports.
Also, this import line should probably go with the other "system"/third-party imports, before the vllm-specific imports.
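As a minimal stdlib illustration of the style the reviewer is asking for (the module and names here are only examples, not from the PR):

```python
# Prefer importing only the names you use, rather than "from math import *":
# explicit names are greppable and friendlier to linters.
from math import pi, sqrt

radius = 2.0
area = pi * radius ** 2   # uses the explicitly imported name "pi"
side = sqrt(16)           # uses the explicitly imported name "sqrt"
print(area, side)
```

Third-party imports such as numpy would then be grouped together above any project-local (vllm) imports, per PEP 8's import-ordering convention.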
please take a look
@@ -734,10 +740,29 @@ def _run_engine(
                     f"est. speed input: {in_spd:.2f} toks/s, "
                     f"output: {out_spd:.2f} toks/s")
                 pbar.update(1)
+        if self.llm_engine.cache_service_metrics is not None:
The metrics calculation and logging logic should go inside the individual metrics class rather than being embedded in the vLLM engine's high-level event loop:
- the engine should only encode abstract operations, such as collecting one measurement per step, or finishing the measurements (and triggering all derived-metrics calculation) once a request completes;
- each individual metrics class handles its own logic for which measurements to track and which derived metrics to calculate and log.
This way, the engine code remains stable while we plug in different metrics implementations.
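A hypothetical sketch of this suggestion (all class and method names below are illustrative, not the PR's actual API): the engine loop only calls abstract hooks, and each metrics implementation owns its measurement and derivation logic.

```python
from abc import ABC, abstractmethod


class CacheServiceMetrics(ABC):
    """Abstract interface the engine event loop talks to."""

    @abstractmethod
    def record_step(self, measurement: float) -> None:
        """Collect one measurement per engine step."""

    @abstractmethod
    def finish_request(self) -> dict:
        """Finalize a request: compute and return derived metrics."""


class LatencyMetrics(CacheServiceMetrics):
    """One concrete implementation: tracks per-step latencies,
    derives mean/max when the request finishes."""

    def __init__(self) -> None:
        self._samples: list[float] = []

    def record_step(self, measurement: float) -> None:
        self._samples.append(measurement)

    def finish_request(self) -> dict:
        if not self._samples:
            return {"mean": 0.0, "max": 0.0}
        derived = {
            "mean": sum(self._samples) / len(self._samples),
            "max": max(self._samples),
        }
        self._samples.clear()  # reset for the next request
        return derived


# Engine-side code stays stable: it only knows the abstract hooks,
# so swapping in a different metrics class requires no engine changes.
metrics: CacheServiceMetrics = LatencyMetrics()
for latency in (0.10, 0.30, 0.20):  # one measurement per step
    metrics.record_step(latency)
print(metrics.finish_request())
```

The engine never branches on the concrete metrics type, which is what keeps the event loop free of metrics-specific calculation and logging code.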
It seems the PR comments are not addressed yet. Let's quickly resolve the easy ones and merge this PR.
@@ -26,6 +26,8 @@
 from vllm.usage.usage_lib import UsageContext
 from vllm.utils import Counter, deprecate_kwargs, is_list_of

+import numpy as np
please take a look
Optimize the KV transfer pipe implementation