Commit
Currently, **`QKVLinear` is overly complex** because it handles **both QKV computation and KV cache management**. Although **QKVLinear's role and the KV cache strategy should be independent**, the current implementation **forces QKVLinear to manage the KV cache**, so **every QKVLinear subclass** (`FusedQKV`, `GroupedQKV`, `RoPEQKV`, etc.) must be carefully maintained to ensure it handles the KV cache correctly.

**Key Changes in This PR**

This PR **removes KV cache logic from `QKVLinear`**, turning it into a **pure `forward`-only class** (covering steps such as the QKV projection and RoPE) that **no longer needs to handle decoding**. Instead, **Attention now owns the KV cache directly**, making it **more flexible for future KV cache strategies**.

Currently, **`QKVLinear` supports only one KV cache behavior**, which maintains a **fixed max length**. However, in the near future, we will introduce more **KV cache strategies**, such as:

- **Sliding Window Attention** → Requires a **sliding window KV cache**.
- **Sparse Attention** → Needs a KV cache that **dynamically selects sparse KV** (similar to DeepSeek). https://arxiv.org/abs/2502.11089

**Implementation Details**

A key aspect of this refactor is **how query positions and key positions are generated**. Previously, the related logic was **scattered across multiple places**; now, **each kind of position is computed in a single place**:

- **Query positions** → Must be determined **before RoPE**, since RoPE requires them. The **same query positions** are then **reused throughout the code**.
- **Key positions** → Only the **KV cache layer** can determine them **accurately**, since **KV cache strategies** directly affect key positions. So, the **KV cache is now responsible for generating key positions**.

In addition, **`KVState` now carries both KV values and key positions**.
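The division of responsibility described above can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: apart from the name `KVState`, every class, function, and field below (`make_query_positions`, `FixedLengthKVCache`, `SlidingWindowKVCache`, `extend`, `causal_mask`) is a hypothetical stand-in, and plain Python lists replace tensors. It only shows the idea that query positions are computed once up front, while each cache strategy decides which key positions it returns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KVState:
    """Illustrative stand-in: KV values together with their key positions."""
    keys: List[float]
    values: List[float]
    key_positions: List[int]


def make_query_positions(time_step: int, num_tokens: int) -> List[int]:
    """Computed once, before RoPE, and reused by every later stage."""
    return list(range(time_step, time_step + num_tokens))


class FixedLengthKVCache:
    """Hypothetical cache with a fixed maximum length."""

    def __init__(self, max_len: int):
        self.max_len = max_len
        self._state = KVState([], [], [])

    def extend(self, keys, values, query_positions) -> KVState:
        if len(self._state.keys) + len(keys) > self.max_len:
            raise ValueError("KV cache overflow")
        # New keys sit at the same positions as the queries that produced them.
        self._state = KVState(
            self._state.keys + list(keys),
            self._state.values + list(values),
            self._state.key_positions + list(query_positions),
        )
        return self._state


class SlidingWindowKVCache:
    """Hypothetical cache that keeps only keys some current query can still see."""

    def __init__(self, window: int):
        self.window = window
        self._state = KVState([], [], [])

    def extend(self, keys, values, query_positions) -> KVState:
        ks = self._state.keys + list(keys)
        vs = self._state.values + list(values)
        ps = self._state.key_positions + list(query_positions)
        # The earliest query in this chunk still needs `window` keys ending at itself.
        oldest = min(query_positions) - self.window + 1
        keep = [i for i, p in enumerate(ps) if p >= oldest]
        self._state = KVState(
            [ks[i] for i in keep], [vs[i] for i in keep], [ps[i] for i in keep]
        )
        return self._state


def causal_mask(query_positions, key_positions, window: Optional[int] = None):
    """True where a query may attend to a key, derived purely from positions."""
    return [
        [kp <= qp and (window is None or kp > qp - window) for kp in key_positions]
        for qp in query_positions
    ]


# Attention owns the cache; the projection layer never sees it.
cache = SlidingWindowKVCache(window=2)
prefill = cache.extend([10.0, 11.0, 12.0], [20.0, 21.0, 22.0], make_query_positions(0, 3))
decode = cache.extend([13.0], [23.0], make_query_positions(3, 1))
```

Because the mask is built only from the positions in the returned state, swapping `FixedLengthKVCache` for `SlidingWindowKVCache` in this sketch requires no change to the code that computes queries, which is the flexibility the description argues for.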
Showing 89 changed files with 555 additions and 685 deletions.