Thanks for your great work and the accompanying implementation of the baselines! It will greatly benefit future work.
I have a question about the implementation of KV cache selection in Quest.
It looks like in this repo, the Quest cache will select all the generated tokens (see MagicPIG/evaluations/RULER/pred/quest_cache.py, line 127 at commit ac9aa36).
Dynamically maintaining the Quest pages may take some effort. Currently, I just select all the generated tokens, since most of the evaluated tasks do not generate many tokens. Thank you for your efforts; I will carefully examine your code. You are also welcome to submit a PR if you want!
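For readers unfamiliar with the scheme being discussed, here is a minimal, self-contained sketch of "select top-k prompt pages, but always keep all generated tokens." This is not the repo's actual code: the function names (`page_scores`, `select_tokens`), the page size, and the Quest-style min/max score bound are illustrative assumptions.

```python
import numpy as np

def page_scores(query, key_min, key_max):
    # Upper-bound attention score per page: for each channel take the larger
    # of q*min and q*max, then sum over channels (Quest-style bound).
    return np.maximum(query * key_min, query * key_max).sum(axis=-1)

def select_tokens(query, prompt_keys, num_generated, page_size=16, top_k_pages=2):
    """Return indices of KV entries to attend to.

    Pages (with min/max key metadata) are built over the *prompt* keys only,
    mirroring a prefill-time-only page construction; every generated token
    is then kept unconditionally.
    """
    n = len(prompt_keys)
    pages = [prompt_keys[i:i + page_size] for i in range(0, n, page_size)]
    mins = np.stack([p.min(axis=0) for p in pages])
    maxs = np.stack([p.max(axis=0) for p in pages])

    # Rank prompt pages by their upper-bound score and take the top-k.
    scores = page_scores(query, mins, maxs)
    top = np.argsort(scores)[::-1][:top_k_pages]

    selected = []
    for p in sorted(top):
        start = p * page_size
        selected.extend(range(start, min(start + page_size, n)))
    # All generated tokens are always selected, regardless of score.
    selected.extend(range(n, n + num_generated))
    return selected
```

Under this sketch, the budget spent on generated tokens grows with generation length, which is why it stays cheap for the long-prompt, short-generation tasks mentioned above.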
Hi, sorry for the late reply. Congrats on MagicPIG's acceptance at ICLR! I agree that controlling all methods to keep the generated tokens is a fair way to show the effectiveness of MagicPIG.
Since you target long-prompt, short-generation scenarios, may I ask whether you will consider the short-prompt, long-generation setting in the future?
Thanks for your great work; it's my favorite KV-compression paper!
It might become popular as reasoning models gain attention. However, I have not yet figured out the most efficient and accurate way to design algorithms for this setting. Also, reproducing results for long-generation models (e.g., R1 can generate 32K tokens to solve some math problems) seems very time-consuming, and I do not even know what a good metric would be.
(See MagicPIG/evaluations/RULER/pred/quest_cache.py, line 127 at commit ac9aa36.)
The selection of tokens is limited to the prompt tokens because the KV pages are built only during prefill.
It looks like the original Quest implementation (https://github.com/mit-han-lab/Quest/blob/main/evaluation/quest_attention.py) dynamically updates the KV pages during decoding.
Will you consider implementing dynamic KV page updating? I implemented a simple (though imperfect) version here: https://github.com/Monstertail/MagicPIG/blob/b635d06ae2c68c1d2949f2e95f358fb5746f6108/RULER/RULER/scripts/pred/quest_cache.py#L253 . If you are interested, we can work together on improving it.
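To make the "dynamic KV page updating" idea concrete, here is a minimal sketch of the bookkeeping involved. This is not the linked implementation: the class name `DynamicPagedKV`, the page size, and the incremental min/max update are illustrative assumptions. The key point is that each newly decoded token's key is folded into the metadata of the last (partial) page, so generated tokens become eligible for top-k page selection instead of being kept unconditionally.

```python
import numpy as np

class DynamicPagedKV:
    """Toy paged-KV metadata store whose page bounds grow during decoding."""

    def __init__(self, page_size=16):
        self.page_size = page_size
        self.mins, self.maxs, self.counts = [], [], []

    def append(self, key):
        # Open a new page when the last one is full (or none exists yet).
        if not self.counts or self.counts[-1] == self.page_size:
            self.mins.append(key.copy())
            self.maxs.append(key.copy())
            self.counts.append(1)
        else:
            # Incrementally fold the new key into the page's min/max bounds;
            # this is O(dim) per decoded token.
            self.mins[-1] = np.minimum(self.mins[-1], key)
            self.maxs[-1] = np.maximum(self.maxs[-1], key)
            self.counts[-1] += 1

    def top_pages(self, query, k):
        # Score every page (prompt and generated) with the min/max bound
        # and return the indices of the k highest-scoring pages.
        mins, maxs = np.stack(self.mins), np.stack(self.maxs)
        scores = np.maximum(query * mins, query * maxs).sum(axis=-1)
        return list(np.argsort(scores)[::-1][:k])
```

One subtlety this sketch glosses over: a partial page's bounds cover fewer tokens than a full page's, so its upper-bound score can be comparatively loose, which is one reason a simple version may be imperfect.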