Request to add Doge #35889

LoserCheems · 2025-01-25T14:28:39Z

Model description

Doge is an architecture that combines the advantages of state-space models and self-attention. It addresses the tendency of self-attention to get lost in long sequences by computing a dynamic mask from the cached value states using zero-order hold. Starting from dense weight checkpoints, it can also use wsd_scheduler to additionally train a sparsely activated feedforward network expansion layer.

paper: https://arxiv.org/abs/2412.11834
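For readers unfamiliar with the scheduler mentioned above, here is a minimal sketch of a warmup-stable-decay learning-rate schedule, assuming that is what wsd_scheduler refers to. The function name `wsd_lr`, its parameters, and the linear warmup and decay shapes are illustrative assumptions, not taken from the Doge repository or the paper.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` for a warmup-stable-decay schedule.

    Hypothetical helper for illustration only; the shapes of the warmup
    and decay phases (both linear here) are assumptions.
    """
    if step < warmup_steps:
        # Phase 1: linear warmup, reaching peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Phase 2: hold the learning rate constant at its peak.
        return peak_lr
    if decay_steps <= 0:
        # No decay phase configured: drop straight to the floor.
        return min_lr
    # Phase 3: linear decay from peak_lr down to min_lr, then stay there.
    done = step - warmup_steps - stable_steps
    frac = min(done / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * frac


if __name__ == "__main__":
    # Print the schedule at a few steps: 10 warmup, 20 stable, 10 decay.
    for s in (0, 9, 15, 30, 35, 40):
        print(s, wsd_lr(s, 1.0, 10, 20, 10))
```

One reason such a schedule suits continued training from an existing checkpoint is that the stable phase can be extended or resumed without committing to a total step count in advance; only the final decay has to be planned.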

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

Repository: https://github.com/LoserCheems/WonderfulMatrices
Weights: https://huggingface.co/collections/JingzeShi/doge-slm-677fd879f8c4fd0f43e05458

LoserCheems added the New model label Jan 25, 2025

LoserCheems linked a pull request Jan 25, 2025 that will close this issue: Add Doge model #35891 (Open, 5 tasks)
