Request to add Doge #35889

Open
2 tasks done
LoserCheems opened this issue Jan 25, 2025 · 0 comments · May be fixed by #35891

LoserCheems commented Jan 25, 2025

Model description

Doge is an architecture that combines the advantages of state-space models and self-attention. It addresses the problem of self-attention getting lost in long sequences by computing a dynamic mask from the cached value states using a zero-order hold. Starting from dense weight checkpoints, it can also use wsd_scheduler to additionally train a sparsely activated feed-forward network expansion layer.

Paper: https://arxiv.org/abs/2412.11834
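
For readers unfamiliar with the wsd_scheduler mentioned above: the name is assumed here to refer to a warmup-stable-decay learning-rate schedule. The sketch below is only an illustration of that general shape (linear warmup, constant plateau, linear decay), not the schedule used in the Doge repository. The function name `wsd_lr` and all of its parameters are hypothetical.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Hypothetical warmup-stable-decay schedule (illustrative only).

    Assumed shape: linear warmup to peak_lr, a constant plateau,
    then a linear decay down to min_lr.
    """
    # Phase 1: linear warmup from peak_lr / warmup_steps up to peak_lr.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps

    # Phase 2: hold the learning rate constant at its peak.
    if step < warmup_steps + stable_steps:
        return peak_lr

    # Phase 3: linear decay to min_lr, clamped once the decay phase ends.
    steps_into_decay = step - warmup_steps - stable_steps
    progress = min(1.0, steps_into_decay / max(1, decay_steps))
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    # Print the learning rate at a few steps of a 10/20/10 schedule.
    for s in (0, 9, 15, 30, 35, 40):
        print(s, wsd_lr(s, peak_lr=1.0, warmup_steps=10, stable_steps=20, decay_steps=10))
```

One reason a schedule of this shape could suit continued training from a dense checkpoint is that the plateau can be resumed and only the decay phase rerun, though whether Doge uses it that way is not stated in this issue.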

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Repository: https://github.com/LoserCheems/WonderfulMatrices
Weights: https://huggingface.co/collections/JingzeShi/doge-slm-677fd879f8c4fd0f43e05458

LoserCheems linked a pull request on Jan 25, 2025 that will close this issue.