A full pipeline, from data collection to GRPO, for a decent mini model
Features/roadmap:
- Full JAX rewrite (trying torchXLA, but the JAX rewrite is up at https://github.com/VatsaDev/DagoJax)
- Web scraping, HF downloads, Anna's Archive, etc., dumped to text files in a folder
- Multiple tokenizers: character-level and BPE (see the tokenizer sketch after this list)
- Sharded data loader: txt files packed into configurable-size .bin shards (see the shard-writer sketch after this list)
- Base LM architectures modeled out: dense Transformer, DS-MoE, MLA+NSA, RoPE, KV cache (see the RoPE sketch after this list)
- Other experimental architectures (MLP-Mixer, etc.)
- Multimodal input (image, audio/video)
- Training strategies, like curriculum learning, data interleaving, etc. (to be settled as I code)
- Layer-2 thinking (KV deliberation, memory layers, byte latent, theory of mind, Quiet-STaR, Coconut; to be settled as I code)
- Fully distributed training runs (see the device-mesh sketch after this list)
  - 0D (single GPU)
  - DDP
  - FSDP
  - TP
  - CP
  - EP
  - Pipelining
- SFT (take a pretraining checkpoint, add sft_iters to max_iters, and swap in the SFT dataset)
- Post-training (PRM, RLHF, TULU; to be settled as I code)
- RL (probably GRPO; see the advantage sketch after this list)
- Inference
- Sampler settings (top-k, top-p, min-p, temperature, etc.; see the sampler sketch after this list)
- KV Cache
- Web search / tool use / mech-interp tricks (activation boosting/patching, hallucination circuits, refusal circuits, control vectors; to be settled as I code)
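Below are a few quick sketches for the roadmap items above. All of them are illustrative: class, function, and path names are placeholders, not the repo's actual API.

A minimal character-level tokenizer, assuming a hypothetical `CharTokenizer` class; BPE would sit behind the same encode/decode interface:

```python
# Character-level tokenizer sketch. `CharTokenizer` is a placeholder name,
# not the repo's actual API; a BPE tokenizer would expose the same encode/decode interface.
class CharTokenizer:
    def __init__(self, text: str):
        chars = sorted(set(text))  # vocab = unique characters in the corpus sample
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}
        self.vocab_size = len(chars)

    def encode(self, s: str) -> list[int]:
        return [self.stoi[c] for c in s]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("hello world")
assert tok.decode(tok.encode("hello")) == "hello"
```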
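A rough sketch of the shard writer: walk the text folder, tokenize, and pack tokens into fixed-size uint16 .bin shards. `SHARD_TOKENS`, the paths, and the `tok` tokenizer are assumptions, not the real config:

```python
# Sketch: pack tokenized .txt files into fixed-size uint16 .bin shards.
# SHARD_TOKENS, paths, and the `tok` tokenizer are placeholders, not the repo's real config.
import glob
import numpy as np

SHARD_TOKENS = 10_000_000  # tokens per shard, configurable

def write_shards(txt_dir: str, out_prefix: str, tok) -> None:
    buf, shard_idx = [], 0
    for path in sorted(glob.glob(f"{txt_dir}/*.txt")):
        with open(path, encoding="utf-8") as f:
            buf.extend(tok.encode(f.read()))
        while len(buf) >= SHARD_TOKENS:  # flush full shards as tokens accumulate
            np.array(buf[:SHARD_TOKENS], dtype=np.uint16).tofile(
                f"{out_prefix}_{shard_idx:04d}.bin")
            buf, shard_idx = buf[SHARD_TOKENS:], shard_idx + 1
    if buf:  # final partial shard
        np.array(buf, dtype=np.uint16).tofile(f"{out_prefix}_{shard_idx:04d}.bin")
```

uint16 assumes a vocab under 65,536; a larger BPE vocab would need uint32 shards.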
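A sketch of RoPE in JAX, using the standard rotation formulation; the `[seq, heads, head_dim]` layout and base frequency are assumptions, not the repo's exact code:

```python
# Rotary position embeddings (RoPE) applied to a [seq, n_head, head_dim] activation.
# Standard formulation; shapes and the base frequency are assumptions, not the exact implementation.
import jax.numpy as jnp

def rope(x: jnp.ndarray, base: float = 10000.0) -> jnp.ndarray:
    seq, n_head, head_dim = x.shape
    half = head_dim // 2
    inv_freq = 1.0 / (base ** (jnp.arange(half) / half))  # per-pair rotation frequency
    theta = jnp.arange(seq)[:, None] * inv_freq[None, :]  # [seq, half] angles
    cos = jnp.cos(theta)[:, None, :]                      # broadcast over heads
    sin = jnp.sin(theta)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```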
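For the distributed modes, a sketch of a 2D jax.sharding device mesh with a data axis (DDP-style batch sharding) and a model axis (TP-style weight sharding); axis names, mesh shape, and array shapes are illustrative:

```python
# Sketch: 2D device mesh for data + tensor parallelism via jax.sharding.
# Axis names, mesh shape, and array shapes are illustrative, not the repo's actual config.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh_shape = (jax.device_count(), 1)  # (data, model); e.g. (4, 2) for real TP
devices = mesh_utils.create_device_mesh(mesh_shape)
mesh = Mesh(devices, axis_names=("data", "model"))

# Batch sharded along "data" (DDP-style); the batch dim must divide by the data axis size.
batch = jax.device_put(jnp.zeros((32, 1024), dtype=jnp.int32),
                       NamedSharding(mesh, P("data", None)))

# Weight matrix sharded along "model" (tensor parallelism).
w = jax.device_put(jnp.zeros((1024, 4096)),
                   NamedSharding(mesh, P(None, "model")))
```

The same mesh can cover FSDP-style runs by sharding parameters along the "data" axis instead of replicating them.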
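For the RL stage, a sketch of GRPO's group-relative advantage (as in the DeepSeek-R1 report): sample a group of completions per prompt, score them, and normalize each reward against its group's mean and std. The clipped policy-gradient update is omitted and names are illustrative:

```python
# GRPO group-relative advantages: normalize each completion's reward against its group.
# Only the advantage computation; the policy-update plumbing is omitted.
import jax.numpy as jnp

def grpo_advantages(rewards: jnp.ndarray, eps: float = 1e-6) -> jnp.ndarray:
    # rewards: [n_prompts, group_size] scalar rewards for each sampled completion
    mean = rewards.mean(axis=-1, keepdims=True)
    std = rewards.std(axis=-1, keepdims=True)
    return (rewards - mean) / (std + eps)  # [n_prompts, group_size]

# One prompt, four sampled completions scored 0/1 by a verifier.
adv = grpo_advantages(jnp.array([[1.0, 0.0, 0.0, 1.0]]))
```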
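And a standalone sketch of the sampler knobs (temperature, top-k, top-p, min-p) on a single logits vector, in NumPy for readability; the signature is illustrative and the real inference path would run on-device:

```python
# Sampler sketch: temperature, top-k, min-p, and top-p filtering on one logits vector.
# NumPy for readability; the signature is illustrative, not the repo's inference API.
import numpy as np

def sample(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k > 0:  # keep only the k most likely tokens
        kth = np.sort(probs)[-min(top_k, len(probs))]
        probs[probs < kth] = 0.0
    if min_p > 0.0:  # drop tokens below min_p * max probability
        probs[probs < min_p * probs.max()] = 0.0
    if top_p < 1.0:  # nucleus: keep the smallest set with cumulative mass >= top_p
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        probs[order[np.searchsorted(cum, top_p) + 1:]] = 0.0
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```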
Deprecated features (tested, then removed):
- MTP: needs scale to work, probably 7B+
- Mixture of a Million Experts and UltraMem: I liked them, but their oddities and other performance issues force me to stay away; I may keep just MoM, though, since it's more robust
Citations:
@misc{vaswani2023attentionneed,
title={Attention Is All You Need},
author={Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
year={2023},
eprint={1706.03762},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/1706.03762},
}
@misc{modded_nanogpt_2024,
author={Keller Jordan and Jeremy Bernstein and Brendan Rappazzo and @fernbear.bsky.social and Boza Vlado and You Jiacheng and Franz Cesista and Braden Koszarsky and @Grad62304977},
title={modded-nanogpt: Speedrunning the NanoGPT baseline},
year={2024},
url={https://github.com/KellerJordan/modded-nanogpt}
}
@misc{moe_essay,
author={1a3orn},
title={Introduction to Mixture of Experts (MoE)},
year={2025},
url={https://1a3orn.com/sub/essays-intro-to-moe.html},
note={Accessed: 2025-03-06}
}
@misc{deepseekai2024deepseekv3technicalreport,
title={DeepSeek-V3 Technical Report},
author={DeepSeek-AI},
year={2024},
eprint={2412.19437},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.19437},
}
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
author={DeepSeek-AI},
year={2025},
eprint={2501.12948},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.12948},
}
@misc{yuan2025nativesparseattentionhardwarealigned,
title={Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention},
author={Jingyang Yuan and Huazuo Gao and Damai Dai and Junyu Luo and Liang Zhao and Zhengyan Zhang and Zhenda Xie and Y. X. Wei and Lean Wang and Zhiping Xiao and Yuqing Wang and Chong Ruan and Ming Zhang and Wenfeng Liang and Wangding Zeng},
year={2025},
eprint={2502.11089},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.11089},
}
Image from Reddit