🌟 New model addition

Model description
Is it feasible to add Megatron models? The architecture seems to be essentially GPT-2, so most of the work should be creating the config, fusing the layers from the weights available here: https://github.com/pytorch/fairseq/tree/master/examples/megatron_11b, and making them available.
There are NVIDIA's Megatron (BERT and GPT variants) and Facebook's Megatron-11b (a GPT variant).
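
A minimal sketch of what that fusing step might look like, assuming the archive contains one checkpoint per model-parallel rank. The file names and the parameter-name patterns used below (`qkv`, `fc1`, `out_proj`, `fc2`, `embed_tokens`) are placeholders for illustration, not the actual fairseq layout:

```python
import torch

# Hypothetical shard layout: one checkpoint file per model-parallel rank.
shard_paths = [f"megatron_11b/model-part-{rank}.pt" for rank in range(8)]
shards = [torch.load(path, map_location="cpu")["model"] for path in shard_paths]

def cat_shards(key, dim):
    """Concatenate one parameter across all model-parallel shards along `dim`."""
    return torch.cat([shard[key] for shard in shards], dim=dim)

fused = {}
for key, param in shards[0].items():
    if "qkv" in key or "fc1" in key:
        # column-parallel layers: weight and bias are split along the output dim
        fused[key] = cat_shards(key, dim=0)
    elif key.endswith(".weight") and ("out_proj" in key or "fc2" in key):
        # row-parallel layers: weight is split along the input dim (bias is replicated)
        fused[key] = cat_shards(key, dim=1)
    elif "embed_tokens" in key:
        # the token embedding is usually split along the vocabulary dimension
        fused[key] = cat_shards(key, dim=0)
    else:
        # layer norms and other replicated parameters are identical on every shard
        fused[key] = param

torch.save(fused, "megatron_11b_fused.pt")
```

The fused state dict would then still need its keys remapped to the GPT-2 naming scheme before it could be loaded into the existing implementation.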
If we stick to that, we can't run the model on a single GPU, so we should probably make sure this is compatible with existing parallelism tooling. Is it feasible to keep the current GPT-2 architecture and rely on DeepSpeed's ZeRO and other parallelism schemes, without touching the original implementation?
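
As a rough sketch of that second option, assuming the fused weights could first be converted into a stock GPT-2 checkpoint (a hypothetical path below), the model could then be handed straight to DeepSpeed's ZeRO without any Megatron-specific modeling code; the config values are illustrative only:

```python
import deepspeed
from transformers import GPT2LMHeadModel

# Assumes a hypothetical HF-format conversion of the fused Megatron-11b weights.
model = GPT2LMHeadModel.from_pretrained("path/to/converted-megatron-11b")

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,                          # partition params, grads and optimizer states
        "offload_param": {"device": "cpu"},  # spill parameters to CPU memory
        "offload_optimizer": {"device": "cpu"},
    },
}

# DeepSpeed shards the 11B parameters across GPUs (and CPU) instead of the
# model code doing its own tensor parallelism.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```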
Open source status
- the model implementation is available: https://github.com/ngoyal2707/Megatron-LM/blob/adb23324c222aad0aad89308e70302d996a5eaeb/mpu/transformer.py (most of the work seems to be in the matrix/model parallelization; a toy sketch of that follows this list)
- the model weights are available: https://dl.fbaipublicfiles.com/fairseq/models/model_parallel/megatron_11b.tar.gz (Megatron-11b) and https://github.com/NVIDIA/Megatron-LM#downloading-checkpoints (NVIDIA's version; the 3b and 8.3b checkpoints don't seem to be available)
- who are the authors: Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro (paper: https://arxiv.org/abs/1909.08053, blog: https://developer.nvidia.com/blog/language-modeling-using-megatron-a100-gpu/)
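
For context, the matrix (tensor) parallelization mentioned above boils down to splitting each large weight matrix across GPUs and stitching the partial results back together with collectives. Below is a toy, forward-only version of a column-parallel linear layer, under the assumption of an already-initialized `torch.distributed` process group; the real `mpu/transformer.py` also routes gradients back through the collectives and fuses this with the attention/MLP blocks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist

class ToyColumnParallelLinear(nn.Module):
    """Forward-only illustration of Megatron-style column parallelism:
    each rank stores a slice of the weight (split along the output dim),
    computes its slice of the output, and the slices are all-gathered."""

    def __init__(self, in_features, out_features):
        super().__init__()
        world_size = dist.get_world_size()
        assert out_features % world_size == 0
        self.out_per_rank = out_features // world_size
        # only this rank's shard of the full weight is ever allocated
        self.weight = nn.Parameter(torch.empty(self.out_per_rank, in_features))
        self.bias = nn.Parameter(torch.zeros(self.out_per_rank))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x):
        local_out = F.linear(x, self.weight, self.bias)
        # gather every rank's output slice and concatenate along the feature dim
        gathered = [torch.empty_like(local_out) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, local_out)
        return torch.cat(gathered, dim=-1)
```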