🌟 New model addition

Model description
Is it feasible to add Megatron models? The architecture seems to be essentially GPT-2, so most of the work should be creating the config, fusing the layers from the weights available here: https://github.com/pytorch/fairseq/tree/master/examples/megatron_11b, and making them available.
There are NVIDIA's Megatron (BERT and GPT variants) and Facebook's Megatron-11b (a GPT variant).
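
A minimal sketch of what that fusing step might look like, assuming the archive contains one checkpoint per model-parallel rank. The file names and the parameter-name patterns used below (`qkv`, `fc1`, `out_proj`, `fc2`, `embed_tokens`) are placeholders for illustration, not the actual fairseq layout:

```python
import torch

# Hypothetical shard layout: one checkpoint file per model-parallel rank.
shard_paths = [f"megatron_11b/model-part-{rank}.pt" for rank in range(8)]
shards = [torch.load(path, map_location="cpu")["model"] for path in shard_paths]

def cat_shards(key, dim):
    """Concatenate one parameter across all model-parallel shards along `dim`."""
    return torch.cat([shard[key] for shard in shards], dim=dim)

fused = {}
for key, param in shards[0].items():
    if "qkv" in key or "fc1" in key:
        # column-parallel layers: weight and bias are split along the output dim
        fused[key] = cat_shards(key, dim=0)
    elif key.endswith(".weight") and ("out_proj" in key or "fc2" in key):
        # row-parallel layers: weight is split along the input dim (bias is replicated)
        fused[key] = cat_shards(key, dim=1)
    elif "embed_tokens" in key:
        # the token embedding is usually split along the vocabulary dimension
        fused[key] = cat_shards(key, dim=0)
    else:
        # layer norms and other replicated parameters are identical on every shard
        fused[key] = param

torch.save(fused, "megatron_11b_fused.pt")
```

The fused state dict would then still need its keys remapped to the GPT-2 naming scheme before it could be loaded into the existing implementation.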
If we stick to that, we can't run the model on a single GPU, so we should probably make sure this is compatible with existing parallelism tooling. Is it feasible to keep the current GPT-2 architecture and rely on DeepSpeed's ZeRO and other parallelism schemes, without touching the original implementation?
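
As a rough sketch of that second option, assuming the fused weights could first be converted into a stock GPT-2 checkpoint (a hypothetical path below), the model could then be handed straight to DeepSpeed's ZeRO without any Megatron-specific modeling code; the config values are illustrative only:

```python
import deepspeed
from transformers import GPT2LMHeadModel

# Assumes a hypothetical HF-format conversion of the fused Megatron-11b weights.
model = GPT2LMHeadModel.from_pretrained("path/to/converted-megatron-11b")

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,                          # partition params, grads and optimizer states
        "offload_param": {"device": "cpu"},  # spill parameters to CPU memory
        "offload_optimizer": {"device": "cpu"},
    },
}

# DeepSpeed shards the 11B parameters across GPUs (and CPU) instead of the
# model code doing its own tensor parallelism.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```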
Open source status
- the model implementation is available: https://github.com/ngoyal2707/Megatron-LM/blob/adb23324c222aad0aad89308e70302d996a5eaeb/mpu/transformer.py (most of the work seems to be in the matrix/model parallelization; a toy sketch of that follows this list)
- the model weights are available: https://dl.fbaipublicfiles.com/fairseq/models/model_parallel/megatron_11b.tar.gz (Megatron-11b) and https://github.com/NVIDIA/Megatron-LM#downloading-checkpoints (NVIDIA's version; the 3b and 8.3b checkpoints don't seem to be available)
- who are the authors: Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro (paper: https://arxiv.org/abs/1909.08053, blog: https://developer.nvidia.com/blog/language-modeling-using-megatron-a100-gpu/)
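
For context, the matrix (tensor) parallelization mentioned above boils down to splitting each large weight matrix across GPUs and stitching the partial results back together with collectives. Below is a toy, forward-only version of a column-parallel linear layer, under the assumption of an already-initialized `torch.distributed` process group; the real `mpu/transformer.py` also routes gradients back through the collectives and fuses this with the attention/MLP blocks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist

class ToyColumnParallelLinear(nn.Module):
    """Forward-only illustration of Megatron-style column parallelism:
    each rank stores a slice of the weight (split along the output dim),
    computes its slice of the output, and the slices are all-gathered."""

    def __init__(self, in_features, out_features):
        super().__init__()
        world_size = dist.get_world_size()
        assert out_features % world_size == 0
        self.out_per_rank = out_features // world_size
        # only this rank's shard of the full weight is ever allocated
        self.weight = nn.Parameter(torch.empty(self.out_per_rank, in_features))
        self.bias = nn.Parameter(torch.zeros(self.out_per_rank))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x):
        local_out = F.linear(x, self.weight, self.bias)
        # gather every rank's output slice and concatenate along the feature dim
        gathered = [torch.empty_like(local_out) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, local_out)
        return torch.cat(gathered, dim=-1)
```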