
Adding Megatron models. #9560

@Narsil

Description


🌟 New model addition

Is it feasible to add Megatron models? The architecture seems to be essentially GPT-2; most of the work should be in creating the config, fusing layers from the weights available here: https://github.com/pytorch/fairseq/tree/master/examples/megatron_11b, and making them available.
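For reference, the "fusing layers" step would roughly look like the sketch below. This is a hypothetical illustration, not the actual conversion script: Megatron-style model parallelism splits column-parallel weights (e.g. the QKV and first MLP projections) along the output dimension and row-parallel weights (the output projections) along the input dimension, so fusing the shards back into single GPT-2-style tensors is just concatenation. The function names and shapes here are assumptions for the example.

```python
import numpy as np

def fuse_column_parallel(shards):
    # Each shard has shape (out_features / n_shards, in_features);
    # the full weight is recovered by concatenating along the output dim.
    return np.concatenate(shards, axis=0)

def fuse_row_parallel(shards):
    # Each shard has shape (out_features, in_features / n_shards);
    # the full weight is recovered by concatenating along the input dim.
    return np.concatenate(shards, axis=1)

# Toy example: two shards of an 8x4 column-parallel weight.
shards = [np.ones((4, 4)), np.zeros((4, 4))]
fused = fuse_column_parallel(shards)
print(fused.shape)  # (8, 4)
```

The real checkpoints would of course need the per-layer key mapping on top of this, but the shard geometry is the core of the conversion.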

There are Nvidia's Megatron (BERT and GPT variants) and Facebook's Megatron-11b (a GPT variant).

If we stick to that checkpoint, we can't run the model on a single GPU, so we should probably make sure this is compatible with parallelism tooling:

Is it feasible to keep the current GPT2 architecture and use DeepSpeed's ZeRO and other parallelism schemes, without touching the original implementation?
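If the answer is yes, the integration could stay as thin as the sketch below: an unmodified GPT2 model wrapped by DeepSpeed at initialization time. The config keys follow DeepSpeed's documented JSON schema; the exact values are illustrative assumptions, not tested settings for an 11B model.

```python
# Minimal sketch of a ZeRO stage-3 DeepSpeed config for a large GPT2-style
# model. Stage 3 partitions parameters, gradients, and optimizer state across
# GPUs; CPU offload of params further reduces per-GPU memory.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu"},
    },
}

# Usage would then be roughly (untested):
#   import deepspeed
#   engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
print(ds_config["zero_optimization"]["stage"])  # 3
```

The appeal of this route is that the modeling code stays plain GPT2 and all the parallelism lives in the launcher config.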

Model description

https://github.com/pytorch/fairseq/blob/e3c4282551e819853952284681e9ed60398c5c4a/examples/megatron_11b/README.md

Open source status

https://developer.nvidia.com/blog/language-modeling-using-megatron-a100-gpu/

@stas00 @patrickvonplaten
