
Conversation

dskhudia
Contributor

What does this PR do?

Adds amp_fp8 precision. This will allow us to train models faster using FP8 precision on H100 systems. It's a no-op if amp_fp8 precision is not used.

What issue(s) does this change relate to?

FP8 training support on H100

@dskhudia dskhudia force-pushed the amp_bf8 branch 2 times, most recently from d2917a5 to 5c04109 on February 14, 2023 00:32
Contributor

@dakinggg dakinggg left a comment


Could you add some brief docs to the precision class about this new option? Also, is this something you want to add to setup.py as an optional dependency, or too soon? Lastly, is there any way to write simple tests for this?

Contributor

@mvpatel2000 mvpatel2000 left a comment


Is there a target for transformer_engine in setup.py (is it already a dependency somewhere)? If not, can we add it?

@dskhudia
Contributor Author

Lastly, is there any way to write simple tests for this?

@dakinggg Since it's not really usable on current hardware, I can add some negative tests where we check for errors on current hardware. Would something like that be useful?
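
For illustration, a rough sketch of what such a negative test could look like. The entry points used here (Precision.AMP_FP8 and get_precision_context) and the exact exception raised on unsupported hardware are assumptions for the sketch, not necessarily what the PR ends up implementing:

```python
# Hedged sketch of a negative test for amp_fp8 on unsupported hardware.
# Assumes composer.core exposes Precision and get_precision_context, and that
# requesting amp_fp8 without FP8-capable hardware / transformer_engine raises.
import pytest
import torch

from composer.core import Precision, get_precision_context


def test_amp_fp8_raises_on_unsupported_hardware():
    fp8_capable = torch.cuda.is_available() and torch.cuda.get_device_capability() >= (8, 9)
    if fp8_capable:
        pytest.skip('FP8-capable GPU present; this negative test does not apply.')
    with pytest.raises(Exception):
        with get_precision_context(Precision.AMP_FP8):
            pass
```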

@dskhudia
Contributor Author

Is there a target for transformer_engine in setup.py

Also, is this something you want to add to setup.py as an optional dependency, or too soon?

Added a target txengine, but there are two issues: 1) there is no PyPI package for it, and 2) installation depends on torch, so it fails due to pip build isolation :-(

@dakinggg
Contributor

Yeah, a negative test that errors would be helpful for now, just to exercise the code path. Thanks! And hm, installation depending on torch is unfortunate. In that case, could you put installation instructions somewhere? Probably in the documentation for fp8?

@dskhudia dskhudia force-pushed the amp_bf8 branch 2 times, most recently from 099c050 to dd11bbd on February 16, 2023 20:34
@dskhudia
Contributor Author

@dakinggg added tests
@mvpatel2000 added installation instructions

@dskhudia dskhudia merged commit c4ce366 into mosaicml:dev Feb 16, 2023
@dskhudia dskhudia deleted the amp_bf8 branch February 16, 2023 22:51
dakinggg pushed a commit to dakinggg/composer that referenced this pull request Feb 17, 2023
@lukaemon

New to fp8 / bf8. Does this mean it's possible to do fp8 training with a consumer-level Ada GPU like the RTX 4090?

@dskhudia
Contributor Author

@lukaemon: Two variants of FP8 exist (E4M3 and E5M2), and there is nothing called bf8 on NVIDIA cards.

training with a consumer-level Ada GPU

I think so, provided you have CUDA 12, TransformerEngine layers in your model, and use amp_fp8 precision.
NVIDIA's announcement does point to FP8 on the 4090.

Ada’s new 4th Generation Tensor Cores are unbelievably fast, with an all new 8-Bit Floating Point (FP8) Tensor Engine, increasing throughput by up to 5X, to 1.32 Tensor-petaFLOPS on the GeForce RTX 4090.
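
To make those prerequisites concrete, here is a small, hedged check one could run before turning on amp_fp8; the CUDA 12 and compute-capability thresholds come from this discussion, not from Composer's own validation logic:

```python
# Hedged prerequisite check for FP8 training, based on this thread:
# CUDA 12, an FP8-capable GPU, and transformer_engine importable.
import torch


def fp8_prereqs_met() -> bool:
    if not torch.cuda.is_available():
        return False
    cuda_major = int(torch.version.cuda.split('.')[0]) if torch.version.cuda else 0
    # Hopper is (9, 0); Ada (8, 9) gained TransformerEngine support later
    # (see the comment further down in this thread).
    capability_ok = torch.cuda.get_device_capability() >= (8, 9)
    try:
        import transformer_engine.pytorch  # noqa: F401  # TE layers must be used in the model
    except ImportError:
        return False
    return cuda_major >= 12 and capability_ok


print(fp8_prereqs_met())
```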

@lukaemon

@dskhudia Thanks for the clarification.

@vgoklani

@lukaemon

@vgoklani Thanks. Ada FP8 support after 23Q2, at least.

```python
elif dtype in ['amp_fp8']:
    # We use torch.bfloat16 by default for amp_fp8 as there is no
    # fp8 datatype in PyTorch yet.
    return torch.bfloat16
```


Hi! New to your repo! FP8 integration sounds super nice! I'm just trying to understand what this line implies. Does it mean that everything is running in bf16 when we specify fp8?


Never mind, I think I misunderstood how the fp8 system works. I was somehow expecting to play with a specific dtype, not the context manager. My bad.

@dskhudia
Contributor Author


Glad that your confusion is resolved.
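
For anyone else reading along, a rough sketch of how FP8 actually gets applied through TransformerEngine's context manager rather than a torch dtype. This assumes TransformerEngine's public fp8_autocast / DelayedScaling API; the layer sizes and recipe settings are illustrative, and this is not Composer's internal code:

```python
# Hedged sketch: FP8 is enabled through TransformerEngine's fp8_autocast
# context manager around TE modules, while non-TE ops stay in bf16/fp32.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(16, 768, device='cuda')

# Recipe controlling the FP8 formats and scaling behavior (illustrative settings).
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)
out.sum().backward()
```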

@float-trip

float-trip commented Aug 9, 2023

TransformerEngine has support for Ada GPUs now. Can the restriction in this PR be loosened to torch.cuda.get_device_capability() >= (8, 9)?

I've tested fp8 on a 4090 with llm-foundry and training starts successfully.
