Conversation
@GaetanLepage GaetanLepage commented Aug 8, 2025

Things done

Diff: triton-lang/triton@v3.3.1...v3.4.0

cc @SomeoneSerge @Madouura @DerDennisOP

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
    • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
    • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

Add a 👍 reaction to pull requests you find important.

@nixpkgs-ci nixpkgs-ci bot added labels Aug 8, 2025: 10.rebuild-linux: 501+ (causes many rebuilds on Linux; should normally target the staging branches), 10.rebuild-linux: 501-1000, 10.rebuild-darwin: 0 (no Darwin rebuilds), 6.topic: python.
@GaetanLepage GaetanLepage force-pushed the update/python3Packages.triton branch from 66452e4 to e044761 Compare August 8, 2025 16:57
@GaetanLepage GaetanLepage force-pushed the update/python3Packages.triton branch from e044761 to a9b8b2c Compare August 8, 2025 17:23
@stephen-huan stephen-huan (Member) left a comment

Diff LGTM. I'm traveling right now, so it's hard to test; let me know if it doesn't build and I can take a look.

@@ -65,7 +65,7 @@ let
 in
 stdenv.mkDerivation (finalAttrs: {
   pname = "triton-llvm";
-  version = "21.0.0-git"; # See https://github.com/llvm/llvm-project/blob/main/cmake/Modules/LLVMVersion.cmake
+  version = "21.0.0-unstable-2025-06-10"; # See https://github.com/llvm/llvm-project/blob/main/cmake/Modules/LLVMVersion.cmake
@stephen-huan stephen-huan (Member) commented Aug 9, 2025

Nice catch, makes sense to me (I tunnel-visioned too much on the LLVM tag and forgot about nixpkgs's version conventions).

@nixpkgs-ci nixpkgs-ci bot added the 12.approvals: 1 This PR was reviewed and approved by one person. label Aug 9, 2025
@stephen-huan (Member)

Does #431973 depend on this PR? In my experience, newer versions of triton are often not backwards compatible with older versions of torch, so triton will have to be bumped simultaneously with torch. Or is this not true?

@GaetanLepage GaetanLepage force-pushed the update/python3Packages.triton branch 3 times, most recently from 552d693 to 477d9f5 Compare August 18, 2025 08:24
@GaetanLepage GaetanLepage force-pushed the update/python3Packages.triton branch from 477d9f5 to 78a46c0 Compare August 19, 2025 13:00
@GaetanLepage (Contributor, Author)

After 4+ hours of compilation, I can confirm that python3Packages.torch builds with cudaSupport = true. I'm not sure we can feasibly test much more than that.

@GaetanLepage (Contributor, Author)

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 432046 --package python3Packages.torchWithCuda
Commit: 78a46c0051144802961ac5bbbb4804a19fa513ee


x86_64-linux

✅ 5 packages built:
  • python3Packages.torchWithCuda
  • python3Packages.torchWithCuda.cxxdev (python3Packages.torchWithCuda.cxxdev.cxxdev, python3Packages.torchWithCuda.cxxdev.dev, python3Packages.torchWithCuda.cxxdev.dist, python3Packages.torchWithCuda.cxxdev.lib)
  • python3Packages.torchWithCuda.dev (python3Packages.torchWithCuda.dev.cxxdev, python3Packages.torchWithCuda.dev.dev, python3Packages.torchWithCuda.dev.dist, python3Packages.torchWithCuda.dev.lib)
  • python3Packages.torchWithCuda.dist (python3Packages.torchWithCuda.dist.cxxdev, python3Packages.torchWithCuda.dist.dev, python3Packages.torchWithCuda.dist.dist, python3Packages.torchWithCuda.dist.lib)
  • python3Packages.torchWithCuda.lib (python3Packages.torchWithCuda.lib.cxxdev, python3Packages.torchWithCuda.lib.dev, python3Packages.torchWithCuda.lib.dist, python3Packages.torchWithCuda.lib.lib)

@GaetanLepage GaetanLepage requested a review from kirillrdy August 19, 2025 20:44
@GaetanLepage (Contributor, Author)

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 432046 --package python3Packages.triton
Commit: 78a46c0051144802961ac5bbbb4804a19fa513ee


x86_64-linux

✅ 2 packages built:
  • python3Packages.triton
  • python3Packages.triton.dist (python3Packages.triton.dist.dist)

aarch64-linux

✅ 2 packages built:
  • python3Packages.triton
  • python3Packages.triton.dist (python3Packages.triton.dist.dist)

x86_64-darwin

⏩ 2 packages marked as broken and skipped:
  • python3Packages.triton
  • python3Packages.triton.dist

aarch64-darwin

⏩ 2 packages marked as broken and skipped:
  • python3Packages.triton
  • python3Packages.triton.dist

@GaetanLepage (Contributor, Author)

@kirillrdy this one should be ready to go too :)

@kirillrdy kirillrdy (Member) left a comment

Tried building, but gave up waiting.

@GaetanLepage GaetanLepage merged commit 1a3d391 into NixOS:master Aug 21, 2025
31 of 33 checks passed
@GaetanLepage GaetanLepage deleted the update/python3Packages.triton branch August 21, 2025 08:06
@LunNova (Member) commented Aug 25, 2025

This breaks torch.compile's triton backend (maybe only on ROCm)!

LoweringException: AttributeError: type object 'CompiledKernel' has no attribute 'launch_enter_hook'

Reportedly triton 3.4 needs torch 2.8+

Full error I'm hitting on a tiny modded-nanogpt training run; it occurs when it tries to autotune flex_attn after applying this upgrade:

[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/kernel/flex_attention.py", line 1565, in flex_attention
[rank5]:     autotune_select_algorithm(
[rank5]:     ~~~~~~~~~~~~~~~~~~~~~~~~~^
[rank5]:         "flex_attention",
[rank5]:         ^^^^^^^^^^^^^^^^^
[rank5]:     ...<3 lines>...
[rank5]:         input_gen_fns=input_gen_fns,
[rank5]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:     ),
[rank5]:     ^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 2350, in autotune_select_algorithm
[rank5]:     return _ALGORITHM_SELECTOR_CACHE(*args, **kwargs)
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 1985, in __call__
[rank5]:     timings = do_autotuning(precompile_fn)
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 1913, in do_autotuning
[rank5]:     timings = self.lookup(
[rank5]:         choices,
[rank5]:     ...<2 lines>...
[rank5]:         autotune,
[rank5]:     )
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/codecache.py", line 321, in lookup
[rank5]:     timings = benchmark(choices)
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 1893, in autotune
[rank5]:     return make_benchmark_fn()(choices)
[rank5]:            ~~~~~~~~~~~~~~~~~~~^^^^^^^^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 2084, in benchmark_in_current_process
[rank5]:     choice.precompile()
[rank5]:     ~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 1357, in precompile
[rank5]:     self.bmreq.precompile()
[rank5]:     ~~~~~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/autotune_process.py", line 745, in precompile
[rank5]:     getattr(mod, self.kernel_name).precompile()
[rank5]:     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 277, in precompile
[rank5]:     self._make_launchers()
[rank5]:     ~~~~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 434, in _make_launchers
[rank5]:     launchers.append(result.make_launcher())
[rank5]:                      ~~~~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 1153, in make_launcher
[rank5]:     "launch_enter_hook": binary.__class__.launch_enter_hook,
[rank5]:                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: torch._inductor.exc.InductorError: LoweringException: AttributeError: type object 'CompiledKernel' has no attribute 'launch_enter_hook'
[rank5]:   target: flex_attention
[rank5]:   args[0]: TensorBox(StorageBox(
[rank5]:     ComputedBuffer(name='buf13', layout=FixedLayout('cuda:0', torch.bfloat16, size=[1, 32, 6144, 64], stride=[12582912, 393216, 64, 1]), data=Pointwise(device=device(type='cuda', index=0), dtype=torch.bfloat16, inner_fn=<function BaseView.make_loader.<locals>.loader at 0x7ffd6b2ca200>, ranges=[1, 32, 6144, 64]))
[rank5]:   ))
[rank5]:   args[1]: TensorBox(StorageBox(
[rank5]:     ComputedBuffer(name='buf14', layout=FixedLayout('cuda:0', torch.bfloat16, size=[1, 32, 6144, 64], stride=[12582912, 393216, 64, 1]), data=Pointwise(device=device(type='cuda', index=0), dtype=torch.bfloat16, inner_fn=<function BaseView.make_loader.<locals>.loader at 0x7ffd6b2c85e0>, ranges=[1, 32, 6144, 64]))
[rank5]:   ))
[rank5]:   args[2]: TensorBox(StorageBox(
[rank5]:     ComputedBuffer(name='buf15', layout=FixedLayout('cuda:0', torch.bfloat16, size=[1, 32, 6144, 64], stride=[12582912, 393216, 64, 1]), data=Pointwise(device=device(type='cuda', index=0), dtype=torch.bfloat16, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7ffd6b2d4a40>, ranges=[1, 32, 6144, 64]))
[rank5]:   ))
[rank5]:   args[3]: Subgraph(name='sdpa_score0', graph_module=<lambda>(), graph=None)
[rank5]:   args[4]: (6144, 6144, TensorBox(StorageBox(
[rank5]:     InputBuffer(name='primals_7', layout=FixedLayout('cuda:0', torch.int32, size=[1, 1, 48], stride=[48, 48, 1]))
[rank5]:   )), TensorBox(StorageBox(
[rank5]:     InputBuffer(name='primals_6', layout=FixedLayout('cuda:0', torch.int32, size=[1, 1, 48, 48], stride=[2304, 2304, 48, 1]))
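
The failing line in the trace (triton_heuristics.py, line 1153) reads launch_enter_hook off the compiled kernel's class, and triton 3.4's kernel class no longer exposes it there. A minimal self-contained sketch of that failure mode, using hypothetical stand-in classes rather than triton's or torch's actual code:

```python
# Hypothetical stand-ins for illustration only; not triton's real CompiledKernel.
class CompiledKernel33:
    """Mimics triton <= 3.3: hooks exposed as class attributes."""
    launch_enter_hook = None
    launch_exit_hook = None

class CompiledKernel34:
    """Mimics triton 3.4: class-level hook attributes removed."""
    pass

def make_launcher(binary):
    # Mirrors the failing access in torch 2.7.1's inductor:
    #   "launch_enter_hook": binary.__class__.launch_enter_hook,
    return {"launch_enter_hook": binary.__class__.launch_enter_hook}

make_launcher(CompiledKernel33())  # works against the old class layout
try:
    make_launcher(CompiledKernel34())
except AttributeError as e:
    # Same shape of error as in the traceback above:
    # type object '...' has no attribute 'launch_enter_hook'
    print(e)
```

This is why the breakage only appears at torch.compile time rather than at build time: the attribute lookup happens lazily when inductor constructs a kernel launcher, which build-only testing never exercises.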

@GaetanLepage (Contributor, Author)

> This breaks torch.compile triton (maybe only on ROCm)!
> Reportedly triton 3.4 needs torch 2.8+
> [full traceback quoted from the comment above; trimmed]

Thanks for reporting. I'm working on packaging torch 2.8.0, but I am not done yet:
#431973

@LunNova (Member) commented Aug 26, 2025

Will see if we can backport compat for triton 3.4 in the short term, since the torch 2.8 bump looks a bit complicated.

#436960
