Conversation
@GaetanLepage GaetanLepage commented Aug 8, 2025

Things done

Diff: triton-lang/triton@v3.3.1...v3.4.0

cc @SomeoneSerge @Madouura @DerDennisOP

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
    • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
    • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

Add a 👍 reaction to pull requests you find important.

@nixpkgs-ci nixpkgs-ci bot added labels Aug 8, 2025: 10.rebuild-linux: 501+ (causes many rebuilds on Linux; should normally target the staging branches), 10.rebuild-linux: 501-1000, 10.rebuild-darwin: 0 (no Darwin rebuilds), 6.topic: python.
@GaetanLepage GaetanLepage force-pushed the update/python3Packages.triton branch from 66452e4 to e044761 Compare August 8, 2025 16:57
@GaetanLepage GaetanLepage force-pushed the update/python3Packages.triton branch from e044761 to a9b8b2c Compare August 8, 2025 17:23
@stephen-huan stephen-huan (Member) left a comment

Diff LGTM. I'm traveling right now, so it's hard to test; let me know if it doesn't build and I can take a look.

@@ -65,7 +65,7 @@ let
 in
 stdenv.mkDerivation (finalAttrs: {
   pname = "triton-llvm";
-  version = "21.0.0-git"; # See https://github.com/llvm/llvm-project/blob/main/cmake/Modules/LLVMVersion.cmake
+  version = "21.0.0-unstable-2025-06-10"; # See https://github.com/llvm/llvm-project/blob/main/cmake/Modules/LLVMVersion.cmake
@stephen-huan stephen-huan (Member) commented Aug 9, 2025

Nice catch, makes sense to me (I tunnel-visioned too much on the LLVM tag and forgot about nixpkgs's version conventions).

@nixpkgs-ci nixpkgs-ci bot added the 12.approvals: 1 This PR was reviewed and approved by one person. label Aug 9, 2025
@stephen-huan (Member)

Does #431973 depend on this PR? In my experience, newer versions of triton are often not backwards compatible with older versions of torch, so triton will have to be bumped simultaneously with torch. Or is this not true?

@GaetanLepage GaetanLepage force-pushed the update/python3Packages.triton branch 3 times, most recently from 552d693 to 477d9f5 Compare August 18, 2025 08:24
@GaetanLepage GaetanLepage force-pushed the update/python3Packages.triton branch from 477d9f5 to 78a46c0 Compare August 19, 2025 13:00
@GaetanLepage (Contributor, Author)

After 4+ hours of compilation, I can confirm that python3Packages.torch builds with cudaSupport = true. I'm not sure we can feasibly test much more than that.

@GaetanLepage (Contributor, Author)

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 432046 --package python3Packages.torchWithCuda
Commit: 78a46c0051144802961ac5bbbb4804a19fa513ee


x86_64-linux

✅ 5 packages built:
  • python3Packages.torchWithCuda
  • python3Packages.torchWithCuda.cxxdev (python3Packages.torchWithCuda.cxxdev.cxxdev, python3Packages.torchWithCuda.cxxdev.dev, python3Packages.torchWithCuda.cxxdev.dist, python3Packages.torchWithCuda.cxxdev.lib)
  • python3Packages.torchWithCuda.dev (python3Packages.torchWithCuda.dev.cxxdev, python3Packages.torchWithCuda.dev.dev, python3Packages.torchWithCuda.dev.dist, python3Packages.torchWithCuda.dev.lib)
  • python3Packages.torchWithCuda.dist (python3Packages.torchWithCuda.dist.cxxdev, python3Packages.torchWithCuda.dist.dev, python3Packages.torchWithCuda.dist.dist, python3Packages.torchWithCuda.dist.lib)
  • python3Packages.torchWithCuda.lib (python3Packages.torchWithCuda.lib.cxxdev, python3Packages.torchWithCuda.lib.dev, python3Packages.torchWithCuda.lib.dist, python3Packages.torchWithCuda.lib.lib)

@GaetanLepage GaetanLepage requested a review from kirillrdy August 19, 2025 20:44
@GaetanLepage (Contributor, Author)

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 432046 --package python3Packages.triton
Commit: 78a46c0051144802961ac5bbbb4804a19fa513ee


x86_64-linux

✅ 2 packages built:
  • python3Packages.triton
  • python3Packages.triton.dist (python3Packages.triton.dist.dist)

aarch64-linux

✅ 2 packages built:
  • python3Packages.triton
  • python3Packages.triton.dist (python3Packages.triton.dist.dist)

x86_64-darwin

⏩ 2 packages marked as broken and skipped:
  • python3Packages.triton
  • python3Packages.triton.dist

aarch64-darwin

⏩ 2 packages marked as broken and skipped:
  • python3Packages.triton
  • python3Packages.triton.dist

@GaetanLepage (Contributor, Author)

@kirillrdy this one should be ready to go too :)

@kirillrdy kirillrdy (Member) left a comment

Tried building, but gave up waiting.

@GaetanLepage GaetanLepage merged commit 1a3d391 into NixOS:master Aug 21, 2025
31 of 33 checks passed
@GaetanLepage GaetanLepage deleted the update/python3Packages.triton branch August 21, 2025 08:06
@LunNova (Member) commented Aug 25, 2025

This breaks torch.compile's triton backend (maybe only on ROCm)!

LoweringException: AttributeError: type object 'CompiledKernel' has no attribute 'launch_enter_hook'

Reportedly triton 3.4 needs torch 2.8+

Full error I'm hitting on a tiny modded-nanogpt training run; it occurs when it tries to autotune flex_attn after applying this upgrade:

[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/kernel/flex_attention.py", line 1565, in flex_attention
[rank5]:     autotune_select_algorithm(
[rank5]:     ~~~~~~~~~~~~~~~~~~~~~~~~~^
[rank5]:         "flex_attention",
[rank5]:         ^^^^^^^^^^^^^^^^^
[rank5]:     ...<3 lines>...
[rank5]:         input_gen_fns=input_gen_fns,
[rank5]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:     ),
[rank5]:     ^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 2350, in autotune_select_algorithm
[rank5]:     return _ALGORITHM_SELECTOR_CACHE(*args, **kwargs)
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 1985, in __call__
[rank5]:     timings = do_autotuning(precompile_fn)
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 1913, in do_autotuning
[rank5]:     timings = self.lookup(
[rank5]:         choices,
[rank5]:     ...<2 lines>...
[rank5]:         autotune,
[rank5]:     )
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/codecache.py", line 321, in lookup
[rank5]:     timings = benchmark(choices)
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 1893, in autotune
[rank5]:     return make_benchmark_fn()(choices)
[rank5]:            ~~~~~~~~~~~~~~~~~~~^^^^^^^^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 2084, in benchmark_in_current_process
[rank5]:     choice.precompile()
[rank5]:     ~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/select_algorithm.py", line 1357, in precompile
[rank5]:     self.bmreq.precompile()
[rank5]:     ~~~~~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/autotune_process.py", line 745, in precompile
[rank5]:     getattr(mod, self.kernel_name).precompile()
[rank5]:     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 277, in precompile
[rank5]:     self._make_launchers()
[rank5]:     ~~~~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 434, in _make_launchers
[rank5]:     launchers.append(result.make_launcher())
[rank5]:                      ~~~~~~~~~~~~~~~~~~~~^^
[rank5]:   File "/nix/store/9kyfz4iavk7afvyi6gr1i7mf9h6ak0k1-python3.13-torch-2.7.1/lib/python3.13/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 1153, in make_launcher
[rank5]:     "launch_enter_hook": binary.__class__.launch_enter_hook,
[rank5]:                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]: torch._inductor.exc.InductorError: LoweringException: AttributeError: type object 'CompiledKernel' has no attribute 'launch_enter_hook'
[rank5]:   target: flex_attention
[rank5]:   args[0]: TensorBox(StorageBox(
[rank5]:     ComputedBuffer(name='buf13', layout=FixedLayout('cuda:0', torch.bfloat16, size=[1, 32, 6144, 64], stride=[12582912, 393216, 64, 1]), data=Pointwise(device=device(type='cuda', index=0), dtype=torch.bfloat16, inner_fn=<function BaseView.make_loader.<locals>.loader at 0x7ffd6b2ca200>, ranges=[1, 32, 6144, 64]))
[rank5]:   ))
[rank5]:   args[1]: TensorBox(StorageBox(
[rank5]:     ComputedBuffer(name='buf14', layout=FixedLayout('cuda:0', torch.bfloat16, size=[1, 32, 6144, 64], stride=[12582912, 393216, 64, 1]), data=Pointwise(device=device(type='cuda', index=0), dtype=torch.bfloat16, inner_fn=<function BaseView.make_loader.<locals>.loader at 0x7ffd6b2c85e0>, ranges=[1, 32, 6144, 64]))
[rank5]:   ))
[rank5]:   args[2]: TensorBox(StorageBox(
[rank5]:     ComputedBuffer(name='buf15', layout=FixedLayout('cuda:0', torch.bfloat16, size=[1, 32, 6144, 64], stride=[12582912, 393216, 64, 1]), data=Pointwise(device=device(type='cuda', index=0), dtype=torch.bfloat16, inner_fn=<function ReinterpretView.make_loader.<locals>.loader at 0x7ffd6b2d4a40>, ranges=[1, 32, 6144, 64]))
[rank5]:   ))
[rank5]:   args[3]: Subgraph(name='sdpa_score0', graph_module=<lambda>(), graph=None)
[rank5]:   args[4]: (6144, 6144, TensorBox(StorageBox(
[rank5]:     InputBuffer(name='primals_7', layout=FixedLayout('cuda:0', torch.int32, size=[1, 1, 48], stride=[48, 48, 1]))
[rank5]:   )), TensorBox(StorageBox(
[rank5]:     InputBuffer(name='primals_6', layout=FixedLayout('cuda:0', torch.int32, size=[1, 1, 48, 48], stride=[2304, 2304, 48, 1]))
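
The failing line in the trace (triton_heuristics.py, line 1153) reads launch_enter_hook off the compiled kernel's class, and triton 3.4's kernel class no longer exposes it there. A minimal self-contained sketch of that failure mode, using hypothetical stand-in classes rather than triton's or torch's actual code:

```python
# Hypothetical stand-ins for illustration only; not triton's real CompiledKernel.
class CompiledKernel33:
    """Mimics triton <= 3.3: hooks exposed as class attributes."""
    launch_enter_hook = None
    launch_exit_hook = None

class CompiledKernel34:
    """Mimics triton 3.4: class-level hook attributes removed."""
    pass

def make_launcher(binary):
    # Mirrors the failing access in torch 2.7.1's inductor:
    #   "launch_enter_hook": binary.__class__.launch_enter_hook,
    return {"launch_enter_hook": binary.__class__.launch_enter_hook}

make_launcher(CompiledKernel33())  # works against the old class layout
try:
    make_launcher(CompiledKernel34())
except AttributeError as e:
    # Same shape of error as in the traceback above:
    # type object '...' has no attribute 'launch_enter_hook'
    print(e)
```

This is why the breakage only appears at torch.compile time rather than at build time: the attribute lookup happens lazily when inductor constructs a kernel launcher, which build-only testing never exercises.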

@GaetanLepage (Contributor, Author)

> This breaks torch.compile triton (maybe only on ROCm)!
> Reportedly triton 3.4 needs torch 2.8+
> [full traceback quoted from the comment above; trimmed]

Thanks for reporting. I'm working on packaging torch 2.8.0, but I am not done yet:
#431973

@LunNova (Member) commented Aug 26, 2025

Will see if we can backport compat for triton 3.4 in the short term, since the torch 2.8 bump looks a bit complicated.

#436960
