Skip to content

Conversation

fabianschuiki
Copy link
Contributor

Add an LLHD pass that unrolls loops with statically-known bounds.

The pass computes the dominance info for llhd.combinational ops, finds loop back-edges by looking for blocks that branch back to blocks that dominate them, which yields initial loop head and back-edge pairs. Based on these the pass populates a data structure with loop metadata, including the blocks in the loop body, exit control flow edges, the exit condition, induction variable, and loop bounds. After gathering this information, the loop body is simply stamped out muliple times, the few relevant control flow edges are adjusted, and a rough cleanup of unused blocks and values is performed.

The current implementation of the pass is very conservative. The loops are required to have very simple control flow, and the induction variable must go from 0 to a positive upper bound in increments of 1. We'll want to relax these constraints in the future and allow other increments/decrements, comparison predicates, and more.

Thanks again to @maerhart for doing a lot of initial work for this on his experimental branch.

Copy link
Member

@uenoku uenoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome! I'll review details Monday but I have a high-level question. Is there specific reason to unroll loops after SCF lowering? I'm wondering if we can simply unroll constant loops at SCF dialect which is substantially simpler (and there are many helpers in MLIR upstream).

@fabianschuiki
Copy link
Contributor Author

I remember @maerhart experimenting with that a little bit on his experimental branch. The problem is that Verilog has continue, break, and return which can exit early out of loops. So we can't directly lower to SCF in the frontend. The only alternative then would be to try to hoist CF into SCF to unroll the loops. AFAIK, that mainly creates scf.while loops, and loops with an early exit still wouldn't be supported 🤔. That's why I thought about trying to unroll in CF directly.

@maerhart
Copy link
Member

@uenoku We originally retained the for loops in the moore dialect, but changed it to lower directly to CF to support continue, break, return as @fabianschuiki mentioned. The alternative would be to create our own version of SCF or change the upstream SCF to support these constructs (in that case we couldn't support frontends that have goto constructs, which I think is fine). That would make unrolling a lot easier (except for while loops), but it might add additional complexity to other passes such as deseq as it would have to take all these for, break, continue, yield ops into account instead of just br, and cond_br which we would still have to support unless we also use scf for if and while. The RegionBranchOpInterface could in theory unify all the special casing but when I used it last time a 2-3 years back it didn't work very well and was a lot more complicated than regular basic block branching, but I think there have been efforts to improve that in the meantime. Also, at the LLHD level it doesn't compose very well with the current design of the wait op. Of course, nothing is set in stone and we could consider moving to a more SCF based approach in the future including a redesign of the wait op. We could also think about retaining for loops just at the moore level again by adding above mentioned features and doing unrolling before lowering to LLHD.

I had experimented with the lift-control-flow pass upstream which converts any CF region to SCF, but all the operations end up in the condition region of an scf.while which doesn't help unrolling at all. Upstream only as utilities to unroll scf.for AFAIK.

In general, I think a CF unrolling pass would be a good addition to MLIR upstream, but I agree that it is relatively complicated to implement.

@uenoku
Copy link
Member

uenoku commented Jun 30, 2025

That's fair point. Unfortunate that MLIR SCF doesn't have generic control flow yet.

@fabianschuiki fabianschuiki requested a review from uenoku July 1, 2025 15:54
Copy link
Member

@uenoku uenoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can use mlir::CFGLoopInfo for finding out loop candidates? I think some of the logic like finding a single exist and parent/child relation should work out-of-box.

./build/bin/circt-opt foo.mlir --mlir-print-op-generic | ./build/bin/mlir-opt --allow-unregistered-dialect --pass-pipeline='builtin.module(func.func(test-cfg-loop-info))' 
Testing : "TwoNestedLoops"
Blocks : ^bb0, ^bb0, ^bb1, ^bb2, ^bb3, ^bb4, ^bb5
Loop at depth 1 containing: ^bb1<header><exiting>,^bb2,^bb4<latch>,^bb3
    Loop at depth 2 containing: ^bb2<header><exiting>,^bb3<latch>
module {
  func.func @TwoNestedLoops() -> i42 {
    %0 = "hw.constant"() <{value = 0 : i42}> : () -> i42
    %1 = "hw.constant"() <{value = 1 : i42}> : () -> i42
    %2 = "hw.constant"() <{value = 2 : i42}> : () -> i42
    %3 = "hw.constant"() <{value = 3 : i42}> : () -> i42
    %4 = "hw.constant"() <{value = 42 : i42}> : () -> i42
    cf.br ^bb1(%0, %0 : i42, i42)
  ^bb1(%5: i42, %6: i42):  // 2 preds: ^bb0, ^bb4
    %7 = "comb.icmp"(%5, %2) <{predicate = 2 : i64}> : (i42, i42) -> i1
    cf.cond_br %7, ^bb2(%0, %6 : i42, i42), ^bb5
  ^bb2(%8: i42, %9: i42):  // 2 preds: ^bb1, ^bb3
    %10 = "comb.icmp"(%8, %3) <{predicate = 2 : i64}> : (i42, i42) -> i1
    cf.cond_br %10, ^bb3, ^bb4
  ^bb3:  // pred: ^bb2
    %11 = "comb.add"(%9, %4) : (i42, i42) -> i42
    %12 = "comb.add"(%8, %1) : (i42, i42) -> i42
    cf.br ^bb2(%12, %11 : i42, i42)
  ^bb4:  // pred: ^bb2
    %13 = "comb.add"(%5, %1) : (i42, i42) -> i42
    cf.br ^bb1(%13, %9 : i42, i42)
  ^bb5:  // pred: ^bb1
    return %6 : i42
  }
}

Loop::match and Loop::unroll would be necessary to implement our own but other wise I think we can utilize LLVM LoopInfo.

@fabianschuiki
Copy link
Contributor Author

@uenoku That's a fantastic idea. Thanks for the pointer -- I wasn't aware of mlir::CFGLoopInfo. Let me try to swap that in.

@fabianschuiki fabianschuiki force-pushed the fschuiki/unroll-loops branch from 3dae8ea to f1bebd5 Compare July 2, 2025 17:55
@fabianschuiki fabianschuiki requested a review from uenoku July 2, 2025 17:55
@fabianschuiki
Copy link
Contributor Author

I have reworked the pass to use the mlir::CFGLoopInfo analysis. Works like a charm! Thanks for the pointer @uenoku 💯. We'll probably be able to relax some of the constraints in the future -- the unrolling is quite conservative right now, to simplify the first implementation.

@fabianschuiki fabianschuiki force-pushed the fschuiki/unroll-loops branch from f1bebd5 to 33e5e16 Compare July 3, 2025 18:39
@fabianschuiki fabianschuiki requested a review from uenoku July 3, 2025 18:39
Copy link
Member

@uenoku uenoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Is it intentional that you removed CFGLoop?

@fabianschuiki
Copy link
Contributor Author

Yikes no, that was an accident. Let me get that back 👍

Add an LLHD pass that unrolls loops with statically-known bounds.

The pass computes the dominance info for `llhd.combinational` ops, finds
loop back-edges by looking for blocks that branch back to blocks that
dominate them, which yields initial *loop head* and *back-edge* pairs.
Based on these the pass populates a data structure with loop metadata,
including the blocks in the loop body, exit control flow edges, the exit
condition, induction variable, and loop bounds. After gathering this
information, the loop body is simply stamped out muliple times, the few
relevant control flow edges are adjusted, and a rough cleanup of unused
blocks and values is performed.

The current implementation of the pass is *very* conservative. The loops
are required to have very simple control flow, and the induction
variable must go from 0 to a positive upper bound in increments of 1.
We'll want to relax these constraints in the future and allow other
increments/decrements, comparison predicates, and more.

Thanks again to @maerhart for doing a lot of initial work for this on
his experimental branch.

Co-authored-by: Martin Erhart <martin.erhart@sifive.com>
@fabianschuiki fabianschuiki force-pushed the fschuiki/unroll-loops branch from 33e5e16 to e1888cd Compare July 8, 2025 01:12
@fabianschuiki fabianschuiki merged commit 944a673 into main Jul 8, 2025
7 checks passed
@fabianschuiki fabianschuiki deleted the fschuiki/unroll-loops branch July 8, 2025 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants