Skip to content

Conversation

fabianschuiki
Copy link
Contributor

@fabianschuiki fabianschuiki commented Apr 11, 2023

Add a new FirMemOp to the Seq dialect to capture the semantics of FIRRTL memories at the HW level. The intent behind this change is the same as for FirRegOp, which captures the specifics of FIRRTL registers but within the type system and abstraction level of the HW dialect.

Memories are represented as a declaration with the seq.firmem which returns a value representing the memory itself. This value is of type !seq.firmem<12 x 42, mask 2> which represents an opaque handle to the memory; it encodes the depth (12) and width (42) of the memory, plus an optional mask if the memory is to support masked writes.

Memory ports are represented as separate operations:

  • seq.firmem.read_port
  • seq.firmem.write_port
  • seq.firmem.read_write_port

These operations take the memory itself as their first operand. The remaining operands are the specific signals required for the corresponding type of port: clocks, enables, masks, mode select, etc. Read and read-write ports also return the read data as their result. Enables and masks are optional to allow for more concise IR for simple memory configurations.

Representing ports as separate ops in the IR has the advantage that adding and removing ports is trivial; the standard CSE and canonicalization even allow for collapsing duplicate ports and removing permanently disabled ports. Separate ops also make building up memories in the IR a lot simpler, since the operands and results associated with a port are grouped together, instead of having to spread them across an overall operand and result list on the overall memory. Getting a memory's ports means walking the list of users of the declaration op, which is very cheap since the only users are the port operations.

This commit merely adds the FirMemOp alongside some canonicalization and general tests, but the op isn't used by any part of CIRCT yet. Use in FIRRTL-to-HW lowering follows as a separate commit.

After some discussion: the intent is for seq.firmem and seq.hlmem to be unified into a seq.mem once the memory handling has been factored out of the FIRRTL-to-HW lowering.

@fabianschuiki fabianschuiki added FIRRTL Involving the `firrtl` dialect Seq Involving the `seq` dialect labels Apr 11, 2023
@teqdruid
Copy link
Contributor

Based entirely on the description, this sounds very, very similar to HLMem. Is there a reason you can't extend and use that one?

@fabianschuiki
Copy link
Contributor Author

@teqdruid Yeah definitely, the two are similar in design. The idea behind seq.firmem is to do the same thing as seq.firreg: have an op that exactly represents the semantics needed by the FIRRTL dialect. If we gather a bunch of memory and register ops that describe what different dialects need, we'll can start introducing op interfaces like RegisterLike and MemoryLike to abstract over the commonalities between them, and maybe even replace them with one unified op. I like the idea of starting out with separate ops and then slowly converging as we see concrete examples, because I don't really know how the one register/memory to rule them all will look like -- and if we even want such an op instead of multiple smaller operations. The dialect-dedicated ops in Seq allows other dialects to lower to something at the HW level and gives us time to figure out how to converge on the Seq dialect op front.

@teqdruid
Copy link
Contributor

I don't think we use HLMem for anything yet. (@mortbopet Am I correct?) If I'm correct, I think it's reasonable to start with one. Anything you're going to need for your memories, we're likely to need also. It seems silly to have two implementations with one being a superset of the other. Frankly, if y'all put in a bunch of work on this one we're more likely to use it than HLMem so we'd end up with one anyway.

@sequencer
Copy link
Contributor

Basically I think there are additional data which need to be added to this Op(maybe add to spec firstly?) see chipsalliance/chisel#3131.
If adding this op, how do you prepare to use it? I personally prefer using intmodule to directly instantiate it and lower it to behavior model or blackbox to replace.

Base automatically changed from fschuiki/ruw-wuw-seq-enums to main April 12, 2023 05:03
@mortbopet
Copy link
Contributor

mortbopet commented Apr 12, 2023

Currently, HandshakeToHW is the only user of HLMem. It doesn't require any "special" semantics, and currently just relies on the behavioral lowering.

My personal opinion on this is that we should as much as possible try to work towards sharing code as soon as possible, instead of relying on things coming together down the road. While there is an argument for "ironing things out in the FIRRTL flow and letting that influence a better, improved design in the core/generic/shared ops", I still think the ecosystem is better off sharing abstractions sooner, rather than later... not to mention the confusion of people coming into CIRCT seeing fir... ops sprinkled all over the place in the core abstractions. The whole point of the hlmem-family of operations is to allow operation reuse while adorning attributes to guide lowering - thus i'd imagine the FIRRTL flow to add dialect/lowering specific attributes which would then be lowered by a FIRRTL-specific pass which catches said attributes.

That is, at least from my point of view, the whole point of the hlmem family of ops. Therefore, if the current definition of the op family doesn't capture the concept needed to represent FIRRTL memory lowering, then we should change those ops, instead of just adding more, dialect-specific ops, and increasing ecosystem fragmentation while doing so.

@seldridge
Copy link
Member

hlmem would have to change a lot in order to make this work. E.g., the current approach of an hlmem having one clock and one reset doesn't really make sense here. Memories can have one clock per port and no reset.

It would likely be very disruptive to try to move hlmem in this direction right now and I'm not sure if existing users (@mortbopet ?) would be amenable to this.

Put differently, the only unification of hlmem with firmem right now would be to replace the former with the latter given that the latter has so much more information. Given the prior discussions around other seq dialect operations it seemed expedient to let multiple approaches develop independently first.

@fabianschuiki fabianschuiki force-pushed the fschuiki/seq-firmem branch 2 times, most recently from fb7dcbe to 072ebcc Compare April 12, 2023 18:31
@fabianschuiki
Copy link
Contributor Author

Thanks for the great discussion in the CIRCT ODM today 👍 🥳

@teqdruid @mortbopet I have a stack of PRs built on this that deals with carving memory lowering out from the FIRRTL pipeline. It would be easier for me to unify the firmem and hlmem ops after those FIRRTL-side changes are in. Is it okay with you to land the PRs as they are first and then do the merging into seq.mem ontop of those?

@fabianschuiki
Copy link
Contributor Author

@sequencer It would be really nice if we could make this work with chipsalliance/chisel#3131. Anything missing for that to work we could add to seq.mem and then expose that as an intrinsic on the FIRRTL dialect side, with a direct lowering to seq.mem (which I think would pretty much be the firrtl.mem op if I'm not mistaken 🤔 )

@fabianschuiki fabianschuiki requested a review from uenoku April 12, 2023 20:22
@teqdruid
Copy link
Contributor

Is it okay with you to land the PRs as they are first and then do the merging into seq.mem ontop of those?

I've no objection... so long as you get around to merging ;)

Comment on lines +250 to +285
I32Attr:$readLatency,
I32Attr:$writeLatency,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be the attribute to port?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed, you'd probably want to have them on the port. However, that's not how the firrtl.mem op and Chisel works right now. This seq.firmem tries to capture the current behavior; but I think we should definitely have a discussion and start adding new memory ops that are a better or alternative representation of memories! That'll inform what a generic Seq memory would look like -- if that even exists.

Comment on lines +252 to +287
RUWAttr:$ruw,
WUWAttr:$wuw,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think RUW, WUW is the hazard definition between each port, so it should be a 1 to 1 matrix for ports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Same as above: totally agreed, we should get some improved mem op discussion going beyond just extracting firrtl.mem behavior.

OptionalAttr<FirMemInitAttr>:$init,
OptionalAttr<StrAttr>:$prefix,
OptionalAttr<AnyAttr>:$output_file
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for better PnR implementation experience, should we add optional attribute of hard macro name, and then expose them to repl-seq-mem flow?

Additionally, I think we should add extra IO for MBIST and DFT flow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for simulation, we should add a config to optionally tie down the extra IO since there are two types of simulations for memory:

  • the trivial behavior simulation, which the RTL designer only knows the memory size and ports.
  • post-RTL simulation should be able to introduce DFT and MBIST in Chisel designs that we should replace these memory with the behavior memory IP from vendor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could imagine that we add dedicated seq.mem.mbist_port and seq.mem.dft_port ops on the seq.mem op we are trying to distill out of seq.firmem later on. But it's not totally clear yo to me how to handle these, since the MBIST/DFT stuff is usually very technology node dependent. Maybe some more generic seq.mem.opaque_port with arbitrary operands/results?

I'm starting to think that we should start collecting lots of different memory ops in Seq, covering all the different use cases we have. And then either the commonality becomes very obvious, or the need for an op interface and multiple dedicated ops is the way forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DFT may be very technology node specific, but can it be treated as an opaque type/bus which is routable, even if it's not decomposable? Presumably there are generic DFT network building nodes too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that could work. That way the DFT specifics could be "implementation-defined", or defined by a dedicated DFT pass that runs later. And the memory could just allow this as some opaque port.

let parameters = (ins
"uint32_t":$depth,
"uint32_t":$width,
OptionalParameter<"llvm::Optional<uint32_t>">:$maskWidth
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should mask also be an attribute to port?

@sequencer
Copy link
Contributor

Inside Chisel, what I think we should do is get rid of the Mem and SyncMem type in the Chisel type system. Using intmodule directly represent the memory as an instance, so that we can simply go to the PnR flow and also provide a behavior model for simulation.

@fabianschuiki
Copy link
Contributor Author

@sequencer You raise a lot of very valid points. Right now seq.firmem tries to 1:1 replace the hw.module.generated flavor of memories that come out of the FIRRTL LowerToHW pass. I'm wondering whether what you are describing would warrant something like a seq.macro_mem operation, with a much more rigid port list and precise listing of MBIST/DFT ports.

My feeling is that it'll be almost impossible to find a general memory op that unifies all concerns from HLS, pipeline scheduling, FIRRTL, and ASIC node-specific macros. So instead, why not create a lot of different ops that are very use-case specific, but then have a MemoryLike op interface to abstract over them and allow passes to reason about them.

@sequencer
Copy link
Contributor

I have been wondering how to migrate to a more PnR friendly API, here is my thought:

  • firstly, follow the this FirMem to create the Chisel frontend(and deprecate it when adding it)
  • propose a spec change to firrtl, switch to a more reasonable with breaking change?
  • switch the intmodule to use the new API.

Add a new `FirMemOp` to the Seq dialect to capture the semantics of
FIRRTL memories at the HW level. The intent behind this change is the
same as for `FirRegOp`, which captures the specifics of FIRRTL registers
but within the type system and abstraction level of the HW dialect.

Memories are represented as a declaration with the `seq.firmem` which
returns a value representing the memory itself. This value is of type
`!seq.firmem<12 x 42, mask 2>` which represents an opaque handle to the
memory; it encodes the depth (`12`) and width (`42`) of the memory, plus
an optional mask if the memory is to support masked writes.

Memory ports are represented as separate operations:

- `seq.firmem.read_port`
- `seq.firmem.write_port`
- `seq.firmem.read_write_port`

These operations take the memory itself as their first operand. The
remaining operands are the specific signals required for the
corresponding type of port: clocks, enables, masks, mode select, etc.
Read and read-write ports also return the read data as their result.
Enables and masks are optional to allow for more concise IR for simple
memory configurations.

Representing ports as separate ops in the IR has the advantage that
adding and removing ports is trivial; the standard CSE and
canonicalization even allow for collapsing duplicate ports and removing
permanently disabled ports. Separate ops also make building up memories
in the IR a lot simpler, since the operands and results associated with
a port are grouped together, instead of having to spread them across an
overall operand and result list on the overall memory. Getting a
memory's ports means walking the list of users of the declaration op,
which is very cheap since the only users are the port operations.

This commit merely adds the `FirMemOp` alongside some canonicalization
and general tests, but the op isn't used by any part of CIRCT yet. Use
in FIRRTL-to-HW lowering follows as a separate commit.
Copy link
Contributor

@darthscsi darthscsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I have no doubt there will be iteration on this, but let's get it in and start using it.

OptionalAttr<FirMemInitAttr>:$init,
OptionalAttr<StrAttr>:$prefix,
OptionalAttr<AnyAttr>:$output_file
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DFT may be very technology node specific, but can it be treated as an opaque type/bus which is routable, even if it's not decomposable? Presumably there are generic DFT network building nodes too.

std::max(1U, cast<FirMemType>($_self).getWidth()))
}]>;

def FirMemReadOp : SeqOp<"firmem.read_port", [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unsure if make-a-port ops are better than the mem producing the ports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I've seen in the Arc dialect which uses this pattern, passes get a lot easier to write if you don't need to do this weird split of multiple ports forming logical groups of operands and results, and then co-iterating the operands and results. Gets especially weird if some ports are optional. With a separate op it's easier to split the port structure into more manageable pieces, and we get port DCE for free.

@fabianschuiki fabianschuiki merged commit 956b581 into main Jun 22, 2023
@fabianschuiki fabianschuiki deleted the fschuiki/seq-firmem branch June 22, 2023 16:19
isConst(op.getClock()) || isConstZero(op.getMode())) {
auto newOp = rewriter.replaceOpWithNewOp<FirMemReadOp>(
op, op.getMemory(), op.getAddress(), op.getClock(), op.getEnable());
for (auto namedAttr : op->getAttrs())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting a crash here when running the canonicalization tests locally. Shouldn't you not be accessing op after replacing it (not sure why it passes in CI)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened pr to address this here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! For some reason this didn't show up in my local run 😞 Need to check my setup. Thanks for catching and fixing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FIRRTL Involving the `firrtl` dialect Seq Involving the `seq` dialect
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants