Port standard instructions to Rust. #13486

kevinhartman · 2024-11-25T05:21:32Z

Summary

Adds a new Rust enum StandardInstruction representation for standard instructions (i.e. Barrier, Delay, Measure and Reset) and updates the encoding of PackedOperation to support storing it compactly at rest.

The size of a PackedOperation has now been anchored to always be 64 bits ~~, even on 32-bit systems~~ EDIT: we no longer support 32-bit anyway. This is necessary to encode the data payload of standard instructions like Barrier and Delay. See the docstrings within packed_instruction.rs for details.

Details and comments

The implementation works very similarly to what we currently do for StandardGate, but with a bit of extra consideration for an additional data payload used by variadic/template instructions like Barrier and Delay.

Similarly to StandardGate, the newly added StandardInstruction enum serves as the first-class Rust interface for working with standard instructions. The existing OperationRef enum returned by PackedOperation::view has a new variant StandardInstruction which exposes this.

Unlike StandardGate, StandardInstruction is not a pyclass because it is not directly used to tag instructions as standard in our Python class definitions. Instead, the simple integral enum StandardInstructionType is used as the tag, and OperationFromPython::extract_bound further queries the incoming Python object for variadic/template data when building the complete StandardInstruction.

Encoding

The PackedOperation encoding has been updated as follows:

A PackedOperation is now always 64 bits wide
The discriminant has been widened from 2 to 3 bits (it is now at its maximum width, but we still have room for 3 more variants).
The discriminant now has additional variant StandardInstruction.
A packed standard gate's enum is now byte-aligned.

This way, we can also use this enum to tag the Python classes.

jakelishman · 2024-11-25T07:42:36Z

I'm concerned that the LoHi struct and PackedOperation union are making it easier to make mistakes on BE and/or 32-bit systems, because the encodings now change between them.

I feel like the bitshift and masking operations could be extended over the whole u64, and that'll all automatically work on BE and 32-bit systems, especially with the size of PackedOperation now fixed to 64 bits - we can even have everything be aligned. The shifts of the payload of instructions can be set such that a u32 payload is always in the high bits, with the padding bits between it and the rest of the discriminants if you're concerned about aligned access, because then the bitshifts will be compiled out into a single register load (just read the u32 from the right place) and the compiler will handle endianness for us. The pointer doesn't need to be handled any differently to any other payload - everything's just shifts and masks anyway (LoHi can't avoid that), and I think introducing the union makes that harder to follow.

The LoHi struct to me seems to be forcing every method of PackedOperation to think about the partial register reads/loads. If we use shifts and masks on a u64 with const masks and shifts, there's no logic split where some of it is done with shifts and some with endianness-dependent partial register access in our own code.

jakelishman · 2024-11-25T09:03:08Z

I guess my main point is this: the 64-bit PackedOperation can always be thought about as a collection of payloads each of which is stored in a particular mask and needs a particular shift. These are:

the PackedOperation discriminant at (0b111, 0)
a pointer at (usize::MAX as u64 & 0b000, 0) (regardless of 32-bit or 64-bit)
a StandardGate discriminant at (0xff << 3, 3)
a StandardInstruction discriminant at (0xf << 3, 3)
a DelayUnit payload at (for example) (0x7 << 32, 32)
a u32 payload at (u32::MAX as u64 << 32, 32)

Introducing LoHi doesn't remove the need to bitshift and mask for most items, it just means that some of my above list are done one way, some are done another, and the union means that the pointer now has more ways to access it. LoHi also restricts the payload size to u32, when we already have payloads that exceed that (the pointers).

If you want a shade more encapsulation than my loose const shift and mask associated items, would something like this look better to you?

struct PayloadLocation<T: SomeBytemuckOrCustomTrait> {
    mask: u64,
    shift: u32,
    datatype: PhantomData<T>,
}
impl PayloadLocation {
    // Note we _mustn't_ accept `PackedOperation` until we've got a valid one,
    // because holding a `PackedOperation` implies ownership over its stored
    // pointer, and it mustn't `Drop` or attempt to interpret partial data.

    #[inline]
    fn store(&self, src: T, target: u64) -> u64 {
        let src: u64 = bytemuck::cast(src);
        target | ((src << self.shift) & self.mask)
    }
    #[inline]
    fn load(&self, target: u64) -> T {
        bytemuck::cast((target & self.mask) >> self.shift)
    }
}

const PACKED_OPERATION_DISCRIMINANT = PayloadLocation { mask: 0b111, shift: 0 };
const STANDARD_GATE_DISCRIMINANT = PayloadLocation { mask: 0bff << 3, shift: 3 };
const POINTER_PAYLOAD = PayloadLocation { mask: usize::MAX as u64 & 0b000, shift: 0 };
// and so on

(Bear in mind I just typed that raw without trying it, and I didn't think all that hard about what the trait bound should be.)

If everything's inlined and constant, the compiler absolutely should be able to compile out 32 bit shifts and all-ones masks into partial register reads/loads itself, so there oughtn't to be any overhead.

This reverts commit b6d8f92.

…hs." This reverts commit 6049498.

Trying out a neat crate for Rust bitfields. The caveats are: * Custom enums used in the bitfield must specify const funcs for bit conversion, and bytemuck's cast functions aren't const. * The bitfield crate implements Clone for you, but does not seem to have a way to disable this. We can't rely on their clone, since for pointer types we need to allocate a new Box. To get around this, PackedOperation is a wrapper around an internal bitfield struct (rather than being a bitfield struct itself). (Note: I'm not yet happy with this. Specifically, I think the abstraction may be cleaner if OpBitField is defined entirely in terms of raw native types and any reinterpretation / transmutation is done from PackedOperation. Consider this a first pass.)

jakelishman

Thanks for this - this latest implementation looks very nice and clean to me. I have a couple of questions about the private modules inside packed_instruction, but they're relatively minor. In general for the implementation I'm happy with this and would like to get it merged sooner rather than later.

It would be really good to get a benchmark comparison for this before we merge, to check there's no expected problems.

jakelishman · 2025-02-03T13:19:47Z

crates/circuit/src/operations.rs

+unsafe impl ::bytemuck::CheckedBitPattern for StandardInstructionType {
+    type Bits = u8;
+
+    fn is_valid_bit_pattern(bits: &Self::Bits) -> bool {
+        *bits < 4
+    }
+}


Any chance you've thought of a nicer way to do this? I know we've got a few instances of this kind of check in the library by now, and I'm getting more nervous about making mistakes in them, especially since we already did it once).

No need to change this PR about it (and if we do have any ideas, let's do it in a follow-up anyway), just asking if anything's occurred to you?

The only thing that has loosely occurred to me here (other than using nightly features) is to define these enum types with some kind of recursive macro that generates each variant's index and implements CheckedBitPattern after visiting all members. It feels like that would be possible, but I'm not well-versed in Rust macros yet, so I could be wrong.

yeah no worries - I also had no better ideas, I'm just really hoping someone comes up with something workable haha.

We could write a proc macro to do it, but I don't know what effect that'll have on our build times. I suppose we're already in that boat via PyO3?

crates/circuit/src/dag_circuit.rs

crates/circuit/src/operations.rs

jakelishman · 2025-02-03T13:32:42Z

qiskit/circuit/measure.py

+    def __init_subclass__(cls, **kwargs):
+        super().__init_subclass__(**kwargs)
+        # Subclasses of Measure are not "standard", so we set this to None to
+        # prevent the Rust code from treating them as such.
+        cls._standard_instruction_type = None
+


I think we're going to have to revisit this in the future to more properly support specialised subclasses of standard instructions - we're almost certain to miss things in the compiler if we're encoding two different things that are actually measures in two very different forms (StandardInstruction and Instruction). That said, we already have that problem with the subclassing, so I don't think this isn't going to cause any new problems.

I mostly just mean that I think the need to do this is indicative that we don't have the story properly formalised yet, which was true before anyway.

jakelishman · 2025-02-03T13:40:39Z

crates/circuit/src/packed_instruction.rs

+/// A private module to encapsulate the encoding of [StandardGate].
+mod standard_gate {


Minor question, but what encapsulation do you mean here? The trait implementations aren't private, because visibility is tied to the trait and the types involved (and everything here is public), and the structs aren't public because they're not exported anyway.

I'm fine if it's just for logical grouping within the file, I'm just confused by the comment.

I can remove the comment since it's likely self-documenting anyway and therefore confusing to readers. I really only meant that StandardInstructionBits and our reliance on bitfield_struct is private to the inline standard_gate module. I was trying to emphasize this to show that the new implementation shouldn't lock us in to using an external crate, if it becomes inconvenient.

crates/circuit/src/packed_instruction.rs

jakelishman · 2025-02-03T14:15:56Z

Ah, I see that #13759 is actually built on top of this (I hadn't looked at it properly yet), so my concerns about the Drop impl aren't really realised in the same way.

Putting the Drop impl inside mod pointer {} still feels wrong to me, and I'd rather we didn't do it, but given that the way UnitaryGate is implemented in Rust doesn't actually have a problem with it, it's not as critical to change.

Also uses codegen for From<Box<T>> explicitly for all T, rather than a blanket impl on the private PackablePointer trait to ensure proper docs are generated.

kevinhartman · 2025-02-04T22:31:17Z

Thanks for the review!

Putting the Drop impl inside mod pointer {} still feels wrong to me, and I'd rather we didn't do it, but given that the way UnitaryGate is implemented in Rust doesn't actually have a problem with it, it's not as critical to change.

As mentioned in a few of my recent comment replies, this should be resolved now 🙂.

Separately, I'll run some benchmarks for this against main, and then against a copy that reverts the layout change to a packed StandardGate. I sort of doubt the compiled output will be any different, tbh, but you're right we should at least make sure it isn't worse.

jakelishman

Ah nice nice nice, yeah, that last change to the Drop is nice - pointer-y stuff can stay in the pointer code group, but Drop itself isn't hidden.

Thanks for all this - I think it's good just to verify with the benchmarks, but assuming no regressions, I'm happy to merge this now. I'll tick once we've got the benchmarks.

Probably not necessary in most of these cases.

kevinhartman · 2025-02-05T16:26:14Z

There were a few benchmarks that got worse, so I switched us back to using the same padding for StandardGate for the sake of isolation, even though I don't believe the pad difference is responsible. These benchmarks compare the merge-base of main and this branch, before I added a few more #[inline] directives (these ended up doing nothing for the smaller sample of tests I ran locally, and I'd rather not run all the circuit construction and pass tests again).

Here's what improved:

Change	Before [`8095ace`] <standard-op^2>	After [`c85be7a`] <standard-op~1>	Ratio	Benchmark (Parameter)
-	44.6±3μs	39.0±1μs	0.87	circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 128)
-	56.1±5μs	45.2±2μs	0.81	circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 128)
-	40.3±1ms	35.9±0.7ms	0.89	passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 1)
-	736±20ms	662±7ms	0.9	passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['u', 'cx', 'id'])
-	37.5±0.7μs	34.0±0.7μs	0.91	passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement(5, 1024)

And here's what got worse:

Change	Before [`8095ace`] <standard-op^2>	After [`c85be7a`] <standard-op~1>	Ratio	Benchmark (Parameter)
+	40.7±2ms	47.2±2ms	1.16	mapping_passes.PassBenchmarks.time_apply_layout(14, 1024)
+	52.1±2μs	61.7±1μs	1.18	passes.PassBenchmarks.time_remove_barriers(14, 1024)
+	64.1±0.6μs	75.1±3μs	1.17	passes.PassBenchmarks.time_remove_barriers(20, 1024)
+	32.7±0.2μs	36.8±0.4μs	1.12	passes.PassBenchmarks.time_remove_barriers(5, 1024)
+	1.55±0.03ms	1.80±0.05ms	1.16	passes.PassBenchmarks.time_remove_reset_in_zero_state(14, 1024)
+	634±10μs	713±70μs	1.12	passes.PassBenchmarks.time_remove_reset_in_zero_state(5, 1024)
+	420±20μs	531±7μs	1.27	passes.PassBenchmarks.time_unroll_3q_or_more(5, 1024)

I believe the passes that have gotten worse are all Python-based and do things like iterate over the DAG (which requires Python Barrier, Delay, Reset, and Measure to be re-inflated). So I don't think there's anything to be concerned about here, though I'm happy to dig further if anyone feels differently.

jakelishman · 2025-02-05T16:31:53Z

Yeah I'd agree - all those passes that got worse look to be because we've just shifted the Python-space construction cost into the passes themselves. The total time is probably the same, just some cost moved from setup to benchmark. I also highly doubt it's anything to do with StandardGate changes, but since it's not proven safe, it's good to move it, and we can just do that move in a separate PR if you still want it, where it'd be easier to revert if the overnight benchmarking shows a regression.

jakelishman

Thanks for all the work on this!

A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip.

…#1509) A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip.

…qiskit-community#1509) A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip.

…#1509) A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip. (cherry picked from commit ad2341d)

… (backport #1509) (#1520) A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip.<hr>This is an automatic backport of pull request #1509 done by [Mergify](https://mergify.com). Co-authored-by: Will Shanks <willshanks@us.ibm.com>

kevinhartman added 17 commits November 7, 2024 18:06

Extend PackedOperation for standard instructions.

1115e27

WIP

d90fb30

Compiling.

7f260db

Add missing module registration.

07c1226

Implement Operation::directive

e14c42a

Add StandardInstructionType enum.

2c6888d

Expose only StandardInstructionType to Python.

69a80cc

This way, we can also use this enum to tag the Python classes.

Fix unit access.

0e6634b

Remove no longer needed enum variant parens.

06ede59

Remove unused stuff.

a157d47

Fix up packed_instruction docs, add From<StandardInstruction>.

32a83fc

Fix up imports.

0a50a2f

Improve comments in circuit_instruction.

794f684

Fix typo.

8470ac7

More doc fixes.

d2b5465

Reorganize PackedInstruction mem layout to support 32 bit archs.

6049498

Update layout description, cleanup.

b6d8f92

kevinhartman added 3 commits December 4, 2024 14:45

Revert "Update layout description, cleanup."

c0e1f10

This reverts commit b6d8f92.

Revert "Reorganize PackedInstruction mem layout to support 32 bit arc…

8b031c2

…hs." This reverts commit 6049498.

1ucian0 added the Rust This PR or issue is related to Rust code in the repository label Dec 11, 2024

kevinhartman added 7 commits December 13, 2024 18:00

Pivot.

6dfd4e3

Compiling.

f30a98f

Remove unused bytemuck stuff.

a0d7730

Run format.

39c1eff

Improve static assertion error messages.

f24dbff

Clean up.

93293c3

Implement delay unit for ImmediateValue.

f520d63

jakelishman reviewed Feb 3, 2025

View reviewed changes

kevinhartman added 2 commits February 4, 2025 15:07

Make impl_packable_pointer operation_type an expr.

1c89bff

Move PackedOperation Drop to root mod.

e89cf47

Also uses codegen for From<Box<T>> explicitly for all T, rather than a blanket impl on the private PackablePointer trait to ensure proper docs are generated.

jakelishman reviewed Feb 4, 2025

View reviewed changes

jakelishman assigned kevinhartman and unassigned kevinhartman Feb 4, 2025

kevinhartman added 3 commits February 4, 2025 23:58

Remove StandardGate pad.

c85be7a

Merge branch 'main' of github.com:Qiskit/qiskit into standard-op

a608362

Add #[inline] in a few places.

3604cfe

Probably not necessary in most of these cases.

jakelishman approved these changes Feb 5, 2025

View reviewed changes

jakelishman enabled auto-merge February 5, 2025 16:32

jakelishman added this pull request to the merge queue Feb 5, 2025

Merged via the queue into Qiskit:main with commit 73409fb Feb 5, 2025
17 checks passed

This was referenced Feb 8, 2025

Fix tests failing against Qiskit main due to change in delay behavior qiskit-community/qiskit-experiments#1509

Merged

Delay.__eq__ does not compare the delay unit #13812

Closed

mergify bot mentioned this pull request Feb 25, 2025

Fix tests failing against Qiskit main due to change in delay behavior (backport #1509) qiskit-community/qiskit-experiments#1520

Merged

ElePT added Changelog: New Feature Include in the "Added" section of the changelog Changelog: None Do not include in changelog and removed Changelog: New Feature Include in the "Added" section of the changelog labels Mar 31, 2025

		/// A private module to encapsulate the encoding of [StandardGate].
		mod standard_gate {

Port standard instructions to Rust. #13486

Port standard instructions to Rust. #13486

Uh oh!

Conversation

kevinhartman commented Nov 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Details and comments

Encoding

Uh oh!

jakelishman commented Nov 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jakelishman commented Nov 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jakelishman left a comment

Choose a reason for hiding this comment

Uh oh!

jakelishman Feb 3, 2025

Choose a reason for hiding this comment

Uh oh!

kevinhartman Feb 4, 2025

Choose a reason for hiding this comment

Uh oh!

jakelishman Feb 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jakelishman Feb 3, 2025

Choose a reason for hiding this comment

Uh oh!

jakelishman Feb 3, 2025

Choose a reason for hiding this comment

Uh oh!

kevinhartman Feb 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jakelishman commented Feb 3, 2025

Uh oh!

kevinhartman commented Feb 4, 2025

Uh oh!

jakelishman left a comment

Choose a reason for hiding this comment

Uh oh!

kevinhartman commented Feb 5, 2025

Uh oh!

jakelishman commented Feb 5, 2025

Uh oh!

jakelishman left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kevinhartman commented Nov 25, 2024 •

edited

Loading

jakelishman commented Nov 25, 2024 •

edited

Loading

jakelishman commented Nov 25, 2024 •

edited

Loading