-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Port standard instructions to Rust. #13486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This way, we can also use this enum to tag the Python classes.
I'm concerned that the I feel like the bitshift and masking operations could be extended over the whole u64, and that'll all automatically work on BE and 32-bit systems, especially with the size of The |
I guess my main point is this: the 64-bit
Introducing If you want a shade more encapsulation than my loose struct PayloadLocation<T: SomeBytemuckOrCustomTrait> {
mask: u64,
shift: u32,
datatype: PhantomData<T>,
}
impl PayloadLocation {
// Note we _mustn't_ accept `PackedOperation` until we've got a valid one,
// because holding a `PackedOperation` implies ownership over its stored
// pointer, and it mustn't `Drop` or attempt to interpret partial data.
#[inline]
fn store(&self, src: T, target: u64) -> u64 {
let src: u64 = bytemuck::cast(src);
target | ((src << self.shift) & self.mask)
}
#[inline]
fn load(&self, target: u64) -> T {
bytemuck::cast((target & self.mask) >> self.shift)
}
}
const PACKED_OPERATION_DISCRIMINANT = PayloadLocation { mask: 0b111, shift: 0 };
const STANDARD_GATE_DISCRIMINANT = PayloadLocation { mask: 0bff << 3, shift: 3 };
const POINTER_PAYLOAD = PayloadLocation { mask: usize::MAX as u64 & 0b000, shift: 0 };
// and so on (Bear in mind I just typed that raw without trying it, and I didn't think all that hard about what the trait bound should be.) If everything's inlined and constant, the compiler absolutely should be able to compile out 32 bit shifts and all-ones masks into partial register reads/loads itself, so there oughtn't to be any overhead. |
This reverts commit b6d8f92.
…hs." This reverts commit 6049498.
Trying out a neat crate for Rust bitfields. The caveats are: * Custom enums used in the bitfield must specify const funcs for bit conversion, and bytemuck's cast functions aren't const. * The bitfield crate implements Clone for you, but does not seem to have a way to disable this. We can't rely on their clone, since for pointer types we need to allocate a new Box. To get around this, PackedOperation is a wrapper around an internal bitfield struct (rather than being a bitfield struct itself). (Note: I'm not yet happy with this. Specifically, I think the abstraction may be cleaner if OpBitField is defined entirely in terms of raw native types and any reinterpretation / transmutation is done from PackedOperation. Consider this a first pass.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this - this latest implementation looks very nice and clean to me. I have a couple of questions about the private modules inside packed_instruction
, but they're relatively minor. In general for the implementation I'm happy with this and would like to get it merged sooner rather than later.
It would be really good to get a benchmark comparison for this before we merge, to check there's no expected problems.
unsafe impl ::bytemuck::CheckedBitPattern for StandardInstructionType { | ||
type Bits = u8; | ||
|
||
fn is_valid_bit_pattern(bits: &Self::Bits) -> bool { | ||
*bits < 4 | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance you've thought of a nicer way to do this? I know we've got a few instances of this kind of check in the library by now, and I'm getting more nervous about making mistakes in them, especially since we already did it once).
No need to change this PR about it (and if we do have any ideas, let's do it in a follow-up anyway), just asking if anything's occurred to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing that has loosely occurred to me here (other than using nightly features) is to define these enum types with some kind of recursive macro that generates each variant's index and implements CheckedBitPattern
after visiting all members. It feels like that would be possible, but I'm not well-versed in Rust macros yet, so I could be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah no worries - I also had no better ideas, I'm just really hoping someone comes up with something workable haha.
We could write a proc macro to do it, but I don't know what effect that'll have on our build times. I suppose we're already in that boat via PyO3?
def __init_subclass__(cls, **kwargs): | ||
super().__init_subclass__(**kwargs) | ||
# Subclasses of Measure are not "standard", so we set this to None to | ||
# prevent the Rust code from treating them as such. | ||
cls._standard_instruction_type = None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're going to have to revisit this in the future to more properly support specialised subclasses of standard instructions - we're almost certain to miss things in the compiler if we're encoding two different things that are actually measures in two very different forms (StandardInstruction
and Instruction
). That said, we already have that problem with the subclassing, so I don't think this isn't going to cause any new problems.
I mostly just mean that I think the need to do this is indicative that we don't have the story properly formalised yet, which was true before anyway.
/// A private module to encapsulate the encoding of [StandardGate]. | ||
mod standard_gate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor question, but what encapsulation do you mean here? The trait implementations aren't private, because visibility is tied to the trait and the types involved (and everything here is public), and the structs aren't public because they're not exported anyway.
I'm fine if it's just for logical grouping within the file, I'm just confused by the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove the comment since it's likely self-documenting anyway and therefore confusing to readers. I really only meant that StandardInstructionBits
and our reliance on bitfield_struct
is private to the inline standard_gate
module. I was trying to emphasize this to show that the new implementation shouldn't lock us in to using an external crate, if it becomes inconvenient.
Ah, I see that #13759 is actually built on top of this (I hadn't looked at it properly yet), so my concerns about the Putting the |
Also uses codegen for From<Box<T>> explicitly for all T, rather than a blanket impl on the private PackablePointer trait to ensure proper docs are generated.
Thanks for the review!
As mentioned in a few of my recent comment replies, this should be resolved now 🙂. Separately, I'll run some benchmarks for this against |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice nice nice, yeah, that last change to the Drop
is nice - pointer-y stuff can stay in the pointer code group, but Drop
itself isn't hidden.
Thanks for all this - I think it's good just to verify with the benchmarks, but assuming no regressions, I'm happy to merge this now. I'll tick once we've got the benchmarks.
Probably not necessary in most of these cases.
There were a few benchmarks that got worse, so I switched us back to using the same padding for Here's what improved:
And here's what got worse:
I believe the passes that have gotten worse are all Python-based and do things like iterate over the DAG (which requires Python |
Yeah I'd agree - all those passes that got worse look to be because we've just shifted the Python-space construction cost into the passes themselves. The total time is probably the same, just some cost moved from setup to benchmark. I also highly doubt it's anything to do with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work on this!
A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip.
…#1509) A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip.
…qiskit-community#1509) A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip.
…#1509) A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip. (cherry picked from commit ad2341d)
…#1509) A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip. (cherry picked from commit ad2341d)
… (backport #1509) (#1520) A long-standing bug in qpy leads to the units of delay instructions being overwritten to `dt`. See: Qiskit/qiskit#10662 Some experiments test that their circuits are round-trip serializable through qpy (perhaps not as necessary as it once was as the room for sticking arbitrary Python into circuits is reduced with the move to Rust). The T1 and Tphi circuits have delays and so were affected by the qpy bug. However, there is a second bug related to delays which canceled this bug out for those tests. `Delay.__eq__` does not check that the units are the same for the two delay instructions, so the loss of the original unit did not matter for the tests. Following this change: Qiskit/qiskit#13486 Comparing circuits with delays now includes the delay units in the determination of equality. `Delay.__eq__` itself still does not consider the unit but some part of adding the delay to a circuit converts it into a Rust object that does check the unit. Here the tests are given a backend with a `dt` value so that the experiments will generate circuits with delays in `dt` and so still have delays with the same `dt` after the qpy roundtrip.<hr>This is an automatic backport of pull request #1509 done by [Mergify](https://mergify.com). Co-authored-by: Will Shanks <willshanks@us.ibm.com>
Summary
Adds a new Rust enum
StandardInstruction
representation for standard instructions (i.e.Barrier
,Delay
,Measure
andReset
) and updates the encoding ofPackedOperation
to support storing it compactly at rest.The size of a
PackedOperation
has now been anchored to always be 64 bits, even on 32-bit systemsEDIT: we no longer support 32-bit anyway. This is necessary to encode the data payload of standard instructions likeBarrier
andDelay
. See the docstrings withinpacked_instruction.rs
for details.Details and comments
The implementation works very similarly to what we currently do for
StandardGate
, but with a bit of extra consideration for an additional data payload used by variadic/template instructions likeBarrier
andDelay
.Similarly to
StandardGate
, the newly addedStandardInstruction
enum serves as the first-class Rust interface for working with standard instructions. The existingOperationRef
enum returned byPackedOperation::view
has a new variantStandardInstruction
which exposes this.Unlike
StandardGate
,StandardInstruction
is not apyclass
because it is not directly used to tag instructions as standard in our Python class definitions. Instead, the simple integral enumStandardInstructionType
is used as the tag, andOperationFromPython::extract_bound
further queries the incoming Python object for variadic/template data when building the completeStandardInstruction
.Encoding
The
PackedOperation
encoding has been updated as follows:PackedOperation
is now always 64 bits wideStandardInstruction
.