Improve 2q block collection via 1q quaternion-based collection #13649

jakelishman · 2025-01-10T13:47:38Z

Summary

This is a small series of patches, which could be split into 2-3 separate PRs if preferred. There are two main goals:

- introduce a quaternion-based mechanism for working with the $U(2)$ group (1q gates) without matrices (it uses 5 floats, rather than 8 plus the overhead of an `ndarray` matrix).
- improve the collection speed of `ConsolidateBlocks`.

This doesn't do everything that could be done for `ConsolidateBlocks`; I've stopped at the point where the next natural changes are quite a bit larger.

Using this script:

```python
import time

from qiskit import transpile
from qiskit.converters import circuit_to_dag
from qiskit.circuit import library as lib
from qiskit.transpiler.passes import ConsolidateBlocks

pass_ = ConsolidateBlocks(basis_gates=["rz", "sx", "ecr"], force_consolidate=True)

# Some arbitrary circuit with lots of runs that use both 1q runs and 2q runs
# with gates in both directions (transpile will put them all in one direction,
# so every other 2q gate is flipped by hand below).
qc_base = transpile(
    lib.quantum_volume(100, 100, seed=0),
    basis_gates=["rz", "sx", "ecr"],
    seed_transpiler=0,
)
qc = qc_base.copy_empty_like()
flip = False
for inst in qc_base.data:
    if len(inst.qubits) == 2:
        if flip:
            inst = inst.replace(qubits=inst.qubits[::-1])
        flip = not flip
    qc._append(inst)

dags = [circuit_to_dag(qc, copy_operations=False) for _ in range(100)]
# Equivalent to IPython's `%time for dag in dags: pass_.run(dag)`.
start = time.perf_counter()
for dag in dags:
    pass_.run(dag)
print(f"Wall time: {(time.perf_counter() - start) * 1e3:.0f} ms")
```

I see a modest (~10%) improvement in the pass time, going from ~117ms to ~106ms.

Details and comments

Individual commit notes:

Add versor-based representation of 1q gates

1q gates are members of the group $U(2)$, which we can represent as a scalar phase term and a member of $SU(2)$. The members of $SU(2)$ can be represented by versors (also called unit quaternions, but I got tired of typing that all the time...).

This adds a representation of versors and the group action to the Rust code, and ways to convert from matrix-based forms to them.
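To make the five-float idea concrete, here is a minimal pure-Python sketch (the PR's real types are in Rust). It assumes the convention $V = wI - i(xX + yY + zZ)$ for the $SU(2)$ part, which may differ from the sign and ordering conventions used in the Rust code, and the class name `VersorU2Sketch` is hypothetical. Converting from a matrix pulls the phase out of the determinant, and composing two gates is a Hamilton product of the quaternion parts plus a sum of the phases.

```python
import cmath
from dataclasses import dataclass


@dataclass(frozen=True)
class VersorU2Sketch:
    """A U(2) element stored as e^{i*phase} * (w*I - i*(x*X + y*Y + z*Z)): five floats."""

    phase: float
    w: float
    x: float
    y: float
    z: float

    @classmethod
    def from_matrix(cls, u):
        """Split a 2x2 unitary (nested lists) into a global phase and a unit quaternion."""
        det = u[0][0] * u[1][1] - u[0][1] * u[1][0]
        # det(e^{i*phase} V) = e^{2i*phase}, because det(V) = 1 for V in SU(2).
        phase = cmath.phase(det) / 2
        rot = cmath.exp(-1j * phase)
        v00 = u[0][0] * rot  # equals w - i*z
        v10 = u[1][0] * rot  # equals y - i*x
        return cls(phase, v00.real, -v10.imag, v10.real, -v00.imag)

    def to_matrix(self):
        """Expand back into a dense 2x2 complex matrix [[a, -b*], [b, a*]] times the phase."""
        g = cmath.exp(1j * self.phase)
        a = complex(self.w, -self.z)
        b = complex(self.y, -self.x)
        return [
            [g * a, -g * b.conjugate()],
            [g * b, g * a.conjugate()],
        ]

    def __matmul__(self, other):
        """Matrix product self @ other, computed as a Hamilton product plus a phase sum."""
        w1, x1, y1, z1 = self.w, self.x, self.y, self.z
        w2, x2, y2, z2 = other.w, other.x, other.y, other.z
        return VersorU2Sketch(
            self.phase + other.phase,
            w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        )
```

With this convention the quaternion units $i, j, k$ correspond to $-iX, -iY, -iZ$, which is why the Hamilton product reproduces matrix multiplication. Composition then costs 16 real multiplications and a phase addition, against 32 real multiplications for a naive dense 2×2 complex product.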

This commit introduces `nalgebra` as a dependency, to use its quaternion logic. This is a relatively heavy dependency, especially for something as simple as quaternions, but some of this is in anticipation of moving more matrix code to the static matrices of `nalgebra`, rather than the too-dynamic-for-our-needs ones of `ndarray`. `faer` also offers static matrices, but its APIs continue to fluctuate heavily between versions, and it requires ever higher MSRVs.

Use quaternions in 1q block collection

Switch the inner algorithm of `ConsolidateBlocks` to use the quaternion form for single-qubit matrix multiplications. This offered a speedup of a few percentage points in block collection for typical `rz-sx-rz-sx-rz`-type runs in quantum-volume-like circuits.
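For `rz-sx-rz-sx-rz`-type runs, each gate can be built directly in versor form from its parameters, and the whole run folded together without forming a matrix until the end. This is a hypothetical pure-Python sketch, not the PR's Rust code; it assumes versors are stored as `(phase, w, x, y, z)` with $V = wI - i(xX + yY + zZ)$, and the helper names are illustrative only.

```python
import cmath
import math
from functools import reduce


def hamilton(p, q):
    """Product of two (phase, w, x, y, z) versors; corresponds to the matrix product p @ q."""
    pp, w1, x1, y1, z1 = p
    qp, w2, x2, y2, z2 = q
    return (
        pp + qp,
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def rz(theta):
    # RZ(theta) = cos(theta/2) I - i sin(theta/2) Z, with no global phase.
    return (0.0, math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))


# SX = e^{i pi/4} RX(pi/2) = e^{i pi/4} (I - iX) / sqrt(2).
SX = (math.pi / 4, math.sqrt(0.5), math.sqrt(0.5), 0.0, 0.0)


def fold_run(run):
    """Collapse gates listed in circuit (time) order into one versor.

    Later gates multiply on the left, so the result is g_n @ ... @ g_1.
    """
    identity = (0.0, 1.0, 0.0, 0.0, 0.0)
    return reduce(lambda acc, gate: hamilton(gate, acc), run, identity)


def to_matrix(v):
    """Dense 2x2 matrix of a (phase, w, x, y, z) versor, only needed once per run."""
    phase, w, x, y, z = v
    g = cmath.exp(1j * phase)
    a, b = complex(w, -z), complex(y, -x)
    return [[g * a, -g * b.conjugate()], [g * b, g * a.conjugate()]]


run = [rz(0.3), SX, rz(1.1), SX, rz(-0.7)]
print(to_matrix(fold_run(run)))
```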

Avoid unnecessary allocations in qargs lookup

Switch the lookup logic for the qargs in the block collector to determine the qargs ordering without additional heap allocations. This offers another modest (~4%) improvement in collection performance for large runs.
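The collector code itself isn't shown here, so this is only a hypothetical sketch of the idea: when a 2q block sits on a known pair of qubits, the ordering of a gate's qargs can be decided by direct comparisons that return a small enum, instead of building a lookup table or a permuted qargs list. In Rust such an enum is a plain stack value; Python is used only to show the control flow, and the names `Placement` and `classify` are illustrative.

```python
from enum import Enum


class Placement(Enum):
    """Where a gate sits relative to a 2q block on qubits (q0, q1). Hypothetical names."""

    ONLY_Q0 = 0
    ONLY_Q1 = 1
    FORWARD = 2   # gate qargs == (q0, q1)
    REVERSED = 3  # gate qargs == (q1, q0)


def classify(block, qargs):
    """Decide the qargs ordering by direct comparison, without building a mapping or list."""
    q0, q1 = block
    if len(qargs) == 1:
        (q,) = qargs
        if q == q0:
            return Placement.ONLY_Q0
        if q == q1:
            return Placement.ONLY_Q1
    elif len(qargs) == 2:
        if qargs[0] == q0 and qargs[1] == q1:
            return Placement.FORWARD
        if qargs[0] == q1 and qargs[1] == q0:
            return Placement.REVERSED
    raise ValueError(f"qargs {qargs!r} are not within block {block!r}")
```

A `REVERSED` result is then the signal to reorder the gate's 2q matrix before multiplying it into the block.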

Avoid allocations in simple matrix operations

Producing the Kronecker product of the two single-qubit matrices from the versor representation is trivially calculable, and can be written into an existing allocation. Similarly, switching the qubit order of a 2q matrix involves only six swaps, and does not need to allocate a new matrix if one is already available.
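Both operations are small enough to write out by hand. In this hypothetical sketch, plain Python nested lists stand in for preallocated `ndarray` buffers (in the PR the 2×2 factors would come from the versors). The Kronecker product is written into a caller-supplied 4×4 buffer, and the qubit reordering $M \to PMP$, with $P$ the SWAP permutation, is done in place. SWAP only exchanges the basis states $|01\rangle$ and $|10\rangle$, so of the 16 entries, 4 stay where they are and the other 12 pair up into exactly six swaps.

```python
def kron_into(out, a, b):
    """Write the 4x4 Kronecker product of 2x2 matrices a and b into the existing buffer `out`."""
    for ra in range(2):
        for ca in range(2):
            for rb in range(2):
                for cb in range(2):
                    out[2 * ra + rb][2 * ca + cb] = a[ra][ca] * b[rb][cb]
    return out


def swap_qubits_in_place(m):
    """Reorder a 4x4 two-qubit matrix's qubits in place: m -> P m P with P = SWAP.

    P exchanges indices 1 and 2, so entry (r, c) moves to (s(r), s(c)) where s
    swaps 1 and 2. The four entries with r, c in {0, 3} are fixed; the rest
    form six transpositions.
    """
    pairs = (
        ((0, 1), (0, 2)),
        ((3, 1), (3, 2)),
        ((1, 0), (2, 0)),
        ((1, 3), (2, 3)),
        ((1, 1), (2, 2)),
        ((1, 2), (2, 1)),
    )
    for (r1, c1), (r2, c2) in pairs:
        m[r1][c1], m[r2][c2] = m[r2][c2], m[r1][c1]
    return m
```

A quick sanity check of the pair of functions is that swapping the qubits of `kron(a, b)` must give `kron(b, a)`.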

There are lots of places remaining in this code where more matrix allocations could be avoided. If nothing else, it should be possible to allocate only three 2q matrices in total, and keep shuffling the labelling of them when doing `A.B -> C`. `ndarray` does not make this easy, though; `nalgebra` and `faer` both have better interfaces for doing this, but currently all our matrix code in the `Operation` trait is in terms of `ndarray`.
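The three-buffer idea can be sketched as follows. This is a hypothetical illustration, not code from the PR: the running product, a scratch buffer, and a buffer for the incoming gate are allocated once, and after each `A.B -> C` step the names of the result and scratch buffers are exchanged instead of allocating a new matrix.

```python
def matmul_into(out, a, b):
    """out = a @ b for 4x4 nested lists; `out` must not alias `a` or `b`."""
    for i in range(4):
        for j in range(4):
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(4))
    return out


def accumulate(gates):
    """Multiply 4x4 matrices given in circuit order using only three buffers.

    After each product the buffer labels rotate, so nothing new is allocated
    inside the loop.
    """
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    state = [row[:] for row in identity]    # running product
    scratch = [row[:] for row in identity]  # receives the next product
    operand = [row[:] for row in identity]  # holds the incoming gate
    for gate in gates:
        for i in range(4):
            operand[i][:] = gate[i]  # copy into the existing buffer
        matmul_into(scratch, operand, state)  # later gates act on the left
        state, scratch = scratch, state  # relabel instead of allocating
    return state
```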


qiskit-bot · 2025-01-10T13:47:44Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core

jakelishman · 2025-01-10T13:48:53Z

There are also more places we could make use of the versor representation, like in 1q gate optimisation, but that pass currently involves passing matrices between many different parts of its own interface, so it'll be more complicated to modify.

mtreinish

Just some quick high-level comments. I missed the update and don't want these lost to GitHub weirdness. I'll review in more depth later.

crates/accelerate/src/qi/mod.rs

mtreinish · 2025-01-10T17:39:37Z

crates/accelerate/src/qi/mod.rs

+// copyright notice, and modified files need to carry a notice indicating
+// that they have been altered from the originals.
+
+//! Quantum-information and linear-algebra related functionality, typically used as drivers for


We might want to move over some linear algebra functionality like: https://github.com/Qiskit/qiskit/blob/main/crates/accelerate/src/utils.rs and https://github.com/Qiskit/qiskit/blob/main/crates/accelerate/src/synthesis/linear/utils.rs (although that first one might not be needed anymore).

Yeah, I'm happy in a follow-up to move a few other bits over. I think there's other loose files and bits and bobs that could probably move into it too, just to keep things a bit more localised.

coveralls · 2025-01-10T18:55:34Z

Pull Request Test Coverage Report for Build 14660431723

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

261 of 315 (82.86%) changed or added relevant lines in 3 files are covered.
291 unchanged lines in 10 files lost coverage.
Overall coverage increased (+0.03%) to 87.928%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| crates/accelerate/src/convert_2q_block_matrix.rs | 90 | 93 | 96.77% |
| crates/accelerate/src/quantum_info/versor_u2.rs | 170 | 221 | 76.92% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/accelerate/src/two_qubit_decompose.rs | 1 | 91.63% |
| crates/accelerate/src/euler_one_qubit_decomposer.rs | 6 | 91.33% |
| crates/qasm2/src/parse.rs | 6 | 97.61% |
| crates/qasm2/src/lex.rs | 7 | 91.98% |
| crates/accelerate/src/consolidate_blocks.rs | 11 | 94.72% |
| crates/accelerate/src/high_level_synthesis.rs | 20 | 90.55% |
| crates/accelerate/src/unitary_synthesis.rs | 28 | 94.78% |
| crates/accelerate/src/vf2_layout.rs | 28 | 83.11% |
| crates/accelerate/src/basis/basis_translator/mod.rs | 55 | 88.99% |
| crates/accelerate/src/target_transpiler/mod.rs | 129 | 79.11% |

| Totals | |
|---|---|
| Change from base Build 14637349697 | +0.03% |
| Covered Lines | 74367 |
| Relevant Lines | 84577 |

💛 - Coveralls

It's useful to be able to represent elements both of $SU(2)$ and $U(2)$; even the initial version of this PR used representations of $SU(2)$ with a shared phase in the only location the new types were used. This adds a `VersorSU2` struct to make this more easily possible, and splits the interface in the right places to make it possible to use both from other modules.
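One shared phase is enough because scalars factor out of both the tensor product and the ordinary matrix product. For the two single-qubit parts of a 2q block, the phases can therefore be summed into a single scalar while each qubit keeps its own $SU(2)$ versor. This is a general identity, stated here independently of how the Rust types store it:

$$
\left(e^{i\alpha}A\right)\otimes\left(e^{i\beta}B\right) = e^{i(\alpha+\beta)}\,(A\otimes B),
\qquad
\left(e^{i\alpha}A\right)\left(e^{i\gamma}C\right) = e^{i(\alpha+\gamma)}\,AC,
\qquad A, B, C \in SU(2).
$$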

jakelishman · 2025-01-22T17:48:46Z

I fixed the merge conflicts, and also I pivoted the structs a little bit to expose both $SU(2)$ and $U(2)$ representations, since I was already effectively using the $SU(2)$ representation in the block collector.

This commit builds off of the native Rust representation of a UnitaryGate added in #13759 and uses the native representation everywhere we were previously using UnitaryGate in Rust via Python: the quantum_volume() function, consolidate blocks, split2qunitaries, and unitary synthesis. One future item is that consolidate blocks can be updated to use nalgebra types internally instead of ndarray, since for the 1q and 2q cases we know the fixed size of the array ahead of time. However, the block consolidation code is currently built using ndarray, and later synthesis code also works in ndarray, so there isn't any real benefit yet, and we'd just add unnecessary conversions and allocations. Once #13649 merges, though, this will change, and it would make more sense to add the unitary gate with a Matrix4. But this can be handled separately after this merges.

mtreinish

I've only reviewed the quantum info piece so far. But it looks good to me. I'll follow up tomorrow reviewing the changes to the block converter.

crates/accelerate/src/qi/versor_u2.rs

mtreinish · 2025-02-13T23:26:44Z

crates/accelerate/src/qi/versor_u2.rs

+                Ok(VersorSU2::identity().with_phase(phase))
+            }
+            StandardGate::HGate => {
+                debug_assert!(params.is_empty());


We probably should get in the habit of doing this more for underlying assumptions like this. Not that we build a ton in debug mode, but we do in some places, so this would be a good sanity check.

mtreinish

Following up with the convert_2q_block_matrix.rs file: this looks good. Just a couple more small comments.

crates/accelerate/src/convert_2q_block_matrix.rs

mtreinish · 2025-02-14T21:13:41Z

crates/accelerate/src/convert_2q_block_matrix.rs

+    match (
+        qubits_1q.map(|sep| sep.matrix_into(&mut work)),
+        output_matrix,
+    ) {
+        (Some(sep), Some(state)) => Ok(sep.dot(&state)),
+        (None, Some(state)) => Ok(state),
+        (Some(sep), None) => Ok(sep.to_owned()),
+        // This shouldn't actually ever trigger, because we expect blocks to be non-empty, but it's
+        // trivial to handle anyway.
+        (None, None) => Ok(arr2(&TWO_QUBIT_IDENTITY)),
+    }


Not for this PR, but it'll be great to have this all done via Matrix4 after this merges, so we can just consolidate to a UnitaryGate with ArrayType::TwoQ and avoid allocating a matrix altogether.

Yeah absolutely - I stopped in this PR at the point that I started interacting with the 2q matrices because I wasn't prepared to put too much more time into things I knew were both in flux and needed more major changes.

jakelishman · 2025-04-24T15:43:32Z

Should be updated and comments addressed now. There's still quite a lot of inefficiency in how this PR handles 2q matrices, but I thought it better to leave that until a follow-up and do it more completely throughout ConsolidateBlocks.

mtreinish

This LGTM now, thanks for doing this. I'm looking forward to using this, and also to the better usage of UnitaryGate in the pass in a follow-up. I left two questions inline: one isn't relevant to this PR itself but is more about a separate follow-up. The other is about the code, but feel free to enqueue this to merge if you don't want to change it.

crates/accelerate/src/convert_2q_block_matrix.rs

mtreinish · 2025-04-24T21:18:30Z

crates/accelerate/src/quantum_info/versor_u2.rs

+    fn all_1q_gates() -> Vec<StandardGate> {
+        (0..STANDARD_GATE_SIZE as u8)
+            .filter_map(|x| {
+                ::bytemuck::checked::try_cast::<_, StandardGate>(x)


I actually have code somewhere where I added an impl TryFrom<u8> for StandardGate and just made a static [StandardGate; STANDARD_GATE_SIZE] array to do the conversion. It was part of some code I didn't end up finishing or using for anything. But I can dig up the branch and push that as a standalone PR if you think it would be useful (I added it for a similar use case to this).

TryFrom<u8> for StandardGate feels wrong to me, but not super strongly. It feels nicer somehow using bytemuck to say "I know I'm doing something a little bit odd", since it feels like we're kind of conjuring semantics out of nowhere - the fact that StandardGate is backed by integers is (imo) an implementation detail, albeit a fixed one.

If we wanted to expose an iterator over all gates, I'd just expose your [StandardGate; STANDARD_GATE_SIZE] as a public static; then you can iterate through it directly. That's what I did with BitTerm in the tests of SparseObservable's lookup tables.

jakelishman added 4 commits January 10, 2025 13:47

Avoid unnecessary allocations in qargs lookup

a65684e

jakelishman added the performance, mod: quantum info, Changelog: None, Rust, and mod: transpiler labels Jan 10, 2025

jakelishman added this to the 2.0.0 milestone Jan 10, 2025

jakelishman requested a review from a team as a code owner January 10, 2025 13:47

Add direct Rust VersorGate tests

df1a3c2

... which were necessary because I'd borked the matrix calculations and forgotten to write any tests of them.

mtreinish reviewed Jan 10, 2025

jakelishman added 4 commits January 10, 2025 22:47

Tidy up qi exports

e2e9aaa

Merge remote-tracking branch 'ibm/main' into quaternion-collection

0fcbbee

Give f64 constants consistent names

24bddd7

1ucian0 assigned mtreinish Jan 30, 2025

mtreinish mentioned this pull request Jan 30, 2025

Leverage native UnitaryGate from rust #13765

Merged

1ucian0 assigned jakelishman Feb 6, 2025

Merge remote-tracking branch 'origin/main' into quaternion-collection

019bcc6

mtreinish reviewed Feb 13, 2025

mtreinish reviewed Feb 14, 2025

mtreinish modified the milestones: 2.0.0, 2.1.0 Mar 3, 2025

jakelishman added 8 commits April 24, 2025 12:00

Merge remote-tracking branch 'ibm/main' into quaternion-collection

c9c9d3d

Fix StandardGate names

7799e43

Rename accelerate::qi to accelerate::quantum_info

d2f58d9

Centralise debug parameter assertions

c196226

Fix typo in error message

26cfcbd

Add Matrix1q implementation for nalgebra

3f20b0f

Lookup qargs directly from Interned

3bfb8f7

Natively handle UnitaryGate conversion to VersorU2

b903461

mtreinish previously approved these changes Apr 24, 2025

Trust existing unitary gates

ac469a2

jakelishman dismissed mtreinish’s stale review via ac469a2 April 25, 2025 08:34

mtreinish approved these changes Apr 25, 2025

mtreinish added this pull request to the merge queue Apr 25, 2025

Merged via the queue into Qiskit:main with commit a50d73d Apr 25, 2025
24 checks passed

jakelishman deleted the quaternion-collection branch April 25, 2025 11:15