Cache parameterless multi-qubit decompositions in Unroll3qOrMore #10703

mtreinish · 2023-08-24T03:51:54Z

Summary

This commit adds a cache for parameter free gate decompositions in
the Unroll3qOrMore transpiler pass. If there are repeated gates in a
circuit that don't have any parameters, then when we decompose the
circuit a single time that decomposition will be valid for all instances
of that gate. This doesn't hold true for parameterized gates because the
parameter value will determine the decomposition used. This greatly
improves the runtime performance of the pass when there are repeated >=3
qubit gates in the circuit.

In a future revision we can investigate adding the parameter values to
the cache lookup. But the hit rate would likely be fairly low in such
cases because it would only provide runtime improvements if there are
repeated >3 qubit gates with the same exact parameter values, which
seems fairly rare in practice.

Details and comments

~~This is based on top of #10702 and will need to be rebased after #10702 merges~~ This has been rebased

qiskit-bot · 2023-08-24T03:51:57Z

One or more of the the following people are requested to review this:

@Qiskit/terra-core

coveralls · 2023-08-24T04:28:37Z

Pull Request Test Coverage Report for Build 5976660918

12 of 14 (85.71%) changed or added relevant lines in 1 file are covered.
9 unchanged lines in 3 files lost coverage.
Overall coverage decreased (-0.003%) to 87.247%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
qiskit/transpiler/passes/basis/unroll_3q_or_more.py	12	14	85.71%

Files with Coverage Reduction	New Missed Lines	%
crates/qasm2/src/expr.rs	1	93.76%
crates/qasm2/src/lex.rs	2	90.63%
crates/qasm2/src/parse.rs	6	96.2%

Totals
Change from base Build 5976504024:	-0.003%
Covered Lines:	74269
Relevant Lines:	85125

💛 - Coveralls

This commit adds a cache for parameter free gate decompositions in the Unroll3qOrMore transpiler pass. If there are repeated gates in a circuit that don't have any parameters, then when we decompose the circuit a single time that decomposition will be valid for all instances of that gate. This doesn't hold true for parameterized gates because the parameter value will determine the decomposition used. This greatly improves the runtime performance of the pass when there are repeated >=3 qubit gates in the circuit. In a future revision we can investigate adding the parameter values to the cache lookup. But the hit rate would likely be fairly low in such cases because it would only provide runtime improvements if there are repeated >3 qubit gates with the same exact parameter values, which seems fairly rare in practice. This is based on top of Qiskit#10702 and will need to be rebased after Qiskit#10702 merges.

Cryoris · 2023-08-28T08:53:23Z

qiskit/transpiler/passes/basis/unroll_3q_or_more.py

+            params = node.op.params
+            name = node.op.name
+            if not params:
+                cached_decomp = self._cache.get(name, None)


What about gates that have non-unique names, such as Isometry or PauliEvolutionGate? 🤔

Sigh, I always forget about those gates. But I think this should still be sound because they all have parameters set so this won't cache them. But if you can think of an example this isn't sound then I have another idea to get this result (move to the basis translator use that instead).

I think the biggest risk is with things like QFT that aren't parametrised, but are variadic over the qubit count.

Well I guess there are two paths here then, either we can add a qargs len() or a tuple of qargs to the cache key and assume with a given name and qubits it's the same gate. Or alternatively we can abandon this caching approach and move to the doing unroll3q to us the basis translator instead (by adding a equiv library option to this pass and a min qubits arg to the basis translator). I'm thinking the basis translator approach will be the robust one, because the use case I'm concerned with here is if someone submits a circuit with 200k toffolis we spend a huge amount of time unrolling the same toffoli over and over again.

Yeah, I feel like this should be a bit more of a unified part of the basis translator. I actually think we might want to consider what a transpiler pipeline that has basis translation before routing would look like, so that we don't need the 3+q split at all; I could imagine a Target-aware router that synthesises the swaps into something allowable immediately on output, such that basis-translation afterwards isn't necessary.

This commit modifies the the preset pass manager construction to by default use the UnrollCustomDefinitions and BasisTranslator passes to perform the unrolling of operations that operate on >= 3 qubits. This is done by leveraging the min_qubits argument added to those passes in this commit which lets you set a minimum number of qubit arguments to translate operations for. Previously, this was handled by the Unroll3qOrMore pass which works by iterating over the DAG and finding all operations with >= 3 qubits and recursively unrolling them until the output is all in terms of 1 or 2 qubit operations. This works fine, but has undesireable runtime performance characteristics because standard gates repeatedly have to build new DAGs to substitute in. For example, if you had a circuit with 50k CCXGates that would result in 50k new DAGs being created from scratch for each instance of CCXGate. A previous attempt was made to add a translation cache to that pass in Qiskit#10703 but as was discussed in code review caching the circuit is unsound as it's possible for a custom gate object to be constructed such that it would be cached and return an invalid substitution circuit. By leveraging the basis translator we only build as many substition DAGs as needed.

mtreinish · 2023-09-05T21:13:01Z

Closing this in favor of #10776

This commit modifies the the preset pass manager construction to by default use the UnrollCustomDefinitions and BasisTranslator passes to perform the unrolling of operations that operate on >= 3 qubits. This is done by leveraging the min_qubits argument added to those passes in this commit which lets you set a minimum number of qubit arguments to translate operations for. Previously, this was handled by the Unroll3qOrMore pass which works by iterating over the DAG and finding all operations with >= 3 qubits and recursively unrolling them until the output is all in terms of 1 or 2 qubit operations. This works fine, but has undesireable runtime performance characteristics because standard gates repeatedly have to build new DAGs to substitute in. For example, if you had a circuit with 50k CCXGates that would result in 50k new DAGs being created from scratch for each instance of CCXGate. A previous attempt was made to add a translation cache to that pass in #10703 but as was discussed in code review caching the circuit is unsound as it's possible for a custom gate object to be constructed such that it would be cached and return an invalid substitution circuit. By leveraging the basis translator we only build as many substition DAGs as needed.

mtreinish requested a review from a team as a code owner August 24, 2023 03:51

mtreinish added performance Changelog: None Do not include in changelog labels Aug 24, 2023

mtreinish force-pushed the cacbe-decomp branch from 485854d to 60f6167 Compare August 25, 2023 13:44

mtreinish force-pushed the cacbe-decomp branch from 60f6167 to 4d2b79c Compare August 25, 2023 13:46

mtreinish added this to the 0.45.0 milestone Aug 26, 2023

Cryoris reviewed Aug 28, 2023

View reviewed changes

mtreinish mentioned this pull request Sep 5, 2023

Use BasisTranslator for unroll 3q or more #10776

Merged

mtreinish closed this Sep 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cache parameterless multi-qubit decompositions in Unroll3qOrMore #10703

Cache parameterless multi-qubit decompositions in Unroll3qOrMore #10703

Uh oh!

mtreinish commented Aug 24, 2023 •

edited

Loading

Uh oh!

qiskit-bot commented Aug 24, 2023

Uh oh!

coveralls commented Aug 24, 2023 •

edited

Loading

Uh oh!

Cryoris Aug 28, 2023

Uh oh!

mtreinish Aug 28, 2023

Uh oh!

jakelishman Aug 30, 2023

Uh oh!

mtreinish Sep 1, 2023

Uh oh!

jakelishman Sep 5, 2023

Uh oh!

mtreinish commented Sep 5, 2023

Uh oh!

Uh oh!

Cache parameterless multi-qubit decompositions in Unroll3qOrMore #10703

Cache parameterless multi-qubit decompositions in Unroll3qOrMore #10703

Uh oh!

Conversation

mtreinish commented Aug 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Details and comments

Uh oh!

qiskit-bot commented Aug 24, 2023

Uh oh!

coveralls commented Aug 24, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build 5976660918

💛 - Coveralls

Uh oh!

Cryoris Aug 28, 2023

Choose a reason for hiding this comment

Uh oh!

mtreinish Aug 28, 2023

Choose a reason for hiding this comment

Uh oh!

jakelishman Aug 30, 2023

Choose a reason for hiding this comment

Uh oh!

mtreinish Sep 1, 2023

Choose a reason for hiding this comment

Uh oh!

jakelishman Sep 5, 2023

Choose a reason for hiding this comment

Uh oh!

mtreinish commented Sep 5, 2023

Uh oh!

Uh oh!

mtreinish commented Aug 24, 2023 •

edited

Loading

coveralls commented Aug 24, 2023 •

edited

Loading