Skip to content

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented Aug 24, 2023

Summary

This commit adds a cache for parameter free gate decompositions in
the Unroll3qOrMore transpiler pass. If there are repeated gates in a
circuit that don't have any parameters, then when we decompose the
circuit a single time that decomposition will be valid for all instances
of that gate. This doesn't hold true for parameterized gates because the
parameter value will determine the decomposition used. This greatly
improves the runtime performance of the pass when there are repeated >=3
qubit gates in the circuit.

In a future revision we can investigate adding the parameter values to
the cache lookup. But the hit rate would likely be fairly low in such
cases because it would only provide runtime improvements if there are
repeated >3 qubit gates with the same exact parameter values, which
seems fairly rare in practice.

Details and comments

This is based on top of #10702 and will need to be rebased after #10702 merges This has been rebased

@mtreinish mtreinish requested a review from a team as a code owner August 24, 2023 03:51
@qiskit-bot
Copy link
Collaborator

One or more of the the following people are requested to review this:

  • @Qiskit/terra-core

@coveralls
Copy link

coveralls commented Aug 24, 2023

Pull Request Test Coverage Report for Build 5976660918

  • 12 of 14 (85.71%) changed or added relevant lines in 1 file are covered.
  • 9 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.003%) to 87.247%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/basis/unroll_3q_or_more.py 12 14 85.71%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/expr.rs 1 93.76%
crates/qasm2/src/lex.rs 2 90.63%
crates/qasm2/src/parse.rs 6 96.2%
Totals Coverage Status
Change from base Build 5976504024: -0.003%
Covered Lines: 74269
Relevant Lines: 85125

💛 - Coveralls

@mtreinish mtreinish added performance Changelog: None Do not include in changelog labels Aug 24, 2023
This commit adds a cache for parameter free gate decompositions in
the Unroll3qOrMore transpiler pass. If there are repeated gates in a
circuit that don't have any parameters, then when we decompose the
circuit a single time that decomposition will be valid for all instances
of that gate. This doesn't hold true for parameterized gates because the
parameter value will determine the decomposition used. This greatly
improves the runtime performance of the pass when there are repeated >=3
qubit gates in the circuit.

In a future revision we can investigate adding the parameter values to
the cache lookup. But the hit rate would likely be fairly low in such
cases because it would only provide runtime improvements if there are
repeated >3 qubit gates with the same exact parameter values, which
seems fairly rare in practice.

This is based on top of Qiskit#10702 and will need to be rebased after Qiskit#10702
merges.
@mtreinish mtreinish added this to the 0.45.0 milestone Aug 26, 2023
params = node.op.params
name = node.op.name
if not params:
cached_decomp = self._cache.get(name, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about gates that have non-unique names, such as Isometry or PauliEvolutionGate? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh, I always forget about those gates. But I think this should still be sound because they all have parameters set so this won't cache them. But if you can think of an example this isn't sound then I have another idea to get this result (move to the basis translator use that instead).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest risk is with things like QFT that aren't parametrised, but are variadic over the qubit count.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I guess there are two paths here then, either we can add a qargs len() or a tuple of qargs to the cache key and assume with a given name and qubits it's the same gate. Or alternatively we can abandon this caching approach and move to the doing unroll3q to us the basis translator instead (by adding a equiv library option to this pass and a min qubits arg to the basis translator). I'm thinking the basis translator approach will be the robust one, because the use case I'm concerned with here is if someone submits a circuit with 200k toffolis we spend a huge amount of time unrolling the same toffoli over and over again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I feel like this should be a bit more of a unified part of the basis translator. I actually think we might want to consider what a transpiler pipeline that has basis translation before routing would look like, so that we don't need the 3+q split at all; I could imagine a Target-aware router that synthesises the swaps into something allowable immediately on output, such that basis-translation afterwards isn't necessary.

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Sep 5, 2023
This commit modifies the the preset pass manager construction to by
default use the UnrollCustomDefinitions and BasisTranslator passes to
perform the unrolling of operations that operate on >= 3 qubits. This is
done by leveraging the min_qubits argument added to those passes in this
commit which lets you set a minimum number of qubit arguments to
translate operations for.

Previously, this was handled by the Unroll3qOrMore pass which works by
iterating over the DAG and finding all operations with >= 3 qubits and
recursively unrolling them until the output is all in terms of 1 or 2
qubit operations. This works fine, but has undesireable runtime
performance characteristics because standard gates repeatedly have to
build new DAGs to substitute in. For example, if you had a circuit with
50k CCXGates that would result in 50k new DAGs being created from
scratch for each instance of CCXGate. A previous attempt was made to
add a translation cache to that pass in Qiskit#10703 but as was discussed in
code review caching the circuit is unsound as it's possible for a custom
gate object to be constructed such that it would be cached and return an
invalid substitution circuit. By leveraging the basis translator
we only build as many substition DAGs as needed.
@mtreinish
Copy link
Member Author

Closing this in favor of #10776

@mtreinish mtreinish closed this Sep 5, 2023
github-merge-queue bot pushed a commit that referenced this pull request Sep 6, 2023
This commit modifies the the preset pass manager construction to by
default use the UnrollCustomDefinitions and BasisTranslator passes to
perform the unrolling of operations that operate on >= 3 qubits. This is
done by leveraging the min_qubits argument added to those passes in this
commit which lets you set a minimum number of qubit arguments to
translate operations for.

Previously, this was handled by the Unroll3qOrMore pass which works by
iterating over the DAG and finding all operations with >= 3 qubits and
recursively unrolling them until the output is all in terms of 1 or 2
qubit operations. This works fine, but has undesireable runtime
performance characteristics because standard gates repeatedly have to
build new DAGs to substitute in. For example, if you had a circuit with
50k CCXGates that would result in 50k new DAGs being created from
scratch for each instance of CCXGate. A previous attempt was made to
add a translation cache to that pass in #10703 but as was discussed in
code review caching the circuit is unsound as it's possible for a custom
gate object to be constructed such that it would be cached and return an
invalid substitution circuit. By leveraging the basis translator
we only build as many substition DAGs as needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants