-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Cache parameterless multi-qubit decompositions in Unroll3qOrMore #10703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
One or more of the the following people are requested to review this:
|
Pull Request Test Coverage Report for Build 5976660918
💛 - Coveralls |
485854d
to
60f6167
Compare
This commit adds a cache for parameter free gate decompositions in the Unroll3qOrMore transpiler pass. If there are repeated gates in a circuit that don't have any parameters, then when we decompose the circuit a single time that decomposition will be valid for all instances of that gate. This doesn't hold true for parameterized gates because the parameter value will determine the decomposition used. This greatly improves the runtime performance of the pass when there are repeated >=3 qubit gates in the circuit. In a future revision we can investigate adding the parameter values to the cache lookup. But the hit rate would likely be fairly low in such cases because it would only provide runtime improvements if there are repeated >3 qubit gates with the same exact parameter values, which seems fairly rare in practice. This is based on top of Qiskit#10702 and will need to be rebased after Qiskit#10702 merges.
60f6167
to
4d2b79c
Compare
params = node.op.params | ||
name = node.op.name | ||
if not params: | ||
cached_decomp = self._cache.get(name, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about gates that have non-unique names, such as Isometry
or PauliEvolutionGate
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh, I always forget about those gates. But I think this should still be sound because they all have parameters set so this won't cache them. But if you can think of an example this isn't sound then I have another idea to get this result (move to the basis translator use that instead).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the biggest risk is with things like QFT
that aren't parametrised, but are variadic over the qubit count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I guess there are two paths here then, either we can add a qargs len()
or a tuple of qargs to the cache key and assume with a given name and qubits it's the same gate. Or alternatively we can abandon this caching approach and move to the doing unroll3q to us the basis translator instead (by adding a equiv library option to this pass and a min qubits arg to the basis translator). I'm thinking the basis translator approach will be the robust one, because the use case I'm concerned with here is if someone submits a circuit with 200k toffolis we spend a huge amount of time unrolling the same toffoli over and over again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I feel like this should be a bit more of a unified part of the basis translator. I actually think we might want to consider what a transpiler pipeline that has basis translation before routing would look like, so that we don't need the 3+q split at all; I could imagine a Target
-aware router that synthesises the swaps into something allowable immediately on output, such that basis-translation afterwards isn't necessary.
This commit modifies the the preset pass manager construction to by default use the UnrollCustomDefinitions and BasisTranslator passes to perform the unrolling of operations that operate on >= 3 qubits. This is done by leveraging the min_qubits argument added to those passes in this commit which lets you set a minimum number of qubit arguments to translate operations for. Previously, this was handled by the Unroll3qOrMore pass which works by iterating over the DAG and finding all operations with >= 3 qubits and recursively unrolling them until the output is all in terms of 1 or 2 qubit operations. This works fine, but has undesireable runtime performance characteristics because standard gates repeatedly have to build new DAGs to substitute in. For example, if you had a circuit with 50k CCXGates that would result in 50k new DAGs being created from scratch for each instance of CCXGate. A previous attempt was made to add a translation cache to that pass in Qiskit#10703 but as was discussed in code review caching the circuit is unsound as it's possible for a custom gate object to be constructed such that it would be cached and return an invalid substitution circuit. By leveraging the basis translator we only build as many substition DAGs as needed.
Closing this in favor of #10776 |
This commit modifies the the preset pass manager construction to by default use the UnrollCustomDefinitions and BasisTranslator passes to perform the unrolling of operations that operate on >= 3 qubits. This is done by leveraging the min_qubits argument added to those passes in this commit which lets you set a minimum number of qubit arguments to translate operations for. Previously, this was handled by the Unroll3qOrMore pass which works by iterating over the DAG and finding all operations with >= 3 qubits and recursively unrolling them until the output is all in terms of 1 or 2 qubit operations. This works fine, but has undesireable runtime performance characteristics because standard gates repeatedly have to build new DAGs to substitute in. For example, if you had a circuit with 50k CCXGates that would result in 50k new DAGs being created from scratch for each instance of CCXGate. A previous attempt was made to add a translation cache to that pass in #10703 but as was discussed in code review caching the circuit is unsound as it's possible for a custom gate object to be constructed such that it would be cached and return an invalid substitution circuit. By leveraging the basis translator we only build as many substition DAGs as needed.
Summary
This commit adds a cache for parameter free gate decompositions in
the
Unroll3qOrMore
transpiler pass. If there are repeated gates in acircuit that don't have any parameters, then when we decompose the
circuit a single time that decomposition will be valid for all instances
of that gate. This doesn't hold true for parameterized gates because the
parameter value will determine the decomposition used. This greatly
improves the runtime performance of the pass when there are repeated >=3
qubit gates in the circuit.
In a future revision we can investigate adding the parameter values to
the cache lookup. But the hit rate would likely be fairly low in such
cases because it would only provide runtime improvements if there are
repeated >3 qubit gates with the same exact parameter values, which
seems fairly rare in practice.
Details and comments
This is based on top of #10702 and will need to be rebased after #10702 mergesThis has been rebased