-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Use BasisTranslator for unroll 3q or more #10776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit modifies the the preset pass manager construction to by default use the UnrollCustomDefinitions and BasisTranslator passes to perform the unrolling of operations that operate on >= 3 qubits. This is done by leveraging the min_qubits argument added to those passes in this commit which lets you set a minimum number of qubit arguments to translate operations for. Previously, this was handled by the Unroll3qOrMore pass which works by iterating over the DAG and finding all operations with >= 3 qubits and recursively unrolling them until the output is all in terms of 1 or 2 qubit operations. This works fine, but has undesireable runtime performance characteristics because standard gates repeatedly have to build new DAGs to substitute in. For example, if you had a circuit with 50k CCXGates that would result in 50k new DAGs being created from scratch for each instance of CCXGate. A previous attempt was made to add a translation cache to that pass in Qiskit#10703 but as was discussed in code review caching the circuit is unsound as it's possible for a custom gate object to be constructed such that it would be cached and return an invalid substitution circuit. By leveraging the basis translator we only build as many substition DAGs as needed.
One or more of the the following people are requested to review this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks correct to me, though it makes me wonder if we'll see any performance regressions in present transpile
that have very few custom operations or 3+q gates, since we're now going to run two passes rather than one. It probably shouldn't be very much if so.
This is totally beside the point, but the behaviour described by the release notes is actually still true even if min_qubits
is set to zero haha.
I left it untagged for merge in case we wanted to look at timings before merge - if not, I'm happy for somebody just to add it to the queue. |
Sure, I'll run some asv benchmarks to compare a bunch of different circuits and also do some manual testing with circuits that have large numbers of 3q gates just to get some data for the PR history. |
I ran a subset of transpilation asv benchmarks and it was basically some showed a bit of a speed improvement, but one QV test showed a runtime regression (I'm not sure exactly what was going on there but it's also only a 14ms regression so it's not a huge deal from my perspective). The
|
As for a place where this makes a much bigger impact is running:
with On main this takes ~43.695 sec and with this PR it took ~31.078 sec. |
Summary
This commit modifies the the preset pass manager construction to by default use the UnrollCustomDefinitions and BasisTranslator passes to perform the unrolling of operations that operate on >= 3 qubits. This is done by leveraging the min_qubits argument added to those passes in this commit which lets you set a minimum number of qubit arguments to translate operations for.
Previously, this was handled by the Unroll3qOrMore pass which works by iterating over the DAG and finding all operations with >= 3 qubits and recursively unrolling them until the output is all in terms of 1 or 2 qubit operations. This works fine, but has undesireable runtime performance characteristics because standard gates repeatedly have to build new DAGs to substitute in. For example, if you had a circuit with 50k CCXGates that would result in 50k new DAGs being created from scratch for each instance of CCXGate. A previous attempt was made to add a translation cache to that pass in #10703 but as was discussed in code review caching the circuit is unsound as it's possible for a custom gate object to be constructed such that it would be cached and return an invalid substitution circuit. By leveraging the basis translator we only build as many substition DAGs as needed.
Details and comments