
mtreinish (Member)

Summary

This commit further tunes the performance of the optimize 1q
decomposition pass. Previously, the pass constructed an operator with
the quantum_info Operator class and called compose() for each gate in
the run to find the run's overall unitary. Doing this created many
intermediate Operator objects that weren't needed and only added
overhead. To avoid that, this commit uses raw numpy calculations to
build the matrix for the overall operator and wraps it in an Operator()
object only once, when calling the OneQubitEulerDecomposer.
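The idea can be sketched with plain NumPy (the gate matrices below are stand-ins for a run of 1q gates; the exact gates and the final `Operator(...)` wrapping in the pass are Qiskit internals not shown here):

```python
import numpy as np

# Stand-in 1q gate matrices for a run (H followed by S).
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)
run = [H, S]

# Instead of wrapping each gate in an Operator and calling compose()
# per gate, multiply the raw matrices directly. A gate applied later
# multiplies on the left.
matrix = np.eye(2, dtype=complex)
for gate_matrix in run:
    matrix = gate_matrix.dot(matrix)

# Only the final `matrix` would be wrapped in a single Operator(matrix)
# before being handed to OneQubitEulerDecomposer.
print(np.allclose(matrix, S.dot(H)))
```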

@mtreinish mtreinish added this to the 0.17 milestone Feb 25, 2021
@mtreinish mtreinish requested a review from a team as a code owner February 25, 2021 21:40
kdk (Member) left a comment

I think this looks good, and will likely go a long way toward making Optimize1qGatesDecomposition comparable in performance to Optimize1qGates. Do you have any preliminary performance numbers?

mtreinish (Member, Author) commented Feb 25, 2021

I did some quick manual benchmarking (only one sample per run, so I'm not sure what the error is) with a `random_circuit(65, 4096, measure=True, seed=42)` run through the basis translator first. The pass takes:

| branch | basis | time (sec) |
| --- | --- | --- |
| master | aer's | 35.0315 |
| master | new ibmq | 29.5729 |
| This PR | aer's | 33.1140 |
| This PR | new ibmq | 25.8476 |
| w/ to_matrix() and np.dot() | aer's | 33.4808 |
| w/ to_matrix() and np.dot() | new ibmq | 24.9945 |
| w/ to_matrix() and operator.dot() | aer's | 33.4753 |
| w/ to_matrix() and operator.dot() | new ibmq | 24.8618 |

From when I profiled the pass recently (#5775 (comment)), the biggest bottleneck is the decomposer, so the 2x difference between composition styles, which is only ~40us, ends up lost in the noise.
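The gap between per-gate wrapping and raw matrix products can be illustrated with a toy micro-benchmark. This is pure NumPy with no Qiskit: the `Op` class here is a hypothetical stand-in for an Operator-style wrapper, not the real quantum_info implementation.

```python
import time
import numpy as np

rng = np.random.default_rng(42)
# Random 2x2 unitaries standing in for a long run of 1q gate matrices.
gates = [np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))[0]
         for _ in range(1000)]


class Op:
    """Toy stand-in for an Operator wrapper (hypothetical, not Qiskit's)."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=complex)

    def compose(self, other):
        # Allocates a new wrapper object at every step.
        return Op(other.data.dot(self.data))


# Style 1: wrap every gate and compose, one intermediate Op per gate.
start = time.perf_counter()
op = Op(np.eye(2))
for g in gates:
    op = op.compose(Op(g))
wrapped_time = time.perf_counter() - start

# Style 2: multiply raw matrices, no intermediate wrappers.
start = time.perf_counter()
mat = np.eye(2, dtype=complex)
for g in gates:
    mat = g.dot(mat)
raw_time = time.perf_counter() - start

# Both styles produce the same overall unitary.
print(np.allclose(op.data, mat))
print(f"wrapped={wrapped_time:.6f}s raw={raw_time:.6f}s")
```

The absolute difference per run is tiny, which matches the observation above that it disappears next to the decomposer's cost.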

Labels: Changelog: None, performance