Skip to content

Conversation

mtreinish
Copy link
Member

Summary

This commit adds a new option to the euler one qubit decomposer class,
use_dag, which when set to True will return a DAGCircuit object for the
decomposed matrix instead of a QuantumCircuit object. This is useful for
transpiler passes that call the decomposer so that they don't have to
convert from a circuit to a dag. The passes that use the decomposer are
also updated to use this new option.

Details and comments

This is based on top of #5915 and is blocked until #5915 merges.

This commit improves the performance slightly of the
OneQubitEulerDecomposer by avoiding the use of append(), which does extra
validation of the arguments when constructing the output circuits.
Instead the _append() method is used which does not have the validation
overhead, however this is not necessary since we know how the circuit is
constructed and the arguments when adding the gate are being set
correctly (there is only 1 qubit anyway).
The np.isclose() function quickly becomes a bottlneck in the
OneQubitEulerDecomposer class as the majority of the operations are
comparing angles to decide when to insert gates. However, we can avoid
this overhead by switching from numpy's isclose function which is
optimzed for arrays not floats [1][2] to just using the standard
library's math module's isclose() function. This commit makes this
change as well as using intermediate variables to avoid duplicate
calculations to speed up the decomposer.
@mtreinish mtreinish added on hold Can not fix yet performance labels Feb 27, 2021
@mtreinish mtreinish requested review from chriseclectic and a team as code owners February 27, 2021 13:22
According to the np.mod() docs [1] the np.mod() is equivalent to the
stdlib '%' operator in python however it is designed to work with numpy
array's instead of single values. This adds about an order of magnitude
overhead to the mod operations (on the order of 1us vs 100ns) which can
add up as _mod2pi is called multiple times. However python's mod
operator is not ideal for working with floats [2] and when used produces
different results than expected. To avoid this but still improve
performance this commit switches to use the stdlib math module's
fmod() function, which produces the expected result and is only
marginaly slower than the '%' operator, which will still avoid the
overhead of the numpy function.

[1] https://numpy.org/doc/stable/reference/generated/numpy.mod.html
[2] https://docs.python.org/3/library/math.html#math.fmod
This commit adds a new option to the euler one qubit decomposer class,
use_dag, which when set to True will return a DAGCircuit object for the
decomposed matrix instead of a QuantumCircuit object. This is useful for
transpiler passes that call the decomposer so that they don't have to
convert from a circuit to a dag. The passes that use the decomposer are
also updated to use this new option.
@levbishop
Copy link
Member

Can you post some before/after perf numbers for this? It definitely seems like it should be a nice win.

@mtreinish
Copy link
Member Author

Can you post some before/after perf numbers for this? It definitely seems like it should be a nice win.

Yeah, let me run some quick numbers and post the difference between this, the PR before this in the series, and master

@mtreinish
Copy link
Member Author

Running this script (as a worst case):

from qiskit.transpiler.passes import BasisTranslator
from qiskit.transpiler.passes import Optimize1qGatesDecomposition
from qiskit.circuit.random import random_circuit
from qiskit.converters import circuit_to_dag
from qiskit import Aer
from qiskit.test.mock.backends import FakeManhattan
from qiskit.circuit.equivalence_library import SessionEquivalenceLibrary as sel
import time

backend_sim = Aer.get_backend('qasm_simulator')
aer_basis = backend_sim.configuration().basis_gates
input_circuit = random_circuit(65, 4096, measure=True, seed=424242424242)

# Aer path
print("Aer:")
dag = circuit_to_dag(input_circuit)
bt_aer = BasisTranslator(sel, aer_basis)
basis_dag = bt_aer.run(dag)
start = time.time()
optimize_pass = Optimize1qGatesDecomposition(aer_basis)
stop = time.time()
print("Create Optimize1qGatesDecomposition %s" % str(stop - start))
start = time.time()
result = optimize_pass.run(basis_dag)
stop = time.time()
print("Optimize 1Q decomposition %s" % str(stop - start))
print("depth: %s" % result.depth())

The results are:

Version run() time (sec) depth
master 32.93131995201111 7753
#5928 19.56033992767334 7753
#5926 (this PR) 20.859666109085083 7753

I only ran one sample so I don't know what the spread is on the timing. I'm assuming that this PR and the previous one are basically equivalent (after switching to _append() in the previous pr) and the difference is just in the noise.

@levbishop
Copy link
Member

levbishop commented Mar 4, 2021

Given too-small-to-measure perf benefit, I suggest not making this change given it adds another path through the code that would need be tested and maintained.

@mtreinish
Copy link
Member Author

mtreinish commented Mar 5, 2021

Given too-small-to-measure perf benefit, I suggest not making this change given it adds another path through the code that would need be tested and maintained.

Yeah, I agree. I should have done more targeted benchmarks for each stage earlier instead of just pushing PRs for every idea I had and measuring the performance at the top of the stack. I'll drop this PR (and the equivalent for the 2q decomposer) and rebase the disable unitary validation one to not pull this in.

It's just always surprising to me that circuit_to_dag is not a bottleneck because we're basically add circuit construction time on top of the dag construction that this does but with the overhead of an instruction.copy() too in the converter.

@mtreinish mtreinish closed this Mar 5, 2021
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Nov 23, 2022
This commit adds a new option to the euler one qubit decomposer class,
use_dag, which when set to True will return a DAGCircuit object for the
decomposed matrix instead of a QuantumCircuit object. This is useful for
transpiler passes that call the decomposer so that they don't have to
convert from a circuit to a dag. The passes that use the decomposer are
also updated to use this new option.

This was originally attempted before in Qiskit#5926 but was abandoned because
at the time there was no real performance improvement from doing this.
However, since then this class has changed quite a bit, and after Qiskit#9185
the overhead of conversion between QuantumCircuit and DAGCircuit is a
more critical component of the runtime performance of the 1 qubit
optimization pass. By operating natively in DAGCircuit from the
decomposer we're able to remove this overhead and directly substitute
the output of the decomposer in the pass.
mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Dec 2, 2022
This commit adds a new option to the euler one qubit decomposer class,
use_dag, which when set to True will return a DAGCircuit object for the
decomposed matrix instead of a QuantumCircuit object. This is useful for
transpiler passes that call the decomposer so that they don't have to
convert from a circuit to a dag. The passes that use the decomposer are
also updated to use this new option.

This was originally attempted before in Qiskit#5926 but was abandoned because
at the time there was no real performance improvement from doing this.
However, since then this class has changed quite a bit, and after Qiskit#9185
the overhead of conversion between QuantumCircuit and DAGCircuit is a
more critical component of the runtime performance of the 1 qubit
optimization pass. By operating natively in DAGCircuit from the
decomposer we're able to remove this overhead and directly substitute
the output of the decomposer in the pass.
mergify bot pushed a commit that referenced this pull request Dec 2, 2022
* Add DAGCircuit output option to OneQubitEulerDecomposer

This commit adds a new option to the euler one qubit decomposer class,
use_dag, which when set to True will return a DAGCircuit object for the
decomposed matrix instead of a QuantumCircuit object. This is useful for
transpiler passes that call the decomposer so that they don't have to
convert from a circuit to a dag. The passes that use the decomposer are
also updated to use this new option.

This was originally attempted before in #5926 but was abandoned because
at the time there was no real performance improvement from doing this.
However, since then this class has changed quite a bit, and after #9185
the overhead of conversion between QuantumCircuit and DAGCircuit is a
more critical component of the runtime performance of the 1 qubit
optimization pass. By operating natively in DAGCircuit from the
decomposer we're able to remove this overhead and directly substitute
the output of the decomposer in the pass.

* Use a dataclass for internal gate list and phase tracking

This commit updates the internal psx circuit generator function to pass
a dataclass with a field for the gates and phase instead of using 2
lists. This provides a cleaner interface for using a mutable reference
between the different functions used to add gates to the circuit.

* Directly combine op node lists in optimize_1q_commutation
Cryoris pushed a commit to Cryoris/qiskit-terra that referenced this pull request Jan 12, 2023
* Add DAGCircuit output option to OneQubitEulerDecomposer

This commit adds a new option to the euler one qubit decomposer class,
use_dag, which when set to True will return a DAGCircuit object for the
decomposed matrix instead of a QuantumCircuit object. This is useful for
transpiler passes that call the decomposer so that they don't have to
convert from a circuit to a dag. The passes that use the decomposer are
also updated to use this new option.

This was originally attempted before in Qiskit#5926 but was abandoned because
at the time there was no real performance improvement from doing this.
However, since then this class has changed quite a bit, and after Qiskit#9185
the overhead of conversion between QuantumCircuit and DAGCircuit is a
more critical component of the runtime performance of the 1 qubit
optimization pass. By operating natively in DAGCircuit from the
decomposer we're able to remove this overhead and directly substitute
the output of the decomposer in the pass.

* Use a dataclass for internal gate list and phase tracking

This commit updates the internal psx circuit generator function to pass
a dataclass with a field for the gates and phase instead of using 2
lists. This provides a cleaner interface for using a mutable reference
between the different functions used to add gates to the circuit.

* Directly combine op node lists in optimize_1q_commutation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
on hold Can not fix yet performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants