-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Add DAGCircuit output option to OneQubitEulerDecomposer #5926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DAGCircuit output option to OneQubitEulerDecomposer #5926
Conversation
This commit improves the performance slightly of the OneQubitEulerDecomposer by avoiding the use of append(), which does extra validation of the arguments when constructing the output circuits. Instead the _append() method is used which does not have the validation overhead, however this is not necessary since we know how the circuit is constructed and the arguments when adding the gate are being set correctly (there is only 1 qubit anyway).
The np.isclose() function quickly becomes a bottlneck in the OneQubitEulerDecomposer class as the majority of the operations are comparing angles to decide when to insert gates. However, we can avoid this overhead by switching from numpy's isclose function which is optimzed for arrays not floats [1][2] to just using the standard library's math module's isclose() function. This commit makes this change as well as using intermediate variables to avoid duplicate calculations to speed up the decomposer.
According to the np.mod() docs [1] the np.mod() is equivalent to the stdlib '%' operator in python however it is designed to work with numpy array's instead of single values. This adds about an order of magnitude overhead to the mod operations (on the order of 1us vs 100ns) which can add up as _mod2pi is called multiple times. However python's mod operator is not ideal for working with floats [2] and when used produces different results than expected. To avoid this but still improve performance this commit switches to use the stdlib math module's fmod() function, which produces the expected result and is only marginaly slower than the '%' operator, which will still avoid the overhead of the numpy function. [1] https://numpy.org/doc/stable/reference/generated/numpy.mod.html [2] https://docs.python.org/3/library/math.html#math.fmod
This commit adds a new option to the euler one qubit decomposer class, use_dag, which when set to True will return a DAGCircuit object for the decomposed matrix instead of a QuantumCircuit object. This is useful for transpiler passes that call the decomposer so that they don't have to convert from a circuit to a dag. The passes that use the decomposer are also updated to use this new option.
1b5f920
to
99f0b5c
Compare
Can you post some before/after perf numbers for this? It definitely seems like it should be a nice win. |
Yeah, let me run some quick numbers and post the difference between this, the PR before this in the series, and master |
Running this script (as a worst case): from qiskit.transpiler.passes import BasisTranslator
from qiskit.transpiler.passes import Optimize1qGatesDecomposition
from qiskit.circuit.random import random_circuit
from qiskit.converters import circuit_to_dag
from qiskit import Aer
from qiskit.test.mock.backends import FakeManhattan
from qiskit.circuit.equivalence_library import SessionEquivalenceLibrary as sel
import time
backend_sim = Aer.get_backend('qasm_simulator')
aer_basis = backend_sim.configuration().basis_gates
input_circuit = random_circuit(65, 4096, measure=True, seed=424242424242)
# Aer path
print("Aer:")
dag = circuit_to_dag(input_circuit)
bt_aer = BasisTranslator(sel, aer_basis)
basis_dag = bt_aer.run(dag)
start = time.time()
optimize_pass = Optimize1qGatesDecomposition(aer_basis)
stop = time.time()
print("Create Optimize1qGatesDecomposition %s" % str(stop - start))
start = time.time()
result = optimize_pass.run(basis_dag)
stop = time.time()
print("Optimize 1Q decomposition %s" % str(stop - start))
print("depth: %s" % result.depth()) The results are:
I only ran one sample so I don't know what the spread is on the timing. I'm assuming that this PR and the previous one are basically equivalent (after switching to |
Given too-small-to-measure perf benefit, I suggest not making this change given it adds another path through the code that would need be tested and maintained. |
Yeah, I agree. I should have done more targeted benchmarks for each stage earlier instead of just pushing PRs for every idea I had and measuring the performance at the top of the stack. I'll drop this PR (and the equivalent for the 2q decomposer) and rebase the disable unitary validation one to not pull this in. It's just always surprising to me that |
This commit adds a new option to the euler one qubit decomposer class, use_dag, which when set to True will return a DAGCircuit object for the decomposed matrix instead of a QuantumCircuit object. This is useful for transpiler passes that call the decomposer so that they don't have to convert from a circuit to a dag. The passes that use the decomposer are also updated to use this new option. This was originally attempted before in Qiskit#5926 but was abandoned because at the time there was no real performance improvement from doing this. However, since then this class has changed quite a bit, and after Qiskit#9185 the overhead of conversion between QuantumCircuit and DAGCircuit is a more critical component of the runtime performance of the 1 qubit optimization pass. By operating natively in DAGCircuit from the decomposer we're able to remove this overhead and directly substitute the output of the decomposer in the pass.
This commit adds a new option to the euler one qubit decomposer class, use_dag, which when set to True will return a DAGCircuit object for the decomposed matrix instead of a QuantumCircuit object. This is useful for transpiler passes that call the decomposer so that they don't have to convert from a circuit to a dag. The passes that use the decomposer are also updated to use this new option. This was originally attempted before in Qiskit#5926 but was abandoned because at the time there was no real performance improvement from doing this. However, since then this class has changed quite a bit, and after Qiskit#9185 the overhead of conversion between QuantumCircuit and DAGCircuit is a more critical component of the runtime performance of the 1 qubit optimization pass. By operating natively in DAGCircuit from the decomposer we're able to remove this overhead and directly substitute the output of the decomposer in the pass.
* Add DAGCircuit output option to OneQubitEulerDecomposer This commit adds a new option to the euler one qubit decomposer class, use_dag, which when set to True will return a DAGCircuit object for the decomposed matrix instead of a QuantumCircuit object. This is useful for transpiler passes that call the decomposer so that they don't have to convert from a circuit to a dag. The passes that use the decomposer are also updated to use this new option. This was originally attempted before in #5926 but was abandoned because at the time there was no real performance improvement from doing this. However, since then this class has changed quite a bit, and after #9185 the overhead of conversion between QuantumCircuit and DAGCircuit is a more critical component of the runtime performance of the 1 qubit optimization pass. By operating natively in DAGCircuit from the decomposer we're able to remove this overhead and directly substitute the output of the decomposer in the pass. * Use a dataclass for internal gate list and phase tracking This commit updates the internal psx circuit generator function to pass a dataclass with a field for the gates and phase instead of using 2 lists. This provides a cleaner interface for using a mutable reference between the different functions used to add gates to the circuit. * Directly combine op node lists in optimize_1q_commutation
* Add DAGCircuit output option to OneQubitEulerDecomposer This commit adds a new option to the euler one qubit decomposer class, use_dag, which when set to True will return a DAGCircuit object for the decomposed matrix instead of a QuantumCircuit object. This is useful for transpiler passes that call the decomposer so that they don't have to convert from a circuit to a dag. The passes that use the decomposer are also updated to use this new option. This was originally attempted before in Qiskit#5926 but was abandoned because at the time there was no real performance improvement from doing this. However, since then this class has changed quite a bit, and after Qiskit#9185 the overhead of conversion between QuantumCircuit and DAGCircuit is a more critical component of the runtime performance of the 1 qubit optimization pass. By operating natively in DAGCircuit from the decomposer we're able to remove this overhead and directly substitute the output of the decomposer in the pass. * Use a dataclass for internal gate list and phase tracking This commit updates the internal psx circuit generator function to pass a dataclass with a field for the gates and phase instead of using 2 lists. This provides a cleaner interface for using a mutable reference between the different functions used to add gates to the circuit. * Directly combine op node lists in optimize_1q_commutation
Summary
This commit adds a new option to the euler one qubit decomposer class,
use_dag, which when set to True will return a DAGCircuit object for the
decomposed matrix instead of a QuantumCircuit object. This is useful for
transpiler passes that call the decomposer so that they don't have to
convert from a circuit to a dag. The passes that use the decomposer are
also updated to use this new option.
Details and comments
This is based on top of #5915 and is blocked until #5915 merges.