
Memory usage grows quickly with circuit depth #6991

@rsln-s

Description

Information

  • Qiskit Terra version:
    qiskit 0.29.0
    qiskit-aer 0.8.2
    qiskit-aqua 0.9.4
    qiskit-ibmq-provider 0.16.0
    qiskit-ignis 0.6.0
    qiskit-machine-learning 0.2.1
    qiskit-terra 0.18.1
  • Python version:
    Python 3.7.11 :: Intel Corporation
  • Operating system:
    CentOS Linux 7

What is the current behavior?

The memory requirements grow unexpectedly (unreasonably?) quickly with circuit depth, while the qubit count stays fixed. I assume this is caused by some inefficiency in the transpiler, since according to the traceback the function that runs out of memory is QuantumInstance.transpile. This is undesirable because many use cases (e.g. the QuantumKernel class) require evaluating very large numbers of circuits.
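A rough way to see how the workload scales with depth: at fixed width, the gate count of the feature map, and therefore the amount of circuit data handed to a single transpile call, grows linearly in reps. The sketch below counts gates for the 20-qubit, 800-point setup used in the reproduction script under stated assumptions: default full entanglement, each repetition being one H and one P gate per qubit plus a CX-P-CX triple per qubit pair, and pairwise kernel circuits being U(x) followed by U(y)^dagger. The helper names are hypothetical, and whether QuantumKernel builds one circuit per data point or one per pair depends on the backend, so both cases are computed.

```python
def zz_feature_map_gates(num_qubits: int, reps: int) -> int:
    """Rough gate count of ZZFeatureMap(num_qubits, reps) before transpilation.

    Assumption: full entanglement, and each repetition contributes one H and
    one P gate per qubit plus a CX-P-CX triple for every qubit pair.
    """
    pairs = num_qubits * (num_qubits - 1) // 2
    return reps * (2 * num_qubits + 3 * pairs)


def kernel_workload(num_points: int, num_qubits: int, reps: int, pairwise: bool):
    """Return (number of circuits, total gates) for one kernel-matrix evaluation.

    pairwise=True:  one circuit U(x_i) U(x_j)^dagger per pair i < j (assumed).
    pairwise=False: one circuit U(x_i) per data point (statevector-style).
    """
    per_map = zz_feature_map_gates(num_qubits, reps)
    if pairwise:
        circuits = num_points * (num_points - 1) // 2
        return circuits, circuits * 2 * per_map
    return num_points, num_points * per_map


# Same parameters as the reproduction script: m = 800 points, n = 20 qubits.
for reps in (2, 6, 10):
    for pairwise in (False, True):
        circuits, gates = kernel_workload(800, 20, reps, pairwise)
        print(f"reps={reps:2d} pairwise={pairwise!s:5}: {circuits:,} circuits, {gates:,} gates")
```

Under these assumptions, going from reps=2 to reps=10 multiplies the gate count by five; at reps=10 the pairwise case comes to 319,600 circuits and about 3.9 billion gates, versus 800 circuits and about 4.9 million gates in the per-point case.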

Steps to reproduce the problem

import numpy as np
import time
import sys

from qiskit.providers.aer import AerSimulator
from qiskit.utils import QuantumInstance
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import QuantumKernel

reps = int(sys.argv[1])  # circuit depth: number of ZZFeatureMap repetitions

n = 20   # number of features = number of qubits (kept fixed)
m = 800  # number of data points; the kernel matrix is m x m
X = np.random.uniform(0, 1, size=(m, n))
FeatureMap = ZZFeatureMap(n, reps=reps)

t1 = time.time()

quantum_instance = QuantumInstance(AerSimulator(method="statevector"))
quantum_kernel = QuantumKernel(feature_map=FeatureMap, quantum_instance=quantum_instance)

kernel_matrix = quantum_kernel.evaluate(x_vec = X)

t2 = time.time()

print(f'Finished in {t2-t1} sec')

To run:

$ for i in $(seq 2 4 12); do mprof run --include-children python test_QuantumInstance.py $i; done

Output:

mprof: Sampling memory every 0.1s
running new process
Finished in 389.05642342567444 sec
mprof: Sampling memory every 0.1s
running new process
Finished in 866.7115132808685 sec
mprof: Sampling memory every 0.1s
running new process
Traceback (most recent call last):
  File "test_QuantumInstance.py", line 27, in <module>
    kernel_matrix = quantum_kernel.evaluate(x_vec = X)
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/site-packages/qiskit_machine_learning/kernels/quantum_kernel.py", line 290, in evaluate
    results = self._quantum_instance.execute(circuits)
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/site-packages/qiskit/utils/quantum_instance.py", line 398, in execute
    circuits = self.transpile(circuits)
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/site-packages/qiskit/utils/quantum_instance.py", line 346, in transpile
    circuits, self._backend, **self._backend_config, **self._compile_config
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/site-packages/qiskit/compiler/transpiler.py", line 293, in transpile
    circuits = parallel_map(_transpile_circuit, list(zip(circuits, transpile_args)))
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/site-packages/qiskit/tools/parallel.py", line 164, in parallel_map
    raise error
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/site-packages/qiskit/tools/parallel.py", line 154, in parallel_map
    results = list(future)
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/concurrent/futures/process.py", line 483, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/home/rshaydulin/soft/anaconda3/envs/qiskit_latest/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
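The traceback itself never mentions memory: BrokenProcessPool is what concurrent.futures raises in the parent whenever a worker process dies abruptly, whatever the cause. The minimal sketch below is unrelated to Qiskit (the worker function is hypothetical) and simulates such a death with os._exit, which reproduces the same error as an out-of-memory kill of one of the worker processes started by parallel_map would, without proving that this is what happened. It uses the Unix-only "fork" start method so it can run without an `if __name__ == "__main__"` guard.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def square_or_die(x):
    if x == 3:
        # Stand-in for a worker killed abruptly, e.g. by the kernel's OOM killer.
        os._exit(1)
    return x * x


def run_pool():
    # "fork" (Unix only) lets forked workers see square_or_die without re-importing __main__.
    ctx = multiprocessing.get_context("fork")
    try:
        with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
            return list(pool.map(square_or_die, range(6)))
    except BrokenProcessPool as exc:
        # Same exception type as the last line of the traceback above.
        return exc


result = run_pool()
print(type(result).__name__, result)
```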

mprof trace: (plot of memory usage over time for reps=2, 6 and 10; image not included)

Here the black line corresponds to reps=2, the blue line to reps=6 and the red line to reps=10. The reps=10 run crashed, presumably because it ran out of memory; I confirmed that memory was the issue by watching htop, though for some reason the end of the run, right before the crash, does not show up in the trace.
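Because a sampled trace can miss the last moments before a crash, the kernel's own high-water mark is another way to confirm the peak. A minimal stdlib sketch (Unix only; the helper name is hypothetical) runs a command as a subprocess and reads ru_maxrss from resource.getrusage(resource.RUSAGE_CHILDREN). Two caveats: on Linux this is the peak of the largest single descendant process, not the sum that `mprof --include-children` plots, and it is a running maximum over all children reaped so far by the calling interpreter.

```python
import resource
import subprocess
import sys


def peak_child_rss_mb(cmd):
    """Run cmd and return the peak RSS, in MB, of the largest child process
    reaped so far by this interpreter (Unix only)."""
    # check=False: a run that crashes (e.g. with BrokenProcessPool) still reports.
    subprocess.run(cmd, check=False)
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS in bytes.
    return maxrss / (1024 * 1024 if sys.platform == "darwin" else 1024)
```

For example, `peak_child_rss_mb([sys.executable, "test_QuantumInstance.py", "10"])`; calling it once per depth from a fresh interpreter avoids the running-maximum caveat.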

What is the expected behavior?

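Memory requirements stay reasonable as the depth grows, so that evaluating a kernel matrix for a fixed number of qubits and data points does not exhaust memory.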

Suggested solutions

A potential solution could be to batch the circuits. For example, if the simulator receives 10^6 circuits, they could be transpiled and evaluated in chunks of 10^3, with intermediate results saved and each chunk's circuits discarded before the next chunk is processed.
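The chunking idea can be sketched in plain Python for the kernel-matrix case. This is a hypothetical helper, not a Qiskit API: run_batch stands in for whatever would build, transpile and execute the circuits for one chunk of (x, y) pairs and return their kernel values. It assumes a fidelity-style kernel with k(x, x) = 1, so only the pairs i < j are evaluated, and itertools.islice pulls at most batch_size pairs at a time so that only one chunk of inputs is alive at once.

```python
from itertools import combinations, islice
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[int, int]
Point = Sequence[float]


def batched_pairs(m: int, batch_size: int) -> Iterator[List[Pair]]:
    """Yield the upper-triangle index pairs (i < j) of an m x m matrix in chunks."""
    pairs = combinations(range(m), 2)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


def kernel_matrix_in_batches(
    data: Sequence[Point],
    batch_size: int,
    run_batch: Callable[[List[Tuple[Point, Point]]], List[float]],
) -> List[List[float]]:
    m = len(data)
    # Assumption: k(x, x) = 1 (fidelity-style kernel), so the diagonal is not evaluated.
    kernel = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for batch in batched_pairs(m, batch_size):
        # Only this chunk's inputs (and, in a real run, its circuits) exist here;
        # they are dropped before the next chunk is built.
        values = run_batch([(data[i], data[j]) for i, j in batch])
        for (i, j), value in zip(batch, values):
            kernel[i][j] = kernel[j][i] = value
    return kernel
```

With m = 800 and batch_size = 1000, each call would receive at most 1,000 pairs instead of all 319,600 at once; the m x m result matrix is then the main thing that still grows with m.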

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests