Information
- Qiskit Aer version: 0.11.2
- Python version: 3.9.15
- Operating system: Ubuntu 20.04
What is the current behavior?
Importing Qiskit Aer, either implicitly or explicitly as shown below, initializes all GPUs on the system, as can be seen by monitoring `nvidia-smi` (there are other tools to check this, but `nvidia-smi` is the simplest).
Steps to reproduce the problem
- Install `qiskit-aer-gpu` from PyPI (or build from source; how it is installed is irrelevant as long as CUDA support is built in)
- Run any of the following commands to get Aer imported:
python -i -c "import qiskit_aer"
python -i -c "import qiskit.providers.aer"
python -i -c "from qiskit.providers.aer import AerSimulator"
python -i -c "import qiskit; print(qiskit.__qiskit_version__)"
- While the Python interpreter is idle waiting for input (due to the interactive flag `-i`), check `nvidia-smi`. On a multi-GPU system, it is clear that the CUDA context is initialized on all GPUs:
```
$ nvidia-smi
Sat Dec 17 19:05:35 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 530.06       Driver Version: 530.06       CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:21:00.0 Off |                  Off |
| 30%   53C    P2    88W / 300W |    267MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX 6000...  On   | 00000000:22:00.0 Off |                  Off |
| 30%   58C    P2    82W / 300W |    431MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX 6000...  On   | 00000000:41:00.0 Off |                  Off |
| 30%   47C    P2    75W / 300W |    431MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    On   | 00000000:43:00.0 Off |                  Off |
| 30%   57C    P2    86W / 300W |    267MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    316201      C   python                            264MiB |
|    1   N/A  N/A    316201      C   python                            428MiB |
|    2   N/A  N/A    316201      C   python                            428MiB |
|    3   N/A  N/A    316201      C   python                            264MiB |
+-----------------------------------------------------------------------------+
```
What is the expected behavior?
Don't initialize the CUDA context at import time at all (whether Aer is imported explicitly or implicitly). Doing so hurts for many reasons:
- Performance: CUDA context initialization is known to be costly, so it is best to defer it until it is actually needed. This is how most Python GPU packages (CuPy, PyTorch, TF, ...) work these days (see the sketch below).
- I suspect this bug contributes to some previously reported performance issues, such as #1272 (Qobj constructor is very slow when multiple experiments are given). While I don't have direct proof, it is almost certain that, at the very least, multiple processes would compete for shared resources; see below.
- Unexpected: It is not expected that a simple version query like `qiskit.__qiskit_version__` would initialize GPUs.
- This impacts all downstream packages that depend on Qiskit Aer, directly or indirectly (such as cuQuantum Python 😅)
- Resource contention: On a shared system like in my example, this bug could interfere with
  - other users sharing the system, unless guarded by sophisticated (and correctly configured) resource management systems such as Slurm, or
  - multiple processes launched via the main process
- CI/CD: Many public CI/CD pipelines (e.g. conda-forge) do not have GPUs, but they can still run simple packaging tests for GPU packages. Such tests might fail, depending on how the package (which depends on Aer) is tested.
By the way, this bug is independent of the number of GPUs: even on a single-GPU system the issue shows up, but it does make the situation a lot worse on multi-GPU systems like NVIDIA DGX A100.
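To make the deferred-initialization point above concrete, here is a minimal CUDA C++ sketch of the pattern; the function names are hypothetical and this is not Aer's actual code:

```cpp
// Minimal sketch of lazy CUDA context creation (hypothetical names,
// not Aer's actual code). The context is created on first real GPU
// use, never at library load / module import.
#include <mutex>
#include <cuda_runtime.h>

// Create the CUDA context exactly once, and only when a simulation
// actually needs the GPU.
inline void ensure_cuda_initialized() {
  static std::once_flag flag;
  std::call_once(flag, [] {
    // cudaFree(0) is the conventional no-op that forces context
    // creation on the current device.
    (void)cudaFree(nullptr);
  });
}

void run_gpu_simulation() {
  ensure_cuda_initialized();  // init cost paid here, at first use
  // ... allocate device memory and launch kernels ...
}
```

With this pattern, merely importing the module (or querying its version) performs no CUDA calls at all.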
Suggested solutions
The implementation (not the semantics!) of the following two functions must be redesigned:
```python
def available_methods(controller, methods):
def available_devices(controller, devices):
```
since together they contribute to this bug. Currently, Qiskit Aer determines the available methods/devices by running dummy executions and checking for errors. This incurs not only runtime overhead but also initializes the GPUs when they are present.
I would suggest exposing these two attributes all the way from C++ to Python through pybind11. This should be easily doable and would enable much more lightweight checks, something we'd also like to ask for (but that could be discussed in a separate ticket) 🙂
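As a rough illustration of what that could look like, assuming the capability lists are known at compile time (the module name, function names, and build flag below are all hypothetical, not Aer's actual bindings):

```cpp
// Sketch of exposing compile-time capability lists through pybind11,
// instead of probing via dummy executions. All names are hypothetical.
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <string>
#include <vector>

namespace py = pybind11;

// Returning static lists performs no CUDA calls, so importing the
// extension module touches no GPU.
std::vector<std::string> supported_methods() {
  return {"statevector", "density_matrix", "stabilizer",
          "matrix_product_state", "unitary"};
}

std::vector<std::string> supported_devices() {
  std::vector<std::string> devices{"CPU"};
#ifdef BUILT_WITH_CUDA  // hypothetical build-time flag
  devices.push_back("GPU");
#endif
  return devices;
}

PYBIND11_MODULE(aer_capabilities, m) {
  m.def("supported_methods", &supported_methods,
        "Simulation methods compiled into this build.");
  m.def("supported_devices", &supported_devices,
        "Devices compiled into this build (no CUDA initialization).");
}
```

The Python-side `available_methods`/`available_devices` could then simply return these lists, with no dummy execution and no CUDA initialization at all.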
Thanks!