Any explicit or implicit import of Qiskit Aer would initialize all GPUs on the system if the CUDA support is built #1686

@leofang

Description

Information

  • Qiskit Aer version: 0.11.2
  • Python version: 3.9.15
  • Operating system: Ubuntu 20.04

What is the current behavior?

Importing Qiskit Aer, either implicitly or explicitly as shown below, initializes all GPUs on the system, as evidenced by monitoring nvidia-smi (there are other tools to check this, but nvidia-smi is the simplest).

Steps to reproduce the problem

  1. Install qiskit-aer-gpu from PyPI (or build from source; how it is installed is irrelevant as long as CUDA support is built)
  2. Run any of the following commands to import Aer
    • python -i -c "import qiskit_aer"
    • python -i -c "import qiskit.providers.aer"
    • python -i -c "from qiskit.providers.aer import AerSimulator"
    • python -i -c "import qiskit; print(qiskit.__qiskit_version__)"
  3. While the Python interpreter is idle waiting for input (due to the interactive -i flag), check nvidia-smi. On a multi-GPU system, it is clear that a CUDA context has been created on every GPU:
$ nvidia-smi 
Sat Dec 17 19:05:35 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 530.06       Driver Version: 530.06       CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:21:00.0 Off |                  Off |
| 30%   53C    P2    88W / 300W |    267MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX 6000...  On   | 00000000:22:00.0 Off |                  Off |
| 30%   58C    P2    82W / 300W |    431MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX 6000...  On   | 00000000:41:00.0 Off |                  Off |
| 30%   47C    P2    75W / 300W |    431MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    On   | 00000000:43:00.0 Off |                  Off |
| 30%   57C    P2    86W / 300W |    267MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    316201      C   python                            264MiB |
|    1   N/A  N/A    316201      C   python                            428MiB |
|    2   N/A  N/A    316201      C   python                            428MiB |
|    3   N/A  N/A    316201      C   python                            264MiB |
+-----------------------------------------------------------------------------+
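
For a scriptable version of this check, below is a minimal sketch (an illustration added here, not part of the original repro) that imports Aer and then asks NVML whether the current process holds a context on each GPU. It assumes the nvidia-ml-py package (imported as pynvml) is installed; the script and its output wording are hypothetical.

# check_contexts.py -- hedged sketch: does importing Aer create a CUDA
# context on every GPU? Assumes nvidia-ml-py (pynvml) is installed.
import os
import pynvml

pynvml.nvmlInit()
try:
    import qiskit_aer  # noqa: F401 -- the import under test

    me = os.getpid()
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        touched = any(p.pid == me for p in procs)
        print(f"GPU {i}: context created by this process: {touched}")
finally:
    pynvml.nvmlShutdown()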

What is the expected behavior?

Aer should not initialize any CUDA context at import time (whether the import is explicit or implicit). The current behavior hurts for several reasons:

  1. Performance: CUDA context initialization is known to be costly, so it is best to defer it until it is actually needed. This is how most Python GPU packages (CuPy, PyTorch, TF, ...) work these days; see the lazy-initialization sketch after this list.
  2. Unexpected: It's not expected that simple version queries like qiskit.__qiskit_version__ would initialize GPUs.
    • This impacts all downstream packages directly or indirectly depending on Qiskit Aer (such as cuQuantum Python 😅)
  3. Resource contention: On a shared system like the one in my example, this bug could interfere with
    • other users sharing the system, unless guarded by a sophisticated (and correctly configured) resource management system such as Slurm, or
    • multiple processes launched from the main process
  4. CI/CD: Many public CI/CD pipelines (e.g., conda-forge) do not have GPUs, but they can still run simple packaging tests for GPU packages; depending on how a package that depends on Aer is tested, those tests might fail.

btw, this bug is independent of the number of GPUs -- even on a single-GPU system the issue shows up -- but it does make the situation a lot worse on multi-GPU systems like NVIDIA DGX A100.
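
To illustrate the lazy-initialization pattern mentioned in point 1, here is a minimal sketch; the available_devices function and the _probe_devices helper are hypothetical stand-ins, not Aer's actual code. Discovery is deferred to, and cached at, the first call that needs it, so a bare import never touches the CUDA driver:

# Minimal lazy-initialization sketch; _probe_devices() is a hypothetical
# stand-in for whatever expensive, GPU-touching discovery is needed.
import functools

@functools.lru_cache(maxsize=None)
def available_devices():
    # Nothing GPU-related runs at import time; the probe happens on the
    # first call only, and the cached result is reused afterwards.
    return _probe_devices()

def _probe_devices():
    # Placeholder: imagine this creates CUDA contexts to enumerate
    # devices. With the lazy pattern, merely importing this module
    # never reaches this point.
    return ("CPU", "GPU")

This is the same pattern CuPy and PyTorch follow: the import is cheap, and the first GPU-touching call pays the one-time initialization cost.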

Suggested solutions

The implementation (not the semantics!) of the two functions Aer uses to list the available methods and devices -- available_methods() and available_devices() -- must be re-designed, as together they cause this bug. Currently, Qiskit Aer lists the available methods/devices by running dummy executions and checking for errors. This incurs not only runtime overhead but also initializes the GPUs whenever CUDA support is built.

I would suggest exposing these two attributes all the way from C++ to Python through pybind11. This should be easily doable and would enable much more lightweight checks -- something we'd also like to ask for (but that could be discussed in a separate ticket) 🙂
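
To make the suggestion concrete, below is a purely illustrative sketch; the BUILT_DEVICES/BUILT_METHODS attribute names and the stand-in namespace are invented and are not Aer's actual API. The idea is that the pybind11 layer exports build-time constants (e.g., m.attr("BUILT_DEVICES") = py::make_tuple("CPU", "GPU"); inside the PYBIND11_MODULE body), and the Python-side listing reduces to an attribute lookup that never creates a CUDA context:

# Hypothetical sketch of the suggested fix; all names are invented.
# A plain namespace stands in for the compiled pybind11 extension so
# that this sketch runs as-is.
from types import SimpleNamespace

_ext = SimpleNamespace(
    BUILT_DEVICES=("CPU", "GPU"),
    BUILT_METHODS=("statevector", "density_matrix"),
)

def available_devices():
    # A constant lookup: no dummy circuit execution, no CUDA context.
    return _ext.BUILT_DEVICES

def available_methods():
    return _ext.BUILT_METHODS

print(available_devices())  # ('CPU', 'GPU')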

Thanks!
