Information
- Qiskit Aer version: 0.11.2
- Python version: 3.9.15
- Operating system: Ubuntu 20.04
What is the current behavior?
Importing Qiskit Aer, either implicitly or explicitly as shown below, initializes all GPUs on the system, as can be seen by monitoring `nvidia-smi` (there are other tools to check this, but `nvidia-smi` is the simplest).
Steps to reproduce the problem
- Install `qiskit-aer-gpu` from PyPI (or build from source; how it is installed is irrelevant as long as CUDA support is built in)
- Run any of the following commands to get Aer imported:
python -i -c "import qiskit_aer"
python -i -c "import qiskit.providers.aer"
python -i -c "from qiskit.providers.aer import AerSimulator"
python -i -c "import qiskit; print(qiskit.__qiskit_version__)"
- While the Python interpreter is idle waiting for input (due to the interactive flag `-i`), check `nvidia-smi`. On a multi-GPU system, it is clear that the CUDA context is initialized on all GPUs:
```
$ nvidia-smi
Sat Dec 17 19:05:35 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 530.06       Driver Version: 530.06       CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:21:00.0 Off |                  Off |
| 30%   53C    P2    88W / 300W |    267MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX 6000...  On   | 00000000:22:00.0 Off |                  Off |
| 30%   58C    P2    82W / 300W |    431MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX 6000...  On   | 00000000:41:00.0 Off |                  Off |
| 30%   47C    P2    75W / 300W |    431MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    On   | 00000000:43:00.0 Off |                  Off |
| 30%   57C    P2    86W / 300W |    267MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    316201      C   python                            264MiB |
|    1   N/A  N/A    316201      C   python                            428MiB |
|    2   N/A  N/A    316201      C   python                            428MiB |
|    3   N/A  N/A    316201      C   python                            264MiB |
+-----------------------------------------------------------------------------+
```
What is the expected behavior?
Don't initialize the CUDA context at import time at all (whether Aer is imported explicitly or implicitly). Doing so hurts for many reasons:
- Performance: CUDA context initialization is known to be costly, so it is best to defer it until it is actually needed. This is how most Python GPU packages (CuPy, PyTorch, TF, ...) work these days (see the sketch below).
- I suspect this bug contributes to some previously reported performance issues, such as #1272 (Qobj constructor is very slow when multiple experiments are given). While I don't have direct proof, it is almost certain that, at the very least, multiple processes would compete for shared resources; see below.
- Unexpected: It is not expected that a simple version query like `qiskit.__qiskit_version__` would initialize GPUs.
- This impacts all downstream packages that depend on Qiskit Aer, directly or indirectly (such as cuQuantum Python 😅)
- Resource contention: On a shared system like in my example, this bug could interfere with
  - other users sharing the system, unless guarded by sophisticated (and correctly configured) resource management systems such as Slurm, or
  - multiple processes launched via the main process
- CI/CD: Many public CI/CD pipelines (e.g. conda-forge) do not have GPUs, but they can still run simple packaging tests for GPU packages. Such tests might fail, depending on how the package (which depends on Aer) is tested.
By the way, this bug is independent of the number of GPUs: even on a single-GPU system the issue shows up, but it does make the situation a lot worse on multi-GPU systems like NVIDIA DGX A100.
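To make the deferred-initialization point above concrete, here is a minimal CUDA C++ sketch of the pattern; the function names are hypothetical and this is not Aer's actual code:

```cpp
// Minimal sketch of lazy CUDA context creation (hypothetical names,
// not Aer's actual code). The context is created on first real GPU
// use, never at library load / module import.
#include <mutex>
#include <cuda_runtime.h>

// Create the CUDA context exactly once, and only when a simulation
// actually needs the GPU.
inline void ensure_cuda_initialized() {
  static std::once_flag flag;
  std::call_once(flag, [] {
    // cudaFree(0) is the conventional no-op that forces context
    // creation on the current device.
    (void)cudaFree(nullptr);
  });
}

void run_gpu_simulation() {
  ensure_cuda_initialized();  // init cost paid here, at first use
  // ... allocate device memory and launch kernels ...
}
```

With this pattern, merely importing the module (or querying its version) performs no CUDA calls at all.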
Suggested solutions
The implementation (not the semantics!) of the following two functions must be redesigned:
```python
def available_methods(controller, methods):
def available_devices(controller, devices):
```
since together they contribute to this bug. Currently, Qiskit Aer determines the available methods/devices by running dummy executions and checking for errors. This incurs not only runtime overhead but also initializes the GPUs when they are present.
I would suggest exposing these two attributes all the way from C++ to Python through pybind11. This should be easily doable and would enable much more lightweight checks, something we'd also like to ask for (but that could be discussed in a separate ticket) 🙂
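As a rough illustration of what that could look like, assuming the capability lists are known at compile time (the module name, function names, and build flag below are all hypothetical, not Aer's actual bindings):

```cpp
// Sketch of exposing compile-time capability lists through pybind11,
// instead of probing via dummy executions. All names are hypothetical.
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <string>
#include <vector>

namespace py = pybind11;

// Returning static lists performs no CUDA calls, so importing the
// extension module touches no GPU.
std::vector<std::string> supported_methods() {
  return {"statevector", "density_matrix", "stabilizer",
          "matrix_product_state", "unitary"};
}

std::vector<std::string> supported_devices() {
  std::vector<std::string> devices{"CPU"};
#ifdef BUILT_WITH_CUDA  // hypothetical build-time flag
  devices.push_back("GPU");
#endif
  return devices;
}

PYBIND11_MODULE(aer_capabilities, m) {
  m.def("supported_methods", &supported_methods,
        "Simulation methods compiled into this build.");
  m.def("supported_devices", &supported_devices,
        "Devices compiled into this build (no CUDA initialization).");
}
```

The Python-side `available_methods`/`available_devices` could then simply return these lists, with no dummy execution and no CUDA initialization at all.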
Thanks!