Description
What should we add?
Right now we have psutil as a dependency, which is used to get the number of physical CPUs and the amount of available memory on the local system. The physical CPU count is used to determine the default maximum number of parallel processes in parallel_map, and the memory information is used to determine statevector size limits for BasicAer; it is also printed in the Jupyter version-table magic.

However, relying on an external library for just these small uses seems excessive, primarily because psutil doesn't have precompiled wheels available for all our supported platforms, which forces users on those platforms to have the appropriate compiler and library headers to build it from source in order to use Qiskit. Additionally, the checks we're doing via psutil in these cases aren't actually sound: they only look at what the OS reports for the entire system and don't reflect any process limits that might be in place (this is documented in the psutil docs), so we shouldn't be making task-scheduling or job-limit decisions based on that information.
As discussed in #10868, we're using the physical core count in parallel_map() to try to maximize performance, because on systems with SMT, scheduling a job on every logical core can reduce performance. I think a reasonable approach is to default CPU_COUNT to int(len(os.sched_getaffinity(0)) / 2), as most (though obviously not all) modern systems have 2-way SMT. This isn't as robust a way of avoiding logical cores as real physical-core detection, but I think it's reasonable as a default, especially since any user who wants different parallel-dispatch behavior can use the QISKIT_NUM_PROCS environment variable or the num_processes setting in the Qiskit config file (https://qiskit.org/documentation/configuration.html).
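A minimal sketch of what such a default could look like (the function name is illustrative, the env-var handling is simplified relative to Qiskit's actual user-config logic, and a fallback is assumed for platforms where os.sched_getaffinity doesn't exist, such as macOS and Windows):

```python
import os


def default_num_processes() -> int:
    """Pick a default parallel-process count without psutil.

    Honors a QISKIT_NUM_PROCS override first, then halves the
    affinity-aware logical CPU count to approximate physical
    cores on typical 2-way SMT systems.
    """
    env = os.getenv("QISKIT_NUM_PROCS")
    if env is not None:
        return max(1, int(env))
    if hasattr(os, "sched_getaffinity"):  # Linux-only API
        logical = len(os.sched_getaffinity(0))
    else:
        logical = os.cpu_count() or 1
    # Assume 2-way SMT: half the logical CPUs, never below 1.
    return max(1, logical // 2)


CPU_COUNT = default_num_processes()
```

The `max(1, ...)` guards matter: on a single-core machine, halving the logical count would otherwise yield 0 and break parallel dispatch entirely.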
For the memory usage, we can just remove the limits from BasicAer and fix the qubit count at 24, since a 24-qubit statevector is ~270MB and that is realistically always the limit for users. If there isn't sufficient memory available for a 24-qubit statevector for some reason, NumPy will error on allocation anyway. The only use case not covered is the Jupyter magic, but I think we can just drop the reported memory from it, as that isn't critical functionality.
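The ~270MB figure is easy to verify: a 24-qubit statevector holds 2^24 complex amplitudes at 16 bytes each (complex128). A quick sketch, which also shows that the allocation itself acts as the memory check:

```python
import numpy as np

n_qubits = 24
n_amplitudes = 2 ** n_qubits  # 16,777,216 amplitudes
bytes_needed = n_amplitudes * np.dtype(np.complex128).itemsize
print(bytes_needed / 1e6)  # → 268.435456 (MB)

# If memory is genuinely short, NumPy raises MemoryError here,
# so no separate psutil-based pre-check is needed:
state = np.zeros(n_amplitudes, dtype=np.complex128)
state[0] = 1.0  # |0...0> initial state
```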