[vulkan] Inconsistent segfault at shutdown on NVIDIA hardware #8497

@derek-gerstmann

This happens intermittently on the Linux worker build-bots, but does not reproduce locally on nearly identical drivers and hardware.

It shows up as a segfault at process exit in the correctness tests, after the tests themselves have run. When it happens, the Vulkan ICD function pointer chain is invalid, and any call to a Vulkan API function segfaults. If we skip the cleanup, the driver itself crashes instead. The same symptoms appear under both JIT and AOT.
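
Since the crash is intermittent and only shows up at process exit, one way to measure how often it occurs on a given machine is to rerun a test binary many times and tally how each run terminated. The following is a minimal sketch of such a harness for POSIX systems, not part of the Halide test infrastructure; `ExitTally` and `run_repeatedly` are hypothetical names introduced here for illustration.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <csignal>
#include <string>
#include <vector>

// Hypothetical helper type: counts how each child process terminated.
struct ExitTally {
    int clean = 0;         // exited normally with status 0
    int nonzero = 0;       // exited normally with a non-zero status
    int segfault = 0;      // terminated by SIGSEGV
    int other_signal = 0;  // terminated by any other signal
    int spawn_failed = 0;  // fork() or waitpid() failed
};

// Hypothetical helper: runs `argv` as a child process `iterations` times
// and classifies each termination. argv[0] is looked up on PATH.
ExitTally run_repeatedly(const std::vector<std::string> &argv, int iterations) {
    ExitTally tally;
    if (argv.empty()) {
        return tally;
    }

    // Build the NULL-terminated argument array that execvp() expects.
    std::vector<char *> args;
    for (const auto &a : argv) {
        args.push_back(const_cast<char *>(a.c_str()));
    }
    args.push_back(nullptr);

    for (int i = 0; i < iterations; ++i) {
        pid_t pid = fork();
        if (pid < 0) {
            ++tally.spawn_failed;
            continue;
        }
        if (pid == 0) {
            // Child: replace the process image with the test binary.
            execvp(args[0], args.data());
            _exit(127);  // only reached if execvp() itself failed
        }

        // Parent: wait for the child and inspect how it ended.
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) {
            ++tally.spawn_failed;
            continue;
        }
        if (WIFSIGNALED(status)) {
            if (WTERMSIG(status) == SIGSEGV) {
                ++tally.segfault;
            } else {
                ++tally.other_signal;
            }
        } else if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            ++tally.clean;
        } else {
            ++tally.nonzero;
        }
    }
    return tally;
}
```

Calling `run_repeatedly` with the path of one correctness test executable and a few hundred iterations gives a rough crash rate in the `segfault` field. Because the child is classified by its wait status, a test that passes and then segfaults during shutdown is counted as a segfault rather than as a clean run.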

System details:

Ubuntu 22.04
Vulkan Loader v1.3.296
Vulkan API v1.3.280
NVIDIA Driver v560.35.5.0
NVIDIA GeForce RTX 3070

It appears to be a Vulkan bug, an NVIDIA driver bug, or both. Running under the validation layers and the crash detection layers doesn't reveal anything, and we never receive a device lost error, which makes the failure difficult to detect or handle.
