Description
I took a stab at implementing MSAA, as an alternative to the SSAA that we currently apply.
The differences
With SSAA (super sampling antialiasing) we just render everything at a higher resolution, and then average the result when "flushing" to the target. In Pygfx we now default to a pixel ratio of 2, expressed in logical pixels (so on a highres screen the buffer sizes actually match the screen). Because of how the flush step works, we can also use e.g. 1.5, 1.75 or 2.7.
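To make the "average when flushing" step concrete, here is a minimal sketch in plain Python. It is not the actual Pygfx flush shader: it assumes a single-channel image stored as a list of rows, an integer pixel ratio, and a plain box average. The name `flush_box_filter` is made up for this illustration, and fractional ratios such as 1.5 would need a proper resampling kernel instead.

```python
def flush_box_filter(hires, ratio):
    """Average each ratio x ratio block of a high-res image into one pixel.

    `hires` is a list of rows of floats (one channel). `ratio` must be an
    integer in this sketch; this is an assumption of the example only.
    """
    height = len(hires) // ratio
    width = len(hires[0]) // ratio
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect all samples that belong to target pixel (x, y).
            block = [
                hires[y * ratio + dy][x * ratio + dx]
                for dy in range(ratio)
                for dx in range(ratio)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A 4x4 render of a hard vertical edge that falls in the middle of a target pixel.
hires = [[0.0, 1.0, 1.0, 1.0] for _ in range(4)]
lowres = flush_box_filter(hires, 2)
print(lowres)  # [[0.5, 1.0], [0.5, 1.0]]
```

The left column ends up at 0.5 because half of its samples are covered, which is the antialiased edge.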
With MSAA you use the GPU's hardware to do something very similar. With MSAA x4 you have as many samples as with SSAA and a pixel ratio of 2 (because the ratio applies to both dimensions). A major difference is that the fragment shader is applied once per pixel (so not for each sample in that pixel). In effect, there is only extra work at the edges of geometry. This is good for performance, and you can imagine how volume rendering (raycasting) is 4x more performant this way, with probably barely any visual downgrade. A downside is that any structure within the faces is not super-sampled, e.g. a part of a mesh that is made transparent.
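The arithmetic behind that comparison can be sketched as follows. The function names are hypothetical, and the MSAA figure is a lower bound: it ignores the extra invocations for edge pixels that are covered by more than one primitive.

```python
def ssaa_stats(width, height, pixel_ratio):
    """Samples per logical pixel and fragment shader invocations for SSAA."""
    samples_per_pixel = pixel_ratio ** 2  # the ratio applies to both dimensions
    invocations = round(width * height * samples_per_pixel)
    return samples_per_pixel, invocations


def msaa_stats(width, height, sample_count):
    """Samples per pixel and a lower bound on invocations for MSAA.

    The fragment shader runs once per pixel, not once per sample, so the
    sample count does not multiply the shading work.
    """
    return sample_count, width * height


print(ssaa_stats(640, 480, 2))  # (4, 1228800)
print(msaa_stats(640, 480, 4))  # (4, 307200)
```

Both setups store 4 samples per pixel, but the SSAA variant shades 4x as many fragments, which is where the expected gain for expensive shaders like raycasting comes from.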
Benchmarks
I ran a benchmark to test the memory consumption of both approaches:
method | 640x480 | 1920x1080 | 1920x1080 McGuire |
---|---|---|---|
SSAA 1 MSAA 1 | 40 | 143 | 207 |
SSAA 4 MSAA 1 | 100 | 296 | 419 |
SSAA 1 MSAA 4 | 100 | 296 | 419 |
SSAA 0.25 MSAA 1 | 20 | 63 | 79 |
SSAA 2.25 MSAA 1 | 60 | 178 | 276 |
Where the number behind "SSAA" is the resolution multiplier (pixel_ratio**2), making it comparable to the number behind MSAA. The numbers in the table are in MB, measured using the NVIDIA panel, and with the baseline subtracted.
Some observations:
- MSAA x4 and SSAA x4 consume an equal amount of memory. This might be expected, but I also read in a few places that GPUs store multisampled textures in an efficient manner. I suspect this efficiency only relates to performance (maximizing cache hits) and not so much the memory they occupy.
- Doubling the effective resolution by selecting MSAA/SSAA properties does not result in double the memory.
- Similarly, when increasing the window size, the increase in memory is much less than the increase in number of pixels.
Looks like some form of compression is applied both for MSAA and SSAA?
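To put numbers on these observations, the ratios can be computed directly from the table. This is just arithmetic on the measured values above (MB, baseline subtracted). The variable names are made up for this example.

```python
# Measured memory in MB, copied from the table: (640x480, 1920x1080, 1920x1080 McGuire)
memory_mb = {
    "SSAA 1 MSAA 1": (40, 143, 207),
    "SSAA 4 MSAA 1": (100, 296, 419),
    "SSAA 1 MSAA 4": (100, 296, 419),
    "SSAA 0.25 MSAA 1": (20, 63, 79),
    "SSAA 2.25 MSAA 1": (60, 178, 276),
}

base = memory_mb["SSAA 1 MSAA 1"]

# Growth in memory when going from 1 to 4 samples per pixel, per column.
ssaa4_growth = [m / b for m, b in zip(memory_mb["SSAA 4 MSAA 1"], base)]
msaa4_growth = [m / b for m, b in zip(memory_mb["SSAA 1 MSAA 4"], base)]

# Growth in pixel count versus growth in memory when enlarging the window.
pixel_growth = (1920 * 1080) / (640 * 480)
memory_growth = base[1] / base[0]

print([round(g, 2) for g in ssaa4_growth])
print(pixel_growth, memory_growth)  # 6.75 3.575
```

So 4x the samples costs roughly 2x to 2.5x the memory, and 6.75x the pixels costs about 3.6x the memory, for both MSAA and SSAA.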
Notes
- It is possible to force running the fragment shader per-sample. That way we can remove one of the downsides of MSAA where it makes sense, and still benefit from e.g. much faster volume rendering.
- For the above to work, we'd need to expose the SampleRateShading capability in wgpu-native, and enable it in wgpu-py.
- If we take the relative positions of the samples into account (as opposed to simply averaging all samples of a pixel) we can produce a better end-result in the flush/resolve step.
- Though the result with the equivalent resolution SSAA looks slightly better.
Summary (so far)
I started looking into this with the hope of reducing the memory consumption of our (many) render targets. It looks like MSAA is not going to help here, though I also learned that the memory does not seem to scale linearly with the number of pixels.
MSAA does provide potential benefits in terms of performance, especially in demanding fragment shaders like raycasting. It does add some complexity, but that may well be worth it.