Use separate encoding and capture devices to enable parallel encoding and capture #668

cgutman · 2023-01-01T23:29:52Z

Description

This PR modifies hwdevice_t to use a separate ID3D11Device and ID3D11DeviceContext to allow encoding to happen in parallel with capture without violating DXGI synchronization requirements. Each img_t has a texture created on the capture-side by display_vram_t::complete_img() which is then opened via handle on the encoding side by hwdevice_t::share_img().

The only concurrency bottleneck now is between display_vram_t::snapshot() and hwdevice_t::convert(), which use an IDXGIKeyedMutex to synchronize access to the same img texture. Since PARALLEL_ENCODING creates a pool of images, this is unlikely to be a problem in practice.
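For readers unfamiliar with keyed mutexes, the handoff described above can be modeled in portable C++. This is only a rough sketch and not the code in this PR: KeyedMutexModel stands in for IDXGIKeyedMutex (acquire/release mirror AcquireSync/ReleaseSync), Image stands in for img_t, and the key values, the run_pipeline helper, and the round-robin pool are all assumptions made for illustration. No D3D11 calls are involved.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Portable stand-in for IDXGIKeyedMutex (hypothetical model, not the real API):
// acquire(key) blocks until the mutex is free and was last released with `key`;
// release(key) frees it and sets the key the next owner must ask for.
class KeyedMutexModel {
public:
  void acquire(std::uint64_t key) {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [&] { return !held_ && key_ == key; });
    held_ = true;
  }
  void release(std::uint64_t key) {
    {
      std::lock_guard<std::mutex> lock(m_);
      assert(held_);
      held_ = false;
      key_ = key;
    }
    cv_.notify_all();
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  std::uint64_t key_ = 0;  // starts out released with key 0
  bool held_ = false;
};

// Stand-in for img_t: one shared texture plus its keyed mutex.
struct Image {
  KeyedMutexModel mutex;
  int pixels = -1;  // placeholder for the texture contents
};

// Assumed key values for this sketch only.
constexpr std::uint64_t kCaptureKey = 0;
constexpr std::uint64_t kEncodeKey = 1;

// Runs a capture thread and an encode thread over a pool of images and
// returns the frame numbers in the order the encoder saw them.
std::vector<int> run_pipeline(int frames, std::size_t pool_size) {
  assert(pool_size > 0);
  std::vector<Image> pool(pool_size);
  std::vector<int> encoded;

  std::thread capture([&] {
    for (int f = 0; f < frames; ++f) {
      Image &img = pool[f % pool_size];
      img.mutex.acquire(kCaptureKey);  // wait until the encoder is done with it
      img.pixels = f;                  // "snapshot" into the texture
      img.mutex.release(kEncodeKey);   // hand the image to the encoder
    }
  });

  std::thread encode([&] {
    for (int f = 0; f < frames; ++f) {
      Image &img = pool[f % pool_size];
      img.mutex.acquire(kEncodeKey);   // wait until capture has filled it
      encoded.push_back(img.pixels);   // "convert" reads the texture
      img.mutex.release(kCaptureKey);  // return the image to capture
    }
  });

  capture.join();
  encode.join();
  return encoded;
}
```

Calling run_pipeline(8, 3) pushes eight frames through a pool of three images. With a larger pool, the capture thread only blocks when it wraps around to an image the encoder has not released yet, which is the intuition behind the contention being unlikely to matter in practice.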

This PR temporarily includes the changes in #667 since they are necessary for this to work. Once #667 is merged, I'll rebase this to remove that commit.

psyke83 · 2023-01-02T00:36:52Z

Before commenting on performance: the stuttering that was present in #660 and the black frame issues in the older version of #667 are no longer present.

Testing with 1440p host resolution, capture 1440p/60 @ 90MBps. GravityMark's real performance stays in the range of 80-130fps, average ~110fps (never any major dips).

Vulkan: average capture is ~55fps with two very noticeable dips down to 35fps (asteroid field close-ups).
D3D12: average capture is 45-50fps.
D3D11: average capture is solid 60fps (very small dips on scene change)

Further testing with AMD's built-in Chill framerate limiter:

Vulkan via Chill @ 60fps min/max: capture rate runs at a solid 60fps with no noticeable dips.
Vulkan via Chill @ 80fps min/max: capture rate runs at a solid 60fps, with perhaps one or two minor dips to 55fps.
D3D11 via Chill @ 60fps min/max: capture rate runs at ~50fps with frequent dips [N.B. this also happens on latest nightly, so it's not an indication of something wrong with this PR]
D3D11 via Chill @ 80fps min/max: capture rate runs at solid 60fps.
D3D12: n/a (Chill cannot limit framerate in GravityMark with this API)

In summary, it seems 1) uncapped rates may be causing GPU contention that results in the encoder throttling in Vulkan and D3D12 (but not D3D11), and 2) the D3D11 capped @ 60fps test result seems quite counter-intuitive (but the poor performance in that case is unrelated to this PR).

I need to test some more with other non-Vulkan applications, but so far it looks very promising and is definitely an improvement compared to nightly.

psyke83 · 2023-01-02T01:35:37Z

Updated PR (with call to SetGPUThreadPriority() for second device) seems to perform more or less the same as before.

I also tested some 4K/60 host+client capture. Current nightly will slightly miss the target, hitting about 57fps in Cyberpunk 2077 or even when browsing https://www.vsynctester.com/, but the PR improves the capture rate to reach the 60fps target. Will continue testing and let you know if I encounter any issues.

Edit: I also tried pushing GravityMark a little further. At 4K on both client and host, Vulkan capture runs at mostly 60fps with just a few dips to 30 fps if the host is capped to 60FPS. The worst point is the end of the demo; the host drops to ~55fps and the capture rate is halved to ~27fps. If uncapped, host performance ranges from 55-70fps, and capture performance is worse than 1440P (~25fps average). So for AMD, GPU load definitely influences capture performance.

cgutman · 2023-01-02T20:49:09Z

This PR is ready for merge.

cgutman force-pushed the texture_sharing branch from a22e0d5 to 0bb95c1 Compare January 2, 2023 01:10

cgutman force-pushed the texture_sharing branch from 0bb95c1 to 670e7f6 Compare January 2, 2023 07:11

cgutman added 2 commits January 2, 2023 11:49

Use separate D3D devices to allow parallel encoding and capture

4df4378

Fix image leak if a parallel encoder quit without encoding all pending images

adbfd09

cgutman force-pushed the texture_sharing branch from 670e7f6 to adbfd09 Compare January 2, 2023 17:49

cgutman marked this pull request as ready for review January 2, 2023 20:39

ReenigneArcher merged commit 0439d7a into LizardByte:nightly Jan 2, 2023

cgutman mentioned this pull request Jan 18, 2023

Fix streaming to multiple clients from hardware encoder on Windows #798

Merged

11 tasks
