
cgutman (Collaborator) commented Jan 1, 2023

Description

This PR modifies hwdevice_t to use a separate ID3D11Device and ID3D11DeviceContext, so that encoding can happen in parallel with capture without violating DXGI synchronization requirements. Each img_t has a texture created on the capture side by display_vram_t::complete_img(), which is then opened via a handle on the encoding side by hwdevice_t::share_img().

The only remaining concurrency bottleneck is display_vram_t::snapshot() and hwdevice_t::convert(), which use an IDXGIKeyedMutex to synchronize access to the same img texture. Since PARALLEL_ENCODING creates a pool of images, this is unlikely to be a problem in practice.
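The following is a minimal CPU-only sketch of why a pool of images keeps the keyed mutex from becoming a bottleneck. It models IDXGIKeyedMutex semantics with a hypothetical SimKeyedMutex class: acquire(key) succeeds only once the mutex was last released with that key, and the initial key is 0. The names SimKeyedMutex, SharedImage, run_pipeline, and the key values 0 ("free for capture") and 1 ("ready for encode") are illustrative assumptions, not Sunshine's code or the DXGI API. This is not a claim about which keys the PR actually uses.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical CPU-side model of IDXGIKeyedMutex semantics (not the DXGI API):
// acquire(key) blocks until the mutex is free AND was last released with `key`.
class SimKeyedMutex {
 public:
  void acquire(std::uint64_t key) {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [&] { return !held_ && key_ == key; });
    held_ = true;
  }

  void release(std::uint64_t key) {
    {
      std::lock_guard<std::mutex> lock(m_);
      assert(held_ && "release() without a matching acquire()");
      held_ = false;
      key_ = key;  // hand ownership to whichever side waits on this key
    }
    cv_.notify_all();
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::uint64_t key_ = 0;  // a DXGI keyed mutex also starts at key 0
  bool held_ = false;
};

// Hypothetical stand-in for an img_t whose texture is shared by two devices.
struct SharedImage {
  SimKeyedMutex mutex;
  int frame = -1;  // "texture contents": only touched while the mutex is held
};

constexpr std::uint64_t kFreeForCapture = 0;  // assumed key convention
constexpr std::uint64_t kReadyForEncode = 1;  // assumed key convention

// Runs a capture thread and an encode thread over a pool of images and
// returns the frame numbers in the order the encode thread consumed them.
std::vector<int> run_pipeline(int frames, std::size_t pool_size) {
  std::vector<SharedImage> pool(pool_size);
  std::vector<int> encoded;

  std::thread capture([&] {
    for (int i = 0; i < frames; ++i) {
      SharedImage &img = pool[i % pool_size];
      img.mutex.acquire(kFreeForCapture);  // analogous to snapshot() waiting
      img.frame = i;                       // "copy the desktop into the texture"
      img.mutex.release(kReadyForEncode);
    }
  });

  std::thread encode([&] {
    for (int i = 0; i < frames; ++i) {
      SharedImage &img = pool[i % pool_size];
      img.mutex.acquire(kReadyForEncode);  // analogous to convert() waiting
      encoded.push_back(img.frame);        // "convert/encode from the texture"
      img.mutex.release(kFreeForCapture);
    }
  });

  capture.join();
  encode.join();
  return encoded;
}
```

For example, run_pipeline(50, 3) returns 0 through 49 in order. With pool_size = 1 the two threads strictly alternate. With a larger pool the capture side can run up to pool_size frames ahead, and the two sides only block when they touch the same image.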

This PR temporarily includes the changes in #667 since they are necessary for this to work. Once #667 is merged, I'll rebase this to remove that commit.

Screenshot

Issues Fixed or Closed

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Dependency update (updates to dependencies)
  • Documentation update (changes to documentation)
  • Repository update (changes to repository files, e.g. .github/...)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Branch Updates

LizardByte requires that branches be up-to-date before merging. This means that after any PR is merged, this branch must be updated before it can be merged. You must also allow edits from maintainers.

  • I want maintainers to keep my branch updated

psyke83 (Contributor) commented Jan 2, 2023

Before I comment on performance: the stuttering that was present in #660 and the black frame issues in the older version of #667 are no longer present.

Testing with a 1440p host resolution, capturing 1440p/60 at 90Mbps. GravityMark's actual framerate on the host stays in the range of 80-130fps, averaging ~110fps (never any major dips).

  • Vulkan: average capture is ~55fps with two very noticeable dips down to 35fps (asteroid field close-ups).
  • D3D12: average capture is 45-50fps.
  • D3D11: average capture is a solid 60fps (very small dips on scene change)

Further testing with AMD's built-in Chill framerate limiter:

  • Vulkan via Chill @ 60fps min/max: capture rate runs at a solid 60fps with no noticeable dips.
  • Vulkan via Chill @ 80fps min/max: capture rate runs at a solid 60fps, with perhaps one or two minor dips to 55fps.
  • D3D11 via Chill @ 60fps min/max: capture rate runs at ~50fps with frequent dips [N.B. this also happens on latest nightly, so it's not an indication of something wrong with this PR]
  • D3D11 via Chill @ 80fps min/max: capture rate runs at solid 60fps.
  • D3D12: n/a (Chill cannot limit framerate in GravityMark with this API)

In summary, it seems that 1) uncapped rates may be causing GPU contention that throttles the encoder in Vulkan and D3D12 (but not D3D11), and 2) the D3D11 result when capped at 60fps seems quite counter-intuitive (but the poor performance in that case is unrelated to this PR).

I need to test some more with other non-Vulkan applications, but so far it looks very promising and is definitely an improvement compared to nightly.

psyke83 (Contributor) commented Jan 2, 2023

The updated PR (with a call to SetGPUThreadPriority() for the second device) seems to perform more or less the same as before.

I also tested some 4K/60 host+client capture. The current nightly slightly misses the target, hitting about 57fps in Cyberpunk 2077 or even when browsing https://www.vsynctester.com/, but the PR improves the capture rate enough to reach the 60fps target. I will continue testing and let you know if I encounter any issues.

Edit: I also tried pushing GravityMark a little further. At 4K on both client and host, with the host capped to 60fps, Vulkan capture runs at mostly 60fps with just a few dips to 30fps. The worst point is the end of the demo, where the host drops to ~55fps and the capture rate is halved to ~27fps. If uncapped, host performance ranges from 55-70fps, and capture performance is worse than at 1440p (~25fps average). So for AMD, GPU load definitely influences capture performance.

cgutman marked this pull request as ready for review January 2, 2023 20:39

cgutman (Collaborator, Author) commented Jan 2, 2023

This PR is ready for merge.

ReenigneArcher merged commit 0439d7a into LizardByte:nightly Jan 2, 2023
3 participants