feature: Support AMDGPU Data Collection #1641

Merged
merged 3 commits into from
Dec 20, 2024

Conversation

yretenai
Contributor

@yretenai yretenai commented Dec 2, 2024

Description

Adds GPU metrics gathering via amdgpu_top's libamdgpu_top crate on Linux.

image

A note: the library reads /proc/{pid}/fdinfo, which could probably be parsed directly without amdgpu_top's libraries. Intel Arc apparently exposes the same fdinfo interface, but I cannot confirm this.
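
To illustrate what parsing fdinfo directly might look like, here is a minimal sketch. It assumes the file contains `key: value` lines using the kernel's DRM usage-stats keys (`drm-driver`, `drm-client-id`, `drm-memory-vram`, `drm-engine-<name>`), and that VRAM is reported in KiB. `FdInfoStats`, `first_number`, and `parse_fdinfo` are hypothetical names, not the code in this PR.

```rust
use std::collections::HashMap;

/// Per-client values pulled from one fdinfo file (illustrative subset).
#[derive(Debug, Default, PartialEq)]
pub struct FdInfoStats {
    pub driver: Option<String>,
    pub client_id: Option<u32>,
    /// Value of `drm-memory-vram`, assumed to be reported in KiB.
    pub vram_kib: u64,
    /// Cumulative busy time per engine, in nanoseconds.
    pub engine_ns: HashMap<String, u64>,
}

/// Extracts the leading integer from values such as "2048 KiB" or "1500 ns".
fn first_number(value: &str) -> Option<u64> {
    value.split_whitespace().next()?.parse().ok()
}

/// Parses the text of a `/proc/{pid}/fdinfo/{fd}` file.
pub fn parse_fdinfo(text: &str) -> FdInfoStats {
    let mut stats = FdInfoStats::default();
    for line in text.lines() {
        let Some((key, value)) = line.split_once(':') else {
            continue;
        };
        let (key, value) = (key.trim(), value.trim());
        match key {
            "drm-driver" => stats.driver = Some(value.to_string()),
            "drm-client-id" => stats.client_id = value.parse().ok(),
            "drm-memory-vram" => stats.vram_kib = first_number(value).unwrap_or(0),
            _ => {
                // Keys like `drm-engine-gfx` carry a cumulative time in ns.
                let Some(engine) = key.strip_prefix("drm-engine-") else {
                    continue;
                };
                // Skip `drm-engine-capacity-*`, which is a count, not a time.
                if engine.starts_with("capacity-") {
                    continue;
                }
                if let Some(ns) = first_number(value) {
                    stats.engine_ns.insert(engine.to_string(), ns);
                }
            }
        }
    }
    stats
}
```

A caller would read each `/proc/{pid}/fdinfo/{fd}` file, keep only entries whose `driver` is `amdgpu`, and deduplicate by `client_id`, since several file descriptors can refer to the same DRM client.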

Testing

If relevant, please state how this was tested. All changes must be tested to work:

If this is a code change, please also indicate which platforms were tested:

  • Windows
  • macOS
  • Linux

Checklist

If relevant, ensure the following have been met:

  • Areas your change affects have been linted using rustfmt (cargo fmt)
  • The change has been tested and doesn't appear to cause any unintended breakage
  • Documentation has been added/updated if needed (README.md, help menu, doc pages, etc.)
  • The pull request passes the provided CI pipeline
  • There are no merge conflicts
  • If relevant, new tests were added (don't worry too much about coverage)

codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 6.62983% with 338 lines in your changes missing coverage. Please review.

Project coverage is 41.37%. Comparing base (1fe17dd) to head (95eab0f).
Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/data_collection/amd.rs | 3.33% | 319 Missing ⚠️ |
| src/data_collection.rs | 40.62% | 19 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1641      +/-   ##
==========================================
- Coverage   42.03%   41.37%   -0.67%     
==========================================
  Files         116      118       +2     
  Lines       17625    17926     +301     
==========================================
+ Hits         7409     7417       +8     
- Misses      10216    10509     +293     

| Flag | Coverage Δ |
|---|---|
| macos-14 | 37.38% <0.00%> (-0.08%) ⬇️ |
| ubuntu-latest | 43.09% <6.62%> (-0.73%) ⬇️ |
| windows-2019 | 37.31% <0.00%> (-0.08%) ⬇️ |

@yretenai
Contributor Author

yretenai commented Dec 2, 2024

CI is failing because the libdrm/libdrm_amdgpu libraries are missing. What's the best way forward here?

@ClementTsang ClementTsang self-assigned this Dec 2, 2024
@ClementTsang
Owner

Can the libraries be installed?

@yretenai
Contributor Author

yretenai commented Dec 2, 2024

libdrm should be available in most Linux distros via their package managers. I'm currently investigating a dependency-free solution that parses /proc/*/fdinfo/* directly.

@yretenai
Contributor Author

yretenai commented Dec 4, 2024

The most recent commit parses AMD GPU metrics via procfs (per-process utilization and video memory usage) and sysfs (overall AMDGPU memory usage and temperature sensors), so it no longer relies on any libraries.

However, the code is significantly more complex.
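
As a rough sketch of the sysfs half, the snippet below reads the two VRAM files named later in the review (`mem_info_vram_total` and `mem_info_vram_used`) from a device directory. It assumes each file holds a single decimal integer, which the amdgpu driver reports in bytes. `VramInfo`, `read_u64`, and `read_vram` are hypothetical names for illustration, not the PR's actual code.

```rust
use std::fs;
use std::path::Path;

/// Device-wide VRAM figures as read from sysfs (assumed to be bytes).
#[derive(Debug, PartialEq)]
pub struct VramInfo {
    pub total: u64,
    pub used: u64,
}

/// Reads a sysfs file that contains one decimal integer.
fn read_u64(path: &Path) -> Option<u64> {
    fs::read_to_string(path).ok()?.trim().parse().ok()
}

/// Returns `None` if either file is missing or unparsable,
/// e.g. when `device_path` does not belong to an amdgpu device.
pub fn read_vram(device_path: &Path) -> Option<VramInfo> {
    let total = read_u64(&device_path.join("mem_info_vram_total"))?;
    let used = read_u64(&device_path.join("mem_info_vram_used"))?;
    Some(VramInfo { total, used })
}
```

On a typical system `device_path` would be something like `/sys/class/drm/card0/device`, though the card index varies between machines.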

@ClementTsang ClementTsang changed the title Support AMDGPU Data Collection feature: Support AMDGPU Data Collection Dec 6, 2024
Owner

@ClementTsang ClementTsang left a comment

Left a few comments for now, mostly looks good though.

}

// needs previous state for usage calculation
static PROC_DATA: LazyLock<Mutex<HashMap<PathBuf, HashMap<u32, AMDGPUProc>>>> =
Owner

non-blocking: This doesn't need to be changed for this PR, but a mutex around this seems a little overkill.

Contributor Author

It's the only way I could get a mutable reference to PROC_DATA without making the static itself mutable, and thus requiring unsafe code to access.
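
The pattern under discussion can be sketched as follows: a `LazyLock<Mutex<HashMap<..>>>` static keeps the previous sample, so utilization can be derived from the change in cumulative engine time without `static mut` or `unsafe`. This is a simplified, hypothetical version keyed only by PID. The real `PROC_DATA` is keyed by device path and stores `AMDGPUProc` values, and `usage_percent` is an assumed helper name.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};

// Previous cumulative engine time (ns) per PID. Hypothetical, simplified
// stand-in for the PR's PROC_DATA static.
static PREV_ENGINE_NS: LazyLock<Mutex<HashMap<u32, u64>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

/// Returns utilization in percent over the last interval, or `None` on the
/// first sample for a PID (there is nothing to diff against yet).
pub fn usage_percent(pid: u32, engine_ns: u64, elapsed_ns: u64) -> Option<f64> {
    // Recover the map even if another thread panicked while holding the lock.
    let mut prev = PREV_ENGINE_NS.lock().unwrap_or_else(|e| e.into_inner());
    // `insert` stores the new sample and hands back the previous one, if any.
    let last = prev.insert(pid, engine_ns)?;
    if elapsed_ns == 0 {
        return None;
    }
    // Counters can reset when a process restarts, so never go negative.
    let delta = engine_ns.saturating_sub(last);
    Some((delta as f64 / elapsed_ns as f64 * 100.0).min(100.0))
}
```

If collection only ever happens on one thread, a `thread_local!` with a `RefCell` would avoid the lock, which may be what the "overkill" remark is getting at. The mutex version has the advantage of staying correct if that assumption changes.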

Comment on lines +117 to +119
// get vram memory info from sysfs
let vram_total_path = device_path.join("mem_info_vram_total");
let vram_used_path = device_path.join("mem_info_vram_used");
Owner

Would you happen to know whether any of these checks in this file wake up the GPU if it is currently sleeping? We had an issue with temperature checks in https://github.com/ClementTsang/bottom/blob/main/src/data_collection/temperature/linux.rs#L226 where we were waking up devices (mainly GPUs) when checking their temperature, for example.

Contributor Author

I do not, but I can ask around.
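
One possible guard against waking a sleeping GPU, sketched below, is to check the device's runtime power-management state before touching other sysfs files. This assumes the device directory exposes the generic `power/runtime_status` attribute and that reading it does not itself resume the device. Neither assumption has been verified for amdgpu in this thread, and `is_runtime_suspended` is a hypothetical helper.

```rust
use std::fs;
use std::path::Path;

/// Returns true only when the kernel reports the device as runtime-suspended.
/// A missing or unreadable attribute is treated as "not suspended", so
/// devices without runtime PM support are still polled.
pub fn is_runtime_suspended(device_path: &Path) -> bool {
    fs::read_to_string(device_path.join("power/runtime_status"))
        .map(|status| status.trim() == "suspended")
        .unwrap_or(false)
}
```

A collector could skip the VRAM and temperature reads for a device when this returns `true` and report the previous values or nothing at all.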

})
}

pub fn get_amd_temp(device_path: &Path) -> Option<Vec<AMDGPUTemperature>> {
Owner

Contributor Author

Besides having better names, no. They end up symlinking to the same hwmon endpoint.

Owner

Hm, guess it's fine to leave it in for now but I might integrate this into that code path in a later PR.
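
For context on the shared hwmon endpoint, here is a minimal sketch of reading temperatures from a hwmon directory. It relies on the standard hwmon convention that `tempN_input` holds millidegrees Celsius and `tempN_label` (when present) holds a sensor name. `TempReading` and `read_hwmon_temps` are hypothetical names, and this is not the body of `get_amd_temp`.

```rust
use std::fs;
use std::path::Path;

/// One temperature sensor reading.
#[derive(Debug, PartialEq)]
pub struct TempReading {
    pub label: String,
    pub celsius: f32,
}

/// Collects every `temp*_input` file in `hwmon_dir`, sorted by label.
pub fn read_hwmon_temps(hwmon_dir: &Path) -> Vec<TempReading> {
    let mut out = Vec::new();
    let Ok(entries) = fs::read_dir(hwmon_dir) else {
        return out;
    };
    for entry in entries.flatten() {
        let name = entry.file_name().to_string_lossy().into_owned();
        // Only files named like `temp1_input` are sensor values.
        let Some(prefix) = name.strip_suffix("_input") else {
            continue;
        };
        if !prefix.starts_with("temp") {
            continue;
        }
        let Some(milli) = fs::read_to_string(entry.path())
            .ok()
            .and_then(|s| s.trim().parse::<i64>().ok())
        else {
            continue;
        };
        // Fall back to the file prefix when no label file exists.
        let label = fs::read_to_string(hwmon_dir.join(format!("{prefix}_label")))
            .map(|s| s.trim().to_string())
            .unwrap_or_else(|_| prefix.to_string());
        out.push(TempReading {
            label,
            celsius: milli as f32 / 1000.0,
        });
    }
    out.sort_by(|a, b| a.label.cmp(&b.label));
    out
}
```

Because the amdgpu device directory links to the same hwmon node, a generic reader along these lines would see the same values, which is consistent with folding this into the existing temperature code path later.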

@yretenai
Contributor Author

yretenai commented Dec 6, 2024

Left a few comments for now, mostly looks good though.

I will make the necessary changes tomorrow.

@yretenai
Contributor Author

yretenai commented Dec 7, 2024

The force pushes only reworded commit messages.

yretenai and others added 3 commits December 7, 2024 18:01
Co-authored-by: lvxnull2 <184518908+lvxnull2@users.noreply.github.com>
gpu: fix clippy issues

Co-authored-by: lvxnull2 <184518908+lvxnull2@users.noreply.github.com>
…ead of current memory usage

gpu: requested syntax changes

Co-authored-by: lvxnull2 <184518908+lvxnull2@users.noreply.github.com>
@yretenai
Contributor Author

yretenai commented Dec 7, 2024

I accidentally reset the signature of the 4th commit from HEAD, which I just fixed by resetting the entire branch. Apologies! History should be preserved now.

@jamartin9 jamartin9 mentioned this pull request Dec 10, 2024
@ClementTsang
Owner

Sorry for the delay, was on vacation - will take a look at this again in a sec.

Owner

@ClementTsang ClementTsang left a comment

LGTM, thanks for doing this!

@ClementTsang
Owner

@all-contributors please add @yretenai for code.

Contributor

@ClementTsang

I've put up a pull request to add @yretenai! 🎉

@ClementTsang ClementTsang merged commit 479276b into ClementTsang:main Dec 20, 2024
37 checks passed
ClementTsang added a commit that referenced this pull request Dec 20, 2024
ClementTsang added a commit that referenced this pull request Dec 20, 2024
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Aug 6, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ClementTsang/bottom](https://github.com/ClementTsang/bottom) | minor | `0.10.2` -> `0.11.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>ClementTsang/bottom (ClementTsang/bottom)</summary>

### [`v0.11.0`](https://github.com/ClementTsang/bottom/blob/HEAD/CHANGELOG.md#0110---2025-08-05)

[Compare Source](ClementTsang/bottom@0.10.2...0.11.0)

##### Features

- [#1625](ClementTsang/bottom#1625): Add the ability to configure the disk widget's table columns.
- [#1641](ClementTsang/bottom#1641) + [#1692](ClementTsang/bottom#1692): Support AMD GPU data collection on Linux.
- [#1642](ClementTsang/bottom#1642): Support changing the widget borders.
- [#1717](ClementTsang/bottom#1717): Support delete key (fn + delete on macOS) to kill processes.
- [#1306](ClementTsang/bottom#1306): Support using left/right key to collapse/expand process trees respectively.
- [#1767](ClementTsang/bottom#1767): Add a virtual memory column for processes.
- [#1770](ClementTsang/bottom#1770) (originally [#1627](ClementTsang/bottom#1627)): Add option to have process tree entries be collapsed by default.

##### Bug Fixes

- [#1551](ClementTsang/bottom#1551): Fix missing parent section names in default config.
- [#1552](ClementTsang/bottom#1552): Fix typo in default config.
- [#1565](ClementTsang/bottom#1565): Fix issue where CPU usage in basic mode looks weird if core count isn't divisible by four.
- [#1578](ClementTsang/bottom#1578): Fix missing selected text background colour in `default-light` theme.
- [#1593](ClementTsang/bottom#1593): Fix using `"none"` for chart legend position in configs.
- [#1594](ClementTsang/bottom#1594): Fix incorrect default config definitions for chart legends.
- [#1596](ClementTsang/bottom#1596): Fix support for nilfs2 file system.
- [#1660](ClementTsang/bottom#1660): Fix properly cleaning up the terminal if the program is terminated due to an `Err` bubbling to the top.
- [#1663](ClementTsang/bottom#1663): Fix network graphs using log scaling having broken lines when a point was 0.
- [#1667](ClementTsang/bottom#1667): Fix for ARC/SWAP not being hidden in basic mode after refactor.
- [#1683](ClementTsang/bottom#1683): Fix graph lines potentially showing up behind legends.
- [#1701](ClementTsang/bottom#1701): Fix process kill dialog occasionally causing panics.
- [#1755](ClementTsang/bottom#1755): Fix missing stats/incorrect mount name for certain entries in the disk widget.
- [#1759](ClementTsang/bottom#1759): Fix increment for data tables if the change is greater than the number of entries left.

##### Changes

- [#1559](ClementTsang/bottom#1559): Rename `--enable_gpu` to `--disable_gpu`, and make GPU features enabled by default.
- [#1570](ClementTsang/bottom#1570): Consider `$XDG_CONFIG_HOME` on macOS when looking for a default config path in a backwards-compatible fashion.
- [#1686](ClementTsang/bottom#1686): Allow hyphenated arguments to work as well (e.g. `--autohide-time`).
- [#1701](ClementTsang/bottom#1701): Redesign process kill dialog.
- [#1706](ClementTsang/bottom#1706): Disable mouse capture when `disable_click` is set.
- [#1769](ClementTsang/bottom#1769): Change how we calculate swap usage in Windows.

##### Other

- [#1655](ClementTsang/bottom#1655): Better handle NVIDIA GPUs on Linux with only libnvidia-ml.so.1.
- [#1658](ClementTsang/bottom#1658): Make it possible to override completion/manpage generation output directory via env.
- [#1663](ClementTsang/bottom#1663): Rework how data is stored internally, reducing memory usage a bit.
- [#1749](ClementTsang/bottom#1749): Fix invalid desktop file values.

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).