SAKURA-CAT
Member

This pull request adds NPU power usage monitoring to ascend.py. It updates the initialization method, adds a method that retrieves power metrics, and integrates power data collection into the existing workflow.

Additions related to NPU power monitoring:

  • Initialization of power monitoring configurations:

    • Added self.power_key for generating unique keys for NPU power metrics.
    • Introduced power_config and self.per_power_config to store chart configurations for power usage.
    • Updated the initialization loop to clone power configurations for each NPU chip.
  • New method for power data retrieval:

    • Added get_chip_power method to fetch power usage data using the npu-smi command and format it into the required hardware info structure.
  • Integration of power data into the collection process:

    • Modified the collect method to include power usage data in the results.
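
The per-chip initialization described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `BASE_POWER_CONFIG`, its fields, the `power_key` naming scheme, and `build_per_power_configs` are all hypothetical names chosen for the example. The one point it tries to show is that each chip gets its own cloned copy of the base chart configuration, keyed by a unique metric key.

```python
import copy
from typing import Dict, Iterable, Tuple

# Hypothetical base chart config; the real fields in ascend.py may differ.
BASE_POWER_CONFIG = {"unit": "W", "y_range": (0, None), "chart_name": "NPU Power Usage (W)"}


def power_key(npu_id: str, chip_id: str) -> str:
    """Build a unique metric key for one chip (illustrative naming scheme)."""
    return f"npu.{npu_id}.{chip_id}.power"


def build_per_power_configs(chips: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Clone the base config once per (npu_id, chip_id) pair."""
    configs: Dict[str, dict] = {}
    for npu_id, chip_id in chips:
        # deepcopy so that editing one chip's config never leaks into another
        cfg = copy.deepcopy(BASE_POWER_CONFIG)
        cfg["metric_name"] = f"NPU {npu_id}-{chip_id}"
        configs[power_key(npu_id, chip_id)] = cfg
    return configs


per_power_configs = build_per_power_configs([("0", "0"), ("1", "0")])
```

Cloning rather than sharing one dict means a per-chip label can be set without touching the base configuration.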

@SAKURA-CAT SAKURA-CAT requested a review from Copilot June 3, 2025 07:23
@SAKURA-CAT SAKURA-CAT self-assigned this Jun 3, 2025
@SAKURA-CAT SAKURA-CAT added the 💪 enhancement New feature or request label Jun 3, 2025

@Copilot Copilot AI left a comment

Pull Request Overview

This PR extends the Ascend NPU hardware metadata collector to include real-time power usage metrics for each chip.

  • Initializes per-chip power chart configurations
  • Introduces get_chip_power to fetch and parse NPU power via npu-smi
  • Updates collect to include power readings alongside utilization and temperature
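
As a rough sketch of the "fetch and parse" step, the snippet below extracts a wattage value from text shaped like command output and wraps it in a small record. It makes several assumptions: `SAMPLE_OUTPUT` is a made-up sample and real npu-smi output may use different labels and layout, and `PowerInfo` is a hypothetical stand-in for the project's `HardwareInfo` structure, whose actual fields are not shown in this PR conversation.

```python
import re
from typing import Optional, TypedDict


class PowerInfo(TypedDict):
    """Hypothetical stand-in for the collector's hardware info record."""
    key: str
    name: str
    value: Optional[float]


# Made-up sample text; real npu-smi output may look different.
SAMPLE_OUTPUT = """\
    NPU ID                         : 0
    Chip ID                        : 0
    NPU Real-time Power(W)         : 93.6
"""

# Match a line whose label mentions "power" and capture the number after the colon.
_POWER_LINE = re.compile(r"power[^:\n]*:\s*([0-9]+(?:\.[0-9]+)?)", re.IGNORECASE)


def to_power_info(output: str, npu_id: str, chip_id: str) -> PowerInfo:
    """Parse the power reading; value is None when no reading is found."""
    match = _POWER_LINE.search(output)
    value = float(match.group(1)) if match else None
    return {
        "key": f"npu.{npu_id}.{chip_id}.power",
        "name": f"NPU {npu_id}-{chip_id} Power (W)",
        "value": value,
    }


info = to_power_info(SAMPLE_OUTPUT, "0", "0")
```

Returning `None` instead of raising lets a caller such as `collect` skip a missing reading while still reporting utilization and temperature.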
Comments suppressed due to low confidence (2)

swanlab/data/run/metadata/hardware/npu/ascend.py:166

  • For consistency with per_util_configs, per_hbm_configs, and per_temp_configs, rename per_power_config to per_power_configs.
self.per_power_config = {}

swanlab/data/run/metadata/hardware/npu/ascend.py:245

  • The new get_chip_power method and its integration into collect lack accompanying tests. Consider adding unit or integration tests to validate parsing logic and error handling.
    def get_chip_power(self, npu_id: str, chip_id: str) -> HardwareInfo:
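
One way the suggested tests could look is sketched below. It does not test the real `get_chip_power`; instead it uses a hypothetical helper, `read_power_watts`, that receives the command runner as a parameter so tests can substitute fakes for the npu-smi call. The helper, its signature, and the sample strings are assumptions made for illustration.

```python
import subprocess
import unittest
from typing import Callable, Optional


def read_power_watts(run: Callable[[], str]) -> Optional[float]:
    """Return watts parsed from the text produced by `run`, or None on failure.

    `run` is a hypothetical injection point standing in for the npu-smi call.
    """
    try:
        text = run()
    except (subprocess.SubprocessError, OSError):
        return None  # command failed or the binary is missing
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and "power" in key.lower():
            try:
                return float(value.strip())
            except ValueError:
                return None  # label found but value is not numeric
    return None


class ReadPowerTests(unittest.TestCase):
    def test_parses_value(self):
        self.assertEqual(read_power_watts(lambda: "Power(W) : 93.6\n"), 93.6)

    def test_command_failure_returns_none(self):
        def failing() -> str:
            raise subprocess.CalledProcessError(1, "npu-smi")

        self.assertIsNone(read_power_watts(failing))

    def test_unparseable_value_returns_none(self):
        self.assertIsNone(read_power_watts(lambda: "Power(W) : N/A\n"))
```

Saved as a module, these cases can be run with `python -m unittest`. Injecting the runner keeps the tests independent of Ascend hardware.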

@SAKURA-CAT SAKURA-CAT merged commit db0d87d into main Jun 5, 2025
5 checks passed
@SAKURA-CAT SAKURA-CAT deleted the feature/ascend-watt branch June 5, 2025 05:23
3 participants