
Conversation

Contributor

@yao-matrix yao-matrix commented Jun 5, 2025

@kashif, please help review and comment. Thanks very much!

device-agnostic to cover xpu

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

```diff
- training_args = DPOConfig(..., optimize_cuda_cache=True)
+ training_args = DPOConfig(..., optimize_device_cache=True)
```

There is no `optimize_cuda_cache` option anymore, so update the doc here.
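As a side note, a renamed flag like this is often kept importable under its old name for one deprecation cycle. Below is a minimal, hypothetical sketch of that pattern (`DPOConfigSketch` is an illustration, not TRL's actual implementation):

```python
import warnings


class DPOConfigSketch:
    """Hypothetical sketch (not TRL's actual code): accept the old CUDA-only
    flag name, warn, and map it to the new device-agnostic one."""

    def __init__(self, optimize_device_cache=False, **kwargs):
        if "optimize_cuda_cache" in kwargs:
            warnings.warn(
                "`optimize_cuda_cache` is deprecated; use `optimize_device_cache` instead",
                DeprecationWarning,
            )
            # The old flag value now drives the device-agnostic option.
            optimize_device_cache = kwargs.pop("optimize_cuda_cache")
        self.optimize_device_cache = optimize_device_cache
```

With such a shim, old call sites keep working (with a warning) while new code uses the device-agnostic name.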

@@ -82,7 +82,7 @@ class ScriptArguments:
     batch_size=script_args.batch_size,
     mini_batch_size=script_args.mini_batch_size,
     gradient_accumulation_steps=script_args.gradient_accumulation_steps,
-    optimize_cuda_cache=True,
+    optimize_device_cache=True,

Same as above: `optimize_cuda_cache` was renamed to `optimize_device_cache`.

-    torch.cuda.empty_cache()
-elif torch_device == "xpu":
-    torch.xpu.empty_cache()
+backend_empty_cache(torch_device)

Use the device-agnostic utility `backend_empty_cache` from `transformers.testing_utils` rather than an if/else chain over device types.
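The idea behind such a utility can be sketched in a few lines: look the backend module up by device name instead of branching per device. This is a self-contained illustration of the dispatch pattern (using stand-in namespaces so it runs without torch), not the actual `transformers.testing_utils` source:

```python
from types import SimpleNamespace

# Stand-ins for torch.cuda / torch.xpu so the sketch runs without torch installed.
calls = []
fake_torch = SimpleNamespace(
    cuda=SimpleNamespace(empty_cache=lambda: calls.append("cuda")),
    xpu=SimpleNamespace(empty_cache=lambda: calls.append("xpu")),
)


def backend_empty_cache_sketch(torch_mod, device: str) -> None:
    """Dispatch to the right backend's empty_cache() by device name,
    mirroring the spirit of transformers.testing_utils.backend_empty_cache."""
    backend = getattr(torch_mod, device, None)  # e.g. torch.cuda, torch.xpu
    if backend is not None and hasattr(backend, "empty_cache"):
        backend.empty_cache()  # no-op for devices without a cache (e.g. cpu)
```

Adding a new accelerator then only requires that its backend module exposes `empty_cache`, with no change to the call sites.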

```python
@unittest.skipIf(
    get_device_properties()[0] == "cuda" and get_device_properties()[1] < 8,
    "Skipping because bf16 not supported on CUDA GPU with capability < 8.0",
)
```

Add `skipIf` per the review comments and remove the unconditional skip.
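The skip condition itself is just a predicate over `(device_type, major_version)` as returned by `get_device_properties()`. A small sketch of that logic (the helper name `should_skip_bf16` is illustrative, not from the PR):

```python
from typing import Optional


def should_skip_bf16(device_type: str, major: Optional[int]) -> bool:
    """Mirror of the skipIf condition above: skip bf16 tests only on CUDA
    GPUs whose compute-capability major version is below 8 (pre-Ampere).
    Non-CUDA devices (e.g. xpu) never trigger the skip."""
    return device_type == "cuda" and major is not None and major < 8
```

Because the condition names CUDA explicitly, XPU runs keep the test instead of being skipped unconditionally.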

-if is_torch_xpu_available():
-    return f"xpu:{state.local_process_index}"
+if torch.cuda.is_available() or is_torch_xpu_available():
+    return state.local_process_index
 elif is_torch_npu_available():

We don't need this workaround anymore; XPU now supports integer device indices.
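The removed branch can be summarized as a tiny normalization function: older XPU builds required a `"xpu:N"` string where CUDA accepted a bare integer, and once XPU accepts integers too, both paths collapse. This is an illustrative sketch (the flag `xpu_supports_int_index` is hypothetical, standing in for the runtime capability):

```python
def to_device_arg(device_type: str, index: int, xpu_supports_int_index: bool = True):
    """Sketch of the removed workaround. With modern XPU runtimes the
    special case disappears and every backend takes the integer index."""
    if device_type == "xpu" and not xpu_supports_int_index:
        return f"xpu:{index}"  # old workaround: XPU needed an explicit string
    return index  # device-agnostic path: bare integer index works everywhere
```

With `xpu_supports_int_index=True` the XPU branch is dead code, which is exactly why the PR deletes it.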

Member

@qgallouedec qgallouedec left a comment


Nice, thanks! Just one comment.

kashif and others added 2 commits June 9, 2025 14:30
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Collaborator

kashif commented Jun 9, 2025

@qgallouedec is the test failing due to the CI issue?

@qgallouedec
Member

Yes, fixing it in #3551

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec qgallouedec changed the title unify autocast behavior to torch.autocast and make it cover XPU ℹ️ Unify autocast behavior to torch.autocast and make it cover XPU Jun 9, 2025
@qgallouedec
Member

Let's wait for #3553 to be merged

@kashif kashif merged commit 1314aac into huggingface:main Jun 10, 2025
10 checks passed
@yao-matrix yao-matrix deleted the xpu branch June 10, 2025 22:55