Conversation

albertvillanova
Member

Fix incorrect token counting in streaming TransformersModel.

This issue was introduced in v1.19.0:

The input token count was being included in every yielded ChatMessageStreamDelta, so when the deltas were summed externally the input tokens were counted once per delta. This inflated the reported token usage.

Solution:

  • Modified the token counting logic so that the input token count is only included in the first yielded ChatMessageStreamDelta

Fix #1488.
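A minimal sketch of the fix. The dataclass names below mirror the types mentioned in the PR, but their fields and the `stream_deltas` helper are illustrative assumptions, not smolagents' actual implementation:

```python
from dataclasses import dataclass

# Illustrative stand-ins for the streaming types; fields are assumptions.
@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int

@dataclass
class ChatMessageStreamDelta:
    content: str
    token_usage: TokenUsage

def stream_deltas(prompt_tokens, generated_chunks):
    """Yield one delta per generated chunk, attributing the prompt's
    input tokens only to the FIRST delta, so that summing token_usage
    across all deltas counts the input exactly once."""
    for i, chunk in enumerate(generated_chunks):
        yield ChatMessageStreamDelta(
            content=chunk,
            token_usage=TokenUsage(
                # Before the fix: prompt_tokens on every delta (inflated).
                # After the fix: prompt_tokens only when i == 0.
                input_tokens=prompt_tokens if i == 0 else 0,
                output_tokens=1,  # one new token per chunk in this sketch
            ),
        )

deltas = list(stream_deltas(prompt_tokens=100, generated_chunks=["a", "b", "c"]))
total_input = sum(d.token_usage.input_tokens for d in deltas)   # 100, not 300
total_output = sum(d.token_usage.output_tokens for d in deltas)  # 3
```

With the pre-v1.19.0-style behavior restored, an external consumer summing usage over three deltas reports 100 input tokens rather than 300.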

Collaborator

@aymeric-roucher aymeric-roucher left a comment

Thank you @albertvillanova !

@albertvillanova albertvillanova merged commit 27afcc0 into huggingface:main Jun 30, 2025
4 of 5 checks passed
@albertvillanova albertvillanova deleted the fix-1488 branch June 30, 2025 15:29
@peabody124

Thanks!

Successfully merging this pull request may close these issues.

[BUG] Reported token utilization changes by order of magnitude with engine streaming/not streaming