Conversation

albertvillanova
Member

Fix incorrect token counting in streaming TransformersModel.

This issue was introduced in v1.19.0:

The input token count was being included in every yielded ChatMessageStreamDelta, so when the deltas were summed externally the input tokens were counted once per delta. This inflated the reported token usage.

Solution:

  • Modified the token counting logic so that the input token count is only included in the first yielded ChatMessageStreamDelta

Fix #1488.
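A minimal sketch of the fix. The dataclass names below mirror the types mentioned in the PR, but their fields and the `stream_deltas` helper are illustrative assumptions, not smolagents' actual implementation:

```python
from dataclasses import dataclass

# Illustrative stand-ins for the streaming types; fields are assumptions.
@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int

@dataclass
class ChatMessageStreamDelta:
    content: str
    token_usage: TokenUsage

def stream_deltas(prompt_tokens, generated_chunks):
    """Yield one delta per generated chunk, attributing the prompt's
    input tokens only to the FIRST delta, so that summing token_usage
    across all deltas counts the input exactly once."""
    for i, chunk in enumerate(generated_chunks):
        yield ChatMessageStreamDelta(
            content=chunk,
            token_usage=TokenUsage(
                # Before the fix: prompt_tokens on every delta (inflated).
                # After the fix: prompt_tokens only when i == 0.
                input_tokens=prompt_tokens if i == 0 else 0,
                output_tokens=1,  # one new token per chunk in this sketch
            ),
        )

deltas = list(stream_deltas(prompt_tokens=100, generated_chunks=["a", "b", "c"]))
total_input = sum(d.token_usage.input_tokens for d in deltas)   # 100, not 300
total_output = sum(d.token_usage.output_tokens for d in deltas)  # 3
```

With the pre-v1.19.0-style behavior restored, an external consumer summing usage over three deltas reports 100 input tokens rather than 300.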

Collaborator

@aymeric-roucher aymeric-roucher left a comment

Thank you @albertvillanova !

@albertvillanova albertvillanova merged commit 27afcc0 into huggingface:main Jun 30, 2025
4 of 5 checks passed
@albertvillanova albertvillanova deleted the fix-1488 branch June 30, 2025 15:29
@peabody124

Thanks!

Successfully merging this pull request may close these issues.

[BUG] Reported token utilization changes by order of magnitude with engine streaming/not streaming