
Conversation

@Askir (Contributor) commented May 12, 2025

Fixes #728

The magic here is that OpenAI has a token estimator sitting in front of their actual API: it simply counts UTF-8 bytes and charges 0.25 tokens per byte.

This estimate is what the 300k-tokens-per-batch-request limit applies to; the actual token count doesn't matter. A request with 1 million real tokens but 1.2 million bytes will go through just fine, because the estimator thinks it is exactly 300k "tokens".
Likewise, a request with 200,000 real tokens but 1.5 million bytes will fail.
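
For illustration, a minimal sketch of the check described above (the constant and function names are invented for this example; they are not from OpenAI's SDK or from this repo):

ESTIMATED_TOKENS_PER_BYTE = 0.25
BATCH_REQUEST_TOKEN_LIMIT = 300_000


def estimated_request_tokens(documents: list[str]) -> float:
    # The pre-check only looks at UTF-8 byte length, never at real tokens.
    total_bytes = sum(len(doc.encode("utf-8")) for doc in documents)
    return total_bytes * ESTIMATED_TOKENS_PER_BYTE


# 1,200,000 bytes -> estimated as exactly 300k "tokens": accepted, even if the
# real tokenizer would count ~1 million tokens.
# 1,500,000 bytes -> estimated as 375k "tokens": rejected, even if the real
# count is only ~200k tokens.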

@Askir marked this pull request as ready for review May 12, 2025 13:00
@Askir requested a review from a team as a code owner May 12, 2025 13:00
@JamesGuthrie (Member) left a comment

I would prefer that we not duplicate the implementation of batch_indices, and instead make the existing batch_indices take an optional estimated_chunk_token_lengths parameter.
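
A rough sketch of what that optional parameter could look like (the body logic, types, and other parameter names here are invented for illustration, not taken from the repo):

from typing import Optional


def batch_indices(
    chunk_token_lengths: list[int],
    max_tokens_per_batch: int,
    estimated_chunk_token_lengths: Optional[list[float]] = None,
) -> list[tuple[int, int]]:
    # Use the byte-based estimates for the limit check when they are given,
    # otherwise fall back to the real per-chunk token lengths.
    lengths = (
        estimated_chunk_token_lengths
        if estimated_chunk_token_lengths is not None
        else chunk_token_lengths
    )
    batches: list[tuple[int, int]] = []
    start, running = 0, 0.0
    for i, length in enumerate(lengths):
        if running + length > max_tokens_per_batch and i > start:
            batches.append((start, i))
            start, running = i, 0.0
        running += length
    if lengths:
        batches.append((start, len(lengths)))
    return batches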

Comment on lines +153 to +164
def _estimate_token_length(self, document: str) -> float:
    """
    Estimates token count based on UTF-8 byte length.
    """

    total_estimated_tokens = 0
    for char in document:
        byte_length = len(char.encode("utf-8"))
        total_estimated_tokens += byte_length * 0.25  # 0.25 tokens per byte

    return total_estimated_tokens

@JamesGuthrie (Member)

I would suggest that we do this, and then remove the type changes that this propagates:

Suggested change:

-def _estimate_token_length(self, document: str) -> float:
+def _estimate_token_length(self, document: str) -> int:
     """
     Estimates token count based on UTF-8 byte length.
     """
     total_estimated_tokens = 0
     for char in document:
         byte_length = len(char.encode("utf-8"))
         total_estimated_tokens += byte_length * 0.25  # 0.25 tokens per byte
-    return total_estimated_tokens
+    return ceil(total_estimated_tokens)

@Askir (Contributor, Author)

I don't think that works, because it would round each individual document's tokens rather than the whole batch. The API, however, counts the bytes of everything first and then rounds.

E.g. if you send 100 documents that are just "a", rounding per document would estimate 100 tokens, but the API only assigns 25 (which makes no sense, but that's how it works).
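
A quick worked example of that difference in plain Python (variable names are just for illustration):

from math import ceil

docs = ["a"] * 100  # one hundred one-byte documents in a single batch request

# Rounding per document: ceil(1 byte * 0.25) = 1 token each -> 100 tokens.
per_document = sum(ceil(len(d.encode("utf-8")) * 0.25) for d in docs)

# Rounding the whole batch, as the API does: ceil(100 bytes * 0.25) -> 25.
whole_batch = ceil(sum(len(d.encode("utf-8")) for d in docs) * 0.25)

print(per_document, whole_batch)  # 100 25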

@JamesGuthrie (Member)

Hmmm... interesting. I didn't realise that on first inspection.

@Askir requested a review from JamesGuthrie May 14, 2025 10:52
@Askir merged commit 7fbd781 into main May 14, 2025
14 checks passed
@Askir deleted the jascha/fix-openai-300k-token-limit branch May 14, 2025 11:50
@Michnic120

@Askir Great catch! How did you find out about OpenAI's token estimator?

@Askir (Contributor, Author) commented Jun 17, 2025

@Michnic120 I just spent a full day trying a bunch of tokenized inputs to figure out what works and what doesn't. If you break the limit, the error message contains the token count that their API is "calculating", so eventually I figured out how it works simply through trial and error.

Development

Successfully merging this pull request may close these issues.

[Bug]: OpenAI - Reaching max tokens per request. Count discrepancy between local count tokens and the count given by OpenAI API