fix: openai per batch limit #730
Conversation
I would prefer it if we would not duplicate the implementation of `batch_indices`, and would instead make the existing `batch_indices` take an optional `estimated_chunk_token_lengths` parameter.
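For illustration, a minimal sketch of what that could look like. The name `batch_indices` and the `estimated_chunk_token_lengths` parameter come from this comment; the signature, the chunk limit, and the 300k default are assumptions for the sketch, not the project's actual code.

```python
from typing import Sequence


def batch_indices(
    num_chunks: int,
    max_chunks_per_batch: int,
    estimated_chunk_token_lengths: Sequence[float] | None = None,
    max_tokens_per_batch: int = 300_000,
) -> list[tuple[int, int]]:
    """Yield (start, end) index pairs, optionally also capping estimated tokens per batch."""
    batches: list[tuple[int, int]] = []
    start = 0
    tokens_in_batch = 0.0
    for i in range(num_chunks):
        chunk_tokens = (
            estimated_chunk_token_lengths[i] if estimated_chunk_token_lengths else 0.0
        )
        too_many_chunks = i - start >= max_chunks_per_batch
        too_many_tokens = (
            estimated_chunk_token_lengths is not None
            and tokens_in_batch + chunk_tokens > max_tokens_per_batch
            and i > start  # never emit an empty batch
        )
        if too_many_chunks or too_many_tokens:
            batches.append((start, i))
            start = i
            tokens_in_batch = 0.0
        tokens_in_batch += chunk_tokens
    if start < num_chunks:
        batches.append((start, num_chunks))
    return batches


# Example: batch_indices(5, max_chunks_per_batch=2) -> [(0, 2), (2, 4), (4, 5)]
```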
```python
def _estimate_token_length(self, document: str) -> float:
    """
    Estimates token count based on UTF-8 byte length.
    """
    total_estimated_tokens = 0
    for char in document:
        byte_length = len(char.encode("utf-8"))
        total_estimated_tokens += byte_length * 0.25  # 0.25 tokens per byte
    return total_estimated_tokens
```
I would suggest that we do this, and then remove the type changes that this propagates:
```diff
-def _estimate_token_length(self, document: str) -> float:
+def _estimate_token_length(self, document: str) -> int:
     """
     Estimates token count based on UTF-8 byte length.
     """
     total_estimated_tokens = 0
     for char in document:
         byte_length = len(char.encode("utf-8"))
         total_estimated_tokens += byte_length * 0.25  # 0.25 tokens per byte
-    return total_estimated_tokens
+    return ceil(total_estimated_tokens)
```
I don't think that works, because it would round each individual document's tokens rather than the whole batch. The API, however, counts the bytes of everything first and then rounds.
E.g. if you send 100 documents of just "a", they would be estimated at 100 tokens this way, but the API only assigns 25 (which makes no sense, but well, that's how it works).
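For concreteness, here is the difference in plain Python, using the 0.25-tokens-per-byte factor from the patch (the batch-level rounding is the behaviour described in the comment above):

```python
from math import ceil

docs = ["a"] * 100  # 100 single-byte documents

# Rounding each document separately over-counts:
per_document = sum(ceil(len(d.encode("utf-8")) * 0.25) for d in docs)  # 100

# Rounding once over the whole batch matches the API behaviour described above:
per_batch = ceil(sum(len(d.encode("utf-8")) * 0.25 for d in docs))     # 25

print(per_document, per_batch)  # 100 25
```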
Hmmm... interesting. I didn't realise that on first inspection.
@Askir Great catch! How did you find out about OpenAI's token estimator?
@Michnic120 I just spent a full day trying a bunch of tokenized inputs to figure out what works and what doesn't. If you break the limit, the error message contains the token count that their API is "calculating", so eventually I figured out how it worked simply through trial and error.
Fixes #728
The magic here is that OpenAI has a token estimator in front of their actual API. It simply counts UTF-8 bytes and assigns 0.25 tokens to each byte.
This is what the 300k-tokens-per-batch-request limit applies to; the actual tokens don't matter. A request with 1 million tokens but 1.2 million bytes will go through just fine, as the estimator thinks it is exactly 300k "tokens".
Likewise, a request with 200,000 tokens but 1.5 million bytes will fail.
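For illustration, a rough sketch of batching against that estimator. The 300k limit and the 0.25-tokens-per-byte factor come from this description; the helper name and structure are illustrative, not the code in this PR.

```python
from math import ceil

# Limit the byte-based estimator enforces per embeddings request (from the PR description).
MAX_ESTIMATED_TOKENS_PER_REQUEST = 300_000


def split_for_openai(documents: list[str]) -> list[list[str]]:
    """Group documents so each request stays under the estimator's limit (bytes * 0.25, rounded up per request)."""
    batches: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for doc in documents:
        doc_bytes = len(doc.encode("utf-8"))
        if current and ceil((current_bytes + doc_bytes) * 0.25) > MAX_ESTIMATED_TOKENS_PER_REQUEST:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(doc)
        current_bytes += doc_bytes
    if current:
        batches.append(current)
    return batches
```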