
Conversation

@Askir (Contributor) commented May 12, 2025

Fixes #728

The magic here is that OpenAI has a token estimator sitting in front of their actual API: it simply counts UTF-8 bytes and charges 0.25 tokens per byte.

This estimate is what the 300k-tokens-per-batch-request limit applies to; the actual token count doesn't matter. A request with 1 million real tokens but 1.2 million bytes will go through just fine, because the estimator thinks it is exactly 300k "tokens".
Likewise, a request with 200,000 real tokens but 1.5 million bytes will fail.
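
For illustration, a minimal sketch of the check described above (the constant and function names are invented for this example; they are not from OpenAI's SDK or from this repo):

ESTIMATED_TOKENS_PER_BYTE = 0.25
BATCH_REQUEST_TOKEN_LIMIT = 300_000


def estimated_request_tokens(documents: list[str]) -> float:
    # The pre-check only looks at UTF-8 byte length, never at real tokens.
    total_bytes = sum(len(doc.encode("utf-8")) for doc in documents)
    return total_bytes * ESTIMATED_TOKENS_PER_BYTE


# 1,200,000 bytes -> estimated as exactly 300k "tokens": accepted, even if the
# real tokenizer would count ~1 million tokens.
# 1,500,000 bytes -> estimated as 375k "tokens": rejected, even if the real
# count is only ~200k tokens.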

@Askir marked this pull request as ready for review May 12, 2025 13:00
@Askir requested a review from a team as a code owner May 12, 2025 13:00
@JamesGuthrie (Member) left a comment

I would prefer that we not duplicate the implementation of batch_indices, and instead make the existing batch_indices take an optional estimated_chunk_token_lengths parameter.
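
A rough sketch of what that optional parameter could look like (the body logic, types, and other parameter names here are invented for illustration, not taken from the repo):

from typing import Optional


def batch_indices(
    chunk_token_lengths: list[int],
    max_tokens_per_batch: int,
    estimated_chunk_token_lengths: Optional[list[float]] = None,
) -> list[tuple[int, int]]:
    # Use the byte-based estimates for the limit check when they are given,
    # otherwise fall back to the real per-chunk token lengths.
    lengths = (
        estimated_chunk_token_lengths
        if estimated_chunk_token_lengths is not None
        else chunk_token_lengths
    )
    batches: list[tuple[int, int]] = []
    start, running = 0, 0.0
    for i, length in enumerate(lengths):
        if running + length > max_tokens_per_batch and i > start:
            batches.append((start, i))
            start, running = i, 0.0
        running += length
    if lengths:
        batches.append((start, len(lengths)))
    return batches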

Comment on lines +153 to +164
def _estimate_token_length(self, document: str) -> float:
    """
    Estimates token count based on UTF-8 byte length.
    """

    total_estimated_tokens = 0
    for char in document:
        byte_length = len(char.encode("utf-8"))
        total_estimated_tokens += byte_length * 0.25  # 0.25 tokens per byte

    return total_estimated_tokens

@JamesGuthrie (Member)

I would suggest that we do this, and then remove the type changes that this propagates:

Suggested change:

-def _estimate_token_length(self, document: str) -> float:
+def _estimate_token_length(self, document: str) -> int:
     """
     Estimates token count based on UTF-8 byte length.
     """
     total_estimated_tokens = 0
     for char in document:
         byte_length = len(char.encode("utf-8"))
         total_estimated_tokens += byte_length * 0.25  # 0.25 tokens per byte
-    return total_estimated_tokens
+    return ceil(total_estimated_tokens)

@Askir (Contributor, Author)

I don't think that works, because it would round each individual document's tokens rather than the whole batch. The API, however, counts the bytes of everything first and then rounds.

E.g. if you send 100 documents that are just "a", rounding per document would estimate 100 tokens, but the API only assigns 25 (which makes no sense, but that's how it works).
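
A quick worked example of that difference in plain Python (variable names are just for illustration):

from math import ceil

docs = ["a"] * 100  # one hundred one-byte documents in a single batch request

# Rounding per document: ceil(1 byte * 0.25) = 1 token each -> 100 tokens.
per_document = sum(ceil(len(d.encode("utf-8")) * 0.25) for d in docs)

# Rounding the whole batch, as the API does: ceil(100 bytes * 0.25) -> 25.
whole_batch = ceil(sum(len(d.encode("utf-8")) for d in docs) * 0.25)

print(per_document, whole_batch)  # 100 25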

@JamesGuthrie (Member)

Hmmm... interesting. I didn't realise that on first inspection.

@Askir requested a review from JamesGuthrie May 14, 2025 10:52
@Askir merged commit 7fbd781 into main May 14, 2025
14 checks passed
@Askir deleted the jascha/fix-openai-300k-token-limit branch May 14, 2025 11:50
@Michnic120

@Askir Great catch! How did you find out about OpenAI's token estimator?

@Askir (Contributor, Author) commented Jun 17, 2025

@Michnic120 I just spent a full day trying a bunch of tokenized inputs to figure out what works and what doesn't. If you break the limit, the error message contains the token count that their API is "calculating", so eventually I figured out how it works simply through trial and error.

Development

Successfully merging this pull request may close these issues.

[Bug]: OpenAI - Reaching max tokens per request. Count discrepancy between local count tokens and the count given by OpenAI API