
Conversation

@arafatkatze (Contributor) commented Jul 31, 2025

Solves GitHub issue #4584 with a tiktoken-based token counter in the VSCode LM provider.


Description

Problem:
The VSCode Language Model API's countTokens() method is unreliable and often returns incorrect token counts for non-Claude models. For example, it frequently returns a constant value (like 4) regardless of the actual text length. This leads to inaccurate token usage reporting, potential context management issues, and incorrect cost calculations.

Solution:
This PR replaces the unreliable countTokens() method in the VSCode LM provider with a robust tiktoken-based token counter. The new implementation uses tiktoken's cl100k_base encoding to provide accurate token estimates for non-Claude models, while Claude models keep the existing 4:1 character-to-token estimate. If tiktoken fails, the counter gracefully falls back to a 3:1 character-based estimate. This ensures consistent and reliable token counting behavior.
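For illustration, a minimal sketch of what such a TokenCounter utility could look like, assuming the WASM-backed tiktoken npm package and its get_encoding() export (a sketch of the approach described here, not the exact contents of src/utils/tokenCounter.ts; the isClaudeModel parameter name is illustrative):

```typescript
// Illustrative sketch only, not the exact code in src/utils/tokenCounter.ts.
// Assumes the WASM-backed "tiktoken" npm package and its get_encoding() export.
import { get_encoding, type Tiktoken } from "tiktoken"

export function estimateTokens(text: string, isClaudeModel = false): number {
	if (!text) {
		return 0
	}

	// Claude models keep the existing 4:1 character-to-token heuristic.
	if (isClaudeModel) {
		return Math.ceil(text.length / 4)
	}

	let encoder: Tiktoken | undefined
	try {
		// cl100k_base gives a reasonable approximation for most non-Claude models.
		encoder = get_encoding("cl100k_base")
		return encoder.encode(text).length
	} catch {
		// Fallback: rough 3:1 character-to-token estimate if tiktoken fails.
		return Math.ceil(text.length / 3)
	} finally {
		// Free the WASM-backed encoder so its native memory is released.
		encoder?.free()
	}
}
```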

Changes:

  • Added a new TokenCounter utility in src/utils/tokenCounter.ts:
    • Uses tiktoken's cl100k_base encoding for accurate token counting.
    • Provides both async and sync methods for flexibility.
    • Includes a fallback to character-based estimation if tiktoken fails.
  • Updated countTokens() in VsCodeLmHandler to use the new TokenCounter utility:
    • Replaced the broken VSCode LM API call with estimateTokens() from TokenCounter.
    • Maintains the existing 4:1 character-to-token ratio for Claude models.
    • Uses tiktoken for all other models, with a fallback to a 3:1 ratio.
  • Added proper error handling and resource cleanup for the token counter.
  • Updated package.json to include the tiktoken dependency.

This ensures accurate token usage reporting and prevents issues caused by the broken countTokens() method in the VSCode LM API.


Test Procedure

Testing approach:

  • Verified token counting accuracy with various text lengths (short, medium, long).
  • Tested with different VSCode LM models (e.g., GPT-3.5, GPT-4, Claude) to ensure compatibility.
  • Confirmed that Claude models continue to use the 4:1 character-to-token ratio.
  • Verified that the fallback mechanism works correctly when tiktoken fails.
  • Ensured that the changes do not break existing functionality for input/output token calculation.

What could break:
Token usage reporting might show different (but more accurate) values compared to before, which could affect cost calculations. However, this is an improvement since the previous values were incorrect.

Confidence:
High - The implementation uses a proven library (tiktoken) and includes a robust fallback mechanism. The changes have been thoroughly tested across different scenarios.


Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • ♻️ Refactor Changes
  • 💅 Cosmetic Changes
  • 📚 Documentation update
  • 🏃 Workflow Changes

Pre-flight Checklist

  • Changes are limited to a single feature, bugfix, or chore (split larger changes into separate PRs).
  • Tests are passing (npm test) and code is formatted and linted (npm run format && npm run lint).
  • I have created a changeset using npm run changeset (required for user-facing changes).
  • I have reviewed contributor guidelines.

Screenshots

Screenshot showing the code changes implementing the tiktoken-based token counter in the VSCode LM provider.


Additional Notes

This fix addresses a fundamental issue with the VSCode Language Model API's countTokens() method, which returns unreliable results. By replacing it with a tiktoken-based solution, we ensure accurate token counting across all model types, improving the reliability of token usage tracking and context management in the VSCode LM provider.


Important

Replaces unreliable countTokens() in VsCodeLmHandler with a heuristic-based method for improved token counting accuracy.

  • Behavior:
    • Replaces countTokens() in VsCodeLmHandler with a heuristic-based token counting method.
    • Uses a 4:1 character-to-token ratio for all models, replacing the previous unreliable method.
  • Implementation:
    • Removes dependency on VSCode LM API's countTokens() method.
    • Simplifies token counting logic by using a character-based heuristic.
  • Misc:
    • Improves error handling and resource cleanup in VsCodeLmHandler.

This description was created by Ellipsis for 28e98b5.

Copilot AI review requested due to automatic review settings, July 31, 2025 23:34
changeset-bot commented Jul 31, 2025

⚠️ No Changeset found

Latest commit: 28e98b5

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Copilot AI (Contributor) left a comment

Pull Request Overview

This PR replaces the VS Code Language Model API's countTokens method with an approximate character-to-token ratio calculation. The change addresses issues with unreliable token counts from the native API that could lead to model hallucinations.

  • Removes complex error handling and API calls for token counting
  • Implements uniform 3:1 character-to-token ratio for all non-Claude models
  • Simplifies the token counting logic significantly

github-actions bot commented Jul 31, 2025

Coverage Report

Extension Coverage

Base branch: 47%

PR branch: 48%

✅ Coverage increased or remained the same

Webview Coverage

Base branch: 17%

PR branch: 17%

✅ Coverage increased or remained the same

Overall Assessment

Test coverage has been maintained or improved

Last updated: 2025-08-04T06:40:36.112715

@abeatrix (Contributor) left a comment

Left some comments inline. My main concern is that we should use tiktoken-lite for our use case, and that we are not freeing memory after reusing the same encoder, which could lead to memory issues.
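For context, a hypothetical sketch of the encoder-lifecycle pattern this concern points at, assuming the WASM-backed tiktoken package (the class and method names here are illustrative, not the code under review):

```typescript
// Illustrative only: with the WASM-backed "tiktoken" package, each Tiktoken
// instance owns native memory that is released only by an explicit free().
import { get_encoding, type Tiktoken } from "tiktoken"

export class ReusableTokenCounter {
	private encoder: Tiktoken | undefined

	count(text: string): number {
		// Reuse one encoder across calls instead of creating a new one each time.
		this.encoder ??= get_encoding("cl100k_base")
		return this.encoder.encode(text).length
	}

	// Without this step, the encoder's native memory stays allocated for the
	// lifetime of the session, which is the memory issue raised above.
	dispose(): void {
		this.encoder?.free()
		this.encoder = undefined
	}
}
```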

@arafatkatze force-pushed the arafatkatze/fix-vscode-lm-tokens branch 3 times, most recently from 4fe98b1 to 82743d5, on August 4, 2025 06:30
@arafatkatze force-pushed the arafatkatze/fix-vscode-lm-tokens branch from 82743d5 to 28e98b5 on August 4, 2025 06:33
@arafatkatze (Contributor, Author) commented

@abeatrix

  • We’re switching to a zero-dependency chars/4 heuristic for VSCode LM token estimation.
  • We are intentionally NOT bundling tokenizer rank files or pulling in js-tiktoken (or tiktoken) anymore.
  • This is a pragmatic trade-off: we prioritize a small install size and no memory-lifecycle issues over accuracy. The VSCode LM provider has very low usage, so precision here does not justify the cost.

Why not js-tiktoken + ranks

  • Unpacked install size: ~22.4 MB (npm dist.unpackedSize).
  • Rank file we’d actually need (o200k_base): ~2.2 MB in dist (bundlers include this when imported).
  • Package folder on disk: ~21 MB (node_modules footprint).
  • Compressed download is smaller (~2–7 MB), but every user would still download and store ~22 MB locally for a niche provider.
  • Memory: encoder lifecycle can be tricky; we’d have to manage creation/free correctly to avoid growth during long sessions.

Why not “tiktoken-lite” (3rd-party fork)

  • Unmaintained/old, limited model coverage (targeted at older OpenAI models).
  • Does not match modern o200k-base style tokenizers used by current providers.
  • Risk of incorrect counts and regressions.

What we shipped instead

  • countTokens(text): Math.ceil((text || "").length / 4) (sketched below with worked examples)
  • No tokenizer packages, no rank files, no WASM, no encoder lifecycle, no webview impact.
  • We explicitly ignore images and tool/function payloads here; they’re not supported for VSCode LM usage in our flow and we’re fine with an underestimate for this provider.
  • Clear code comments note this is an intentional heuristic and where we’d reintroduce a “real” tokenizer behind a feature flag if requirements change.
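
A minimal sketch of the shipped heuristic with a couple of worked estimates (a sketch only, not the exact merged code):

```typescript
// Sketch of the shipped heuristic: roughly 4 characters per token, rounded up.
// Images and tool/function payloads are intentionally ignored, per the notes above.
export function countTokens(text: string): number {
	return Math.ceil((text || "").length / 4)
}

// Worked examples (estimates only, not exact tokenizer counts):
console.log(countTokens(""))              // 0
console.log(countTokens("hello world"))   // ceil(11 / 4) = 3
console.log(countTokens("a".repeat(100))) // ceil(100 / 4) = 25
```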

Accuracy vs. cost

  • The previous approach attempted to be accurate but added complexity, memory risks, and megabytes to every install.
  • This provider is rarely used. The ROI of shipping megabytes and maintaining tokenizer logic is negative.
  • If we ever need precision (e.g., customer reports frequent truncation issues specifically with VSCode LM), we’ll add a gated, backend-only tokenizer path or CDN-fetched ranks behind a user setting.

Bottom line

  • Keep extension lean; avoid shipping multi-MB tokenizers for a low-usage path.
  • Use a documented, simple heuristic with known limitations.
  • Revisit only if we see real demand for precise counting on this provider.

@arafatkatze requested a review from abeatrix on August 4, 2025 06:35
@arafatkatze merged commit 2cfce57 into main on Aug 4, 2025
12 of 13 checks passed
dtrugman pushed a commit to dtrugman/cline that referenced this pull request Aug 24, 2025
Co-authored-by: Daniel Riccio <ricciodaniel98@gmail.com>