
Conversation

@arafatkatze (Contributor) commented Jul 31, 2025

Solves GitHub issue #4584 with a tiktoken-based token counter in the VSCode LM provider.


Description

Problem:
The VSCode Language Model API's countTokens() method is unreliable and often returns incorrect token counts for non-Claude models. For example, it frequently returns a constant value (like 4) regardless of the actual text length. This leads to inaccurate token usage reporting, potential context management issues, and incorrect cost calculations.

Solution:
This PR replaces the unreliable countTokens() method in the VSCode LM provider with a robust tiktoken-based token counter. The new implementation uses tiktoken's cl100k_base encoding to provide accurate token estimates for non-Claude models, while Claude models keep the existing 4:1 character-to-token estimate. If tiktoken fails, the counter gracefully falls back to a 3:1 character-based estimate. This ensures consistent and reliable token counting behavior.
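For illustration, a minimal sketch of what such a TokenCounter utility could look like, assuming the WASM-backed tiktoken npm package and its get_encoding() export (a sketch of the approach described here, not the exact contents of src/utils/tokenCounter.ts; the isClaudeModel parameter name is illustrative):

```typescript
// Illustrative sketch only, not the exact code in src/utils/tokenCounter.ts.
// Assumes the WASM-backed "tiktoken" npm package and its get_encoding() export.
import { get_encoding, type Tiktoken } from "tiktoken"

export function estimateTokens(text: string, isClaudeModel = false): number {
	if (!text) {
		return 0
	}

	// Claude models keep the existing 4:1 character-to-token heuristic.
	if (isClaudeModel) {
		return Math.ceil(text.length / 4)
	}

	let encoder: Tiktoken | undefined
	try {
		// cl100k_base gives a reasonable approximation for most non-Claude models.
		encoder = get_encoding("cl100k_base")
		return encoder.encode(text).length
	} catch {
		// Fallback: rough 3:1 character-to-token estimate if tiktoken fails.
		return Math.ceil(text.length / 3)
	} finally {
		// Free the WASM-backed encoder so its native memory is released.
		encoder?.free()
	}
}
```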

Changes:

  • Added a new TokenCounter utility in src/utils/tokenCounter.ts:
    • Uses tiktoken's cl100k_base encoding for accurate token counting.
    • Provides both async and sync methods for flexibility.
    • Includes a fallback to character-based estimation if tiktoken fails.
  • Updated countTokens() in VsCodeLmHandler to use the new TokenCounter utility:
    • Replaced the broken VSCode LM API call with estimateTokens() from TokenCounter.
    • Maintains the existing 4:1 character-to-token ratio for Claude models.
    • Uses tiktoken for all other models, with a fallback to a 3:1 ratio.
  • Added proper error handling and resource cleanup for the token counter.
  • Updated package.json to include the tiktoken dependency.

This ensures accurate token usage reporting and prevents issues caused by the broken countTokens() method in the VSCode LM API.


Test Procedure

Testing approach:

  • Verified token counting accuracy with various text lengths (short, medium, long).
  • Tested with different VSCode LM models (e.g., GPT-3.5, GPT-4, Claude) to ensure compatibility.
  • Confirmed that Claude models continue to use the 4:1 character-to-token ratio.
  • Verified that the fallback mechanism works correctly when tiktoken fails.
  • Ensured that the changes do not break existing functionality for input/output token calculation.

What could break:
Token usage reporting might show different (but more accurate) values compared to before, which could affect cost calculations. However, this is an improvement since the previous values were incorrect.

Confidence:
High - The implementation uses a proven library (tiktoken) and includes a robust fallback mechanism. The changes have been thoroughly tested across different scenarios.


Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • ♻️ Refactor Changes
  • 💅 Cosmetic Changes
  • 📚 Documentation update
  • 🏃 Workflow Changes

Pre-flight Checklist

  • Changes are limited to a single feature, bugfix, or chore (split larger changes into separate PRs).
  • Tests are passing (npm test) and code is formatted and linted (npm run format && npm run lint).
  • I have created a changeset using npm run changeset (required for user-facing changes).
  • I have reviewed contributor guidelines.

Screenshots

Screenshot showing the code changes implementing the tiktoken-based token counter in the VSCode LM provider.


Additional Notes

This fix addresses a fundamental issue with the VSCode Language Model API's countTokens() method, which returns unreliable results. By replacing it with a tiktoken-based solution, we ensure accurate token counting across all model types, improving the reliability of token usage tracking and context management in the VSCode LM provider.


Important

Replaces unreliable countTokens() in VsCodeLmHandler with a heuristic-based method for improved token counting accuracy.

  • Behavior:
    • Replaces countTokens() in VsCodeLmHandler with a heuristic-based token counting method.
    • Uses a 4:1 character-to-token ratio for all models, replacing the previous unreliable method.
  • Implementation:
    • Removes dependency on VSCode LM API's countTokens() method.
    • Simplifies token counting logic by using a character-based heuristic.
  • Misc:
    • Improves error handling and resource cleanup in VsCodeLmHandler.

This description was created by Ellipsis for 28e98b5.

Copilot AI review requested due to automatic review settings, July 31, 2025 23:34
changeset-bot commented Jul 31, 2025

⚠️ No Changeset found

Latest commit: 28e98b5

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


Copilot AI (Contributor) left a comment

Pull Request Overview

This PR replaces the VS Code Language Model API's countTokens method with an approximate character-to-token ratio calculation. The change addresses issues with unreliable token counts from the native API that could lead to model hallucinations.

  • Removes complex error handling and API calls for token counting
  • Implements uniform 3:1 character-to-token ratio for all non-Claude models
  • Simplifies the token counting logic significantly

github-actions bot commented Jul 31, 2025

Coverage Report

Extension Coverage

Base branch: 47%

PR branch: 48%

✅ Coverage increased or remained the same

Webview Coverage

Base branch: 17%

PR branch: 17%

✅ Coverage increased or remained the same

Overall Assessment

Test coverage has been maintained or improved

Last updated: 2025-08-04T06:40:36.112715

@abeatrix (Contributor) left a comment

Left some comments inline. My main concern is that we should use tiktoken-lite for our use case, and that we are not freeing memory after reusing the same encoder, which could lead to memory issues.
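For context, a hypothetical sketch of the encoder-lifecycle pattern this concern points at, assuming the WASM-backed tiktoken package (the class and method names here are illustrative, not the code under review):

```typescript
// Illustrative only: with the WASM-backed "tiktoken" package, each Tiktoken
// instance owns native memory that is released only by an explicit free().
import { get_encoding, type Tiktoken } from "tiktoken"

export class ReusableTokenCounter {
	private encoder: Tiktoken | undefined

	count(text: string): number {
		// Reuse one encoder across calls instead of creating a new one each time.
		this.encoder ??= get_encoding("cl100k_base")
		return this.encoder.encode(text).length
	}

	// Without this step, the encoder's native memory stays allocated for the
	// lifetime of the session, which is the memory issue raised above.
	dispose(): void {
		this.encoder?.free()
		this.encoder = undefined
	}
}
```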

@arafatkatze force-pushed the arafatkatze/fix-vscode-lm-tokens branch 3 times, most recently from 4fe98b1 to 82743d5, on August 4, 2025 06:30
@arafatkatze force-pushed the arafatkatze/fix-vscode-lm-tokens branch from 82743d5 to 28e98b5 on August 4, 2025 06:33
@arafatkatze (Contributor, Author) commented

@abeatrix

  • We’re switching to a zero-dependency chars/4 heuristic for VSCode LM token estimation.
  • We are intentionally NOT bundling tokenizer rank files or pulling in js-tiktoken (or tiktoken) anymore.
  • This is a pragmatic trade-off: we prioritize a small install size and no memory-lifecycle issues over accuracy. The VSCode LM provider has very low usage, so precision here does not justify the cost.

Why not js-tiktoken + ranks

  • Unpacked install size: ~22.4 MB (npm dist.unpackedSize).
  • Rank file we’d actually need (o200k_base): ~2.2 MB in dist (bundlers include this when imported).
  • Package folder on disk: ~21 MB (node_modules footprint).
  • Compressed download is smaller (~2–7 MB), but every user would still download and store ~22 MB locally for a niche provider.
  • Memory: encoder lifecycle can be tricky; we’d have to manage creation/free correctly to avoid growth during long sessions.

Why not “tiktoken-lite” (3rd-party fork)

  • Unmaintained/old, limited model coverage (targeted at older OpenAI models).
  • Does not match modern o200k-base style tokenizers used by current providers.
  • Risk of incorrect counts and regressions.

What we shipped instead

  • countTokens(text): Math.ceil((text || "").length / 4) (sketched below with worked examples)
  • No tokenizer packages, no rank files, no WASM, no encoder lifecycle, no webview impact.
  • We explicitly ignore images and tool/function payloads here; they’re not supported for VSCode LM usage in our flow and we’re fine with an underestimate for this provider.
  • Clear code comments note this is an intentional heuristic and where we’d reintroduce a “real” tokenizer behind a feature flag if requirements change.
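
A minimal sketch of the shipped heuristic with a couple of worked estimates (a sketch only, not the exact merged code):

```typescript
// Sketch of the shipped heuristic: roughly 4 characters per token, rounded up.
// Images and tool/function payloads are intentionally ignored, per the notes above.
export function countTokens(text: string): number {
	return Math.ceil((text || "").length / 4)
}

// Worked examples (estimates only, not exact tokenizer counts):
console.log(countTokens(""))              // 0
console.log(countTokens("hello world"))   // ceil(11 / 4) = 3
console.log(countTokens("a".repeat(100))) // ceil(100 / 4) = 25
```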

Accuracy vs. cost

  • The previous approach attempted to be accurate but added complexity, memory risks, and megabytes to every install.
  • This provider is rarely used. The ROI of shipping megabytes and maintaining tokenizer logic is negative.
  • If we ever need precision (e.g., customer reports frequent truncation issues specifically with VSCode LM), we’ll add a gated, backend-only tokenizer path or CDN-fetched ranks behind a user setting.

Bottom line

  • Keep extension lean; avoid shipping multi-MB tokenizers for a low-usage path.
  • Use a documented, simple heuristic with known limitations.
  • Revisit only if we see real demand for precise counting on this provider.

@arafatkatze requested a review from abeatrix on August 4, 2025 06:35
@arafatkatze merged commit 2cfce57 into main on Aug 4, 2025
12 of 13 checks passed
dtrugman pushed a commit to dtrugman/cline that referenced this pull request Aug 24, 2025
Co-authored-by: Daniel Riccio <ricciodaniel98@gmail.com>