VSCode Language Model API provider dramatically under-reports token usage #4584

@johnib

Description

What happened?

The VSCode Language Model API provider in Cline reports token counts that are dramatically lower than actual usage. For example, when actual usage is ~70K tokens (verified via network monitoring), Cline's context window shows only ~17K tokens. This causes several critical issues:

  1. Context window management fails because reported usage is too low
  2. Conversation truncation doesn't trigger when it should

Expected behavior: Token counts should accurately reflect actual API usage to enable proper context management and cost tracking.

Steps to reproduce

  1. Use any VSCode Language Model API provider in Cline (e.g., GitHub Copilot)
  2. Have a conversation that generates substantial token usage
  3. Compare the token count shown in Cline's context window with actual network usage (token usage is returned in the GitHub Copilot API response)
  4. Observe that Cline consistently under-reports token usage by approximately 4x

This issue occurs consistently with all VSCode LM API usage.

Relevant API REQUEST output

Debug logs reveal the core issue in the `calculateTotalInputTokens()` method:

- `countTokens(systemPrompt)` as string: 10,682 tokens  
- `countTokens(vsCodeLmMessages[0])` as LanguageModelChatMessage: 4 tokens

The same content yields dramatically different token counts depending on whether it is passed as a raw string or as a `LanguageModelChatMessage` object.
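
For illustration, a minimal sketch of the two calls (assuming an async context and a `model: vscode.LanguageModelChat` handle obtained via `vscode.lm.selectChatModels()`; the exact counts depend on the model's tokenizer):

```typescript
import * as vscode from "vscode"

// Counting the raw string tokenizes the full prompt text.
const asString = await model.countTokens(systemPrompt)
// observed above: 10,682 tokens

// Counting the same content wrapped in a LanguageModelChatMessage
// appears to tokenize only the message wrapper, not its content.
const asMessage = await model.countTokens(vscode.LanguageModelChatMessage.User(systemPrompt))
// observed above: 4 tokens
```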

Provider/Model

vscode-lm / claude-sonnet-4

Operating System

darwin 24.5.0

System Info

VSCode: 1.101.2, Node.js: v22.15.1, Architecture: arm64

Cline Version

3.18.0

Additional context

Root Causes Identified:

  1. Double-counting bug: The system prompt is counted twice, once as a string and once as the first message in the `vsCodeLmMessages` array

  2. API behavior discrepancy: VSCode's `countTokens()` method behaves completely differently when given:

    • A raw string (counts full content correctly)
    • A LanguageModelChatMessage object (counts minimal metadata only, not actual content)

Technical Location:
File: `src/api/providers/vscode-lm.ts`
Method: `calculateTotalInputTokens()`

The method calls `countTokens()` on `LanguageModelChatMessage` objects instead of extracting their text content, causing massive under-reporting: VSCode counts only the message structure, not the actual text inside it.
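
A minimal sketch of a possible fix, flattening each message's text parts and counting the resulting strings (the `messageText` helper and the decision to skip a system prompt duplicated at index 0 are assumptions for illustration, not Cline's actual code):

```typescript
import * as vscode from "vscode"

// Hypothetical helper: pull the plain text out of a message's content
// parts so countTokens() sees the real payload, not the wrapper object.
function messageText(message: vscode.LanguageModelChatMessage): string {
	return message.content
		.map((part) => (part instanceof vscode.LanguageModelTextPart ? part.value : ""))
		.join("")
}

async function calculateTotalInputTokens(
	model: vscode.LanguageModelChat,
	systemPrompt: string,
	vsCodeLmMessages: vscode.LanguageModelChatMessage[],
): Promise<number> {
	// Count the system prompt exactly once, as a string.
	let total = await model.countTokens(systemPrompt)
	// Skip index 0 if it duplicates the system prompt (the
	// double-counting bug described above), then count each
	// message's extracted text rather than the message object.
	for (const message of vsCodeLmMessages.slice(1)) {
		total += await model.countTokens(messageText(message))
	}
	return total
}
```

With this approach the per-message counts track the actual text length, so the ~70K conversation above should no longer be reported as ~17K.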

Affected Components:

  • Context window management
  • Conversation truncation logic
  • Token usage reporting
  • Cost calculation
