Description
What happened?
The VSCode Language Model API provider in Cline reports token counts that are drastically lower than actual usage. For example, when actual usage is ~70K tokens (verified via network monitoring), Cline's context window shows only ~17K tokens. This causes several critical issues:
- Context window management fails because reported usage is too low
- Conversation truncation doesn't trigger when it should
Expected behavior: Token counts should accurately reflect actual API usage to enable proper context management and cost tracking.
Steps to reproduce
- Use any VSCode Language Model API provider in Cline (e.g., GitHub Copilot)
- Have a conversation that generates substantial token usage
- Compare the token count shown in Cline's context window with actual network usage (token usage is returned in the GitHub Copilot API response)
- Observe that Cline consistently under-reports token usage by approximately 4x
This issue occurs consistently with all VSCode LM API usage.
Relevant API REQUEST output
Debug logs reveal the core issue in the `calculateTotalInputTokens()` method:
- `countTokens(systemPrompt)` as string: 10,682 tokens
- `countTokens(vsCodeLmMessages[0])` as LanguageModelChatMessage: 4 tokens
The same content shows dramatically different token counts depending on whether it's passed as a string or LanguageModelChatMessage object.
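For illustration, here is a minimal sketch that reproduces the comparison above using the public `vscode.lm` API. The model selector and the surrounding function are assumptions for demonstration purposes, not Cline's actual code:

```typescript
import * as vscode from "vscode";

// Hypothetical repro: count the same content once as a raw string and once
// wrapped in a LanguageModelChatMessage, and compare the results.
async function demonstrateDiscrepancy(systemPrompt: string): Promise<void> {
	const [model] = await vscode.lm.selectChatModels({ vendor: "copilot" });
	if (!model) {
		return;
	}

	// Counting the raw string reflects the full content (~10,682 tokens above).
	const asString = await model.countTokens(systemPrompt);

	// Counting the wrapped message returns a tiny number (~4 tokens above),
	// suggesting only the message envelope is being measured.
	const asMessage = await model.countTokens(
		vscode.LanguageModelChatMessage.User(systemPrompt),
	);

	console.log(`as string: ${asString} tokens, as message: ${asMessage} tokens`);
}
```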
Provider/Model
vscode-lm / claude-sonnet-4
Operating System
darwin 24.5.0
System Info
VSCode: 1.101.2, Node.js: v22.15.1, Architecture: arm64
Cline Version
3.18.0
Additional context
Root Causes Identified:
- Double-counting bug: The system prompt is counted twice - once as a string and once as the first message in the `vsCodeLmMessages` array (see the sketch after this list)
- API behavior discrepancy: VSCode's `countTokens()` method behaves completely differently when given:
  - A raw string (counts full content correctly)
  - A `LanguageModelChatMessage` object (counts minimal metadata only, not actual content)
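A simplified sketch of how these two causes interact. The function name, signature, and loop are illustrative assumptions based on the debug output above, not the actual Cline source:

```typescript
import * as vscode from "vscode";

// Sketch of the suspected accounting: the system prompt is counted as a
// string, and then the message array - whose first entry already contains
// that same prompt - is counted again.
async function calculateTotalInputTokensSketch(
	model: vscode.LanguageModelChat,
	systemPrompt: string,
	vsCodeLmMessages: vscode.LanguageModelChatMessage[],
): Promise<number> {
	let total = await model.countTokens(systemPrompt); // first count

	for (const message of vsCodeLmMessages) {
		// vsCodeLmMessages[0] wraps the system prompt, so its content would be
		// counted a second time here - except that countTokens() on a message
		// object appears to measure only the envelope, so the double count is
		// masked by a much larger under-count.
		total += await model.countTokens(message);
	}

	return total;
}
```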
Technical Location:
File: `src/api/providers/vscode-lm.ts`
Method: `calculateTotalInputTokens()`
The method calls `countTokens()` on `LanguageModelChatMessage` objects instead of extracting their text content, causing massive under-reporting, since VSCode only counts message structure rather than the actual text content.
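One possible direction for a fix, sketched below: flatten each message to its plain text and count that string, which is the code path VSCode tokenizes correctly (see the string-vs-message comparison above). The helper names are hypothetical, and the string-vs-parts handling of `message.content` hedges across VSCode API versions:

```typescript
import * as vscode from "vscode";

// Hypothetical helper: flatten a LanguageModelChatMessage to its text so the
// tokenizer sees real content rather than just the message envelope.
// Depending on the VSCode API version, `content` is either a plain string or
// an array of parts; only text parts carry countable prose.
function extractMessageText(message: vscode.LanguageModelChatMessage): string {
	const content: unknown = message.content;
	if (typeof content === "string") {
		return content;
	}
	return (content as unknown[])
		.filter((part): part is vscode.LanguageModelTextPart =>
			part instanceof vscode.LanguageModelTextPart)
		.map((part) => part.value)
		.join("");
}

// Count a message by its extracted text instead of the message object itself.
async function countMessageTokens(
	model: vscode.LanguageModelChat,
	message: vscode.LanguageModelChatMessage,
): Promise<number> {
	return model.countTokens(extractMessageText(message));
}
```

Combined with counting the system prompt only once (either as a string or via `vsCodeLmMessages[0]`, not both), this would address both root causes listed above.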
Affected Components:
- Context window management
- Conversation truncation logic
- Token usage reporting
- Cost calculation