VSCode Language Model API provider dramatically under-reports token usage #4584

@johnib

Description

What happened?

The VSCode Language Model API provider in Cline reports token counts that are dramatically lower than actual usage. For example, when actual usage is ~70K tokens (verified via network monitoring), Cline's context window shows only ~17K tokens. This causes several critical issues:

  1. Context window management fails because reported usage is too low
  2. Conversation truncation doesn't trigger when it should

Expected behavior: Token counts should accurately reflect actual API usage to enable proper context management and cost tracking.

Steps to reproduce

  1. Use any VSCode Language Model API provider in Cline (e.g., GitHub Copilot)
  2. Have a conversation that generates substantial token usage
  3. Compare the token count shown in Cline's context window with actual network usage (token usage is returned in the GitHub Copilot API response)
  4. Observe that Cline consistently under-reports token usage by approximately 4x

This issue occurs consistently with all VSCode LM API usage.

Relevant API REQUEST output

Debug logs reveal the core issue in the `calculateTotalInputTokens()` method:

- `countTokens(systemPrompt)` as string: 10,682 tokens  
- `countTokens(vsCodeLmMessages[0])` as LanguageModelChatMessage: 4 tokens

The same content yields dramatically different token counts depending on whether it is passed as a raw string or as a `LanguageModelChatMessage` object.
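
For illustration, a minimal sketch of the two calls (assuming an async context and a `model: vscode.LanguageModelChat` handle obtained via `vscode.lm.selectChatModels()`; the exact counts depend on the model's tokenizer):

```typescript
import * as vscode from "vscode"

// Counting the raw string tokenizes the full prompt text.
const asString = await model.countTokens(systemPrompt)
// observed above: 10,682 tokens

// Counting the same content wrapped in a LanguageModelChatMessage
// appears to tokenize only the message wrapper, not its content.
const asMessage = await model.countTokens(vscode.LanguageModelChatMessage.User(systemPrompt))
// observed above: 4 tokens
```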

Provider/Model

vscode-lm / claude-sonnet-4

Operating System

darwin 24.5.0

System Info

VSCode: 1.101.2, Node.js: v22.15.1, Architecture: arm64

Cline Version

3.18.0

Additional context

Root Causes Identified:

  1. Double-counting bug: The system prompt is counted twice, once as a string and once as the first message in the `vsCodeLmMessages` array

  2. API behavior discrepancy: VSCode's `countTokens()` method behaves completely differently when given:

    • A raw string (counts full content correctly)
    • A LanguageModelChatMessage object (counts minimal metadata only, not actual content)

Technical Location:
File: `src/api/providers/vscode-lm.ts`
Method: `calculateTotalInputTokens()`

The method calls `countTokens()` on `LanguageModelChatMessage` objects instead of extracting their text content, causing massive under-reporting: VSCode counts only the message structure, not the actual text inside it.
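
A minimal sketch of a possible fix, flattening each message's text parts and counting the resulting strings (the `messageText` helper and the decision to skip a system prompt duplicated at index 0 are assumptions for illustration, not Cline's actual code):

```typescript
import * as vscode from "vscode"

// Hypothetical helper: pull the plain text out of a message's content
// parts so countTokens() sees the real payload, not the wrapper object.
function messageText(message: vscode.LanguageModelChatMessage): string {
	return message.content
		.map((part) => (part instanceof vscode.LanguageModelTextPart ? part.value : ""))
		.join("")
}

async function calculateTotalInputTokens(
	model: vscode.LanguageModelChat,
	systemPrompt: string,
	vsCodeLmMessages: vscode.LanguageModelChatMessage[],
): Promise<number> {
	// Count the system prompt exactly once, as a string.
	let total = await model.countTokens(systemPrompt)
	// Skip index 0 if it duplicates the system prompt (the
	// double-counting bug described above), then count each
	// message's extracted text rather than the message object.
	for (const message of vsCodeLmMessages.slice(1)) {
		total += await model.countTokens(messageText(message))
	}
	return total
}
```

With this approach the per-message counts track the actual text length, so the ~70K conversation above should no longer be reported as ~17K.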

Affected Components:

  • Context window management
  • Conversation truncation logic
  • Token usage reporting
  • Cost calculation
