
[Feature Request] Deprecate current token usage calculation #3026

@Wendong-Fan

Description

Required prerequisites

Motivation

The current token counting implementation using BaseTokenCounter and its subclasses (OpenAITokenCounter,
AnthropicTokenCounter, LiteLLMTokenCounter, MistralTokenCounter) presents several significant challenges:

  1. Accuracy Issues: Manual token counting via tiktoken and other tokenizers is prone to inaccuracies, especially with:
    - Different model-specific tokenization rules (GPT-3.5, GPT-4, and O1 models each use different tokens_per_message and tokens_per_name values; see the sketch after this list)
    - Image token calculations for vision models, which require complex logic
    - Model-specific edge cases and special tokens
  2. Streaming Mode Limitations: Token counting in streaming mode is particularly problematic because:
    - The full response isn't available until streaming completes
    - Manual accumulation of streamed chunks is error-prone
    - OpenAI now supports stream_options: {"include_usage": true} to return accurate usage in the final chunk
  3. Maintenance Burden: Supporting all models requires:
    - Model-specific token counter implementations for each provider
    - Keeping up with changes in tokenization rules
    - Complex logic for different content types (text, images, structured outputs)
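
To illustrate the maintenance burden, here is a minimal sketch of the kind of manual counting the current counters have to implement. The per-message and per-name constants follow the OpenAI cookbook convention for GPT-4-family chat models and are assumptions here; they are exactly the values that drift when providers change their models:

```python
# Sketch of manual chat-message token counting (OpenAI cookbook style).
# Handles plain-text content only; vision/image inputs need extra logic.
import tiktoken


def count_message_tokens(messages, model="gpt-4o-mini"):
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a recent encoding.
        encoding = tiktoken.get_encoding("o200k_base")

    tokens_per_message = 3  # assumed per-message overhead for GPT-4-family models
    tokens_per_name = 1     # assumed overhead when a "name" field is present

    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # assumed priming tokens for the assistant's reply
    return num_tokens
```

Every constant above is model-specific, which is why counters of this kind break whenever a provider changes its tokenizer or message framing.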

Proposed Solution

Deprecate BaseTokenCounter and its implementations in favor of using the native usage data from LLM responses:

  • OpenAI/Compatible APIs: Use response.usage, which provides accurate prompt_tokens, completion_tokens, and total_tokens
  • Streaming: Pass stream_options: {"include_usage": true} to receive usage data in the final streamed chunk (see the sketch below)
  • Other providers: Each provider's SDK returns usage information in its response objects
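
A minimal sketch of the proposed approach using the official openai Python SDK (model name and prompt are placeholders), covering both the non-streaming and streaming cases:

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Hello"}]

# Non-streaming: usage is attached directly to the response object.
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(resp.usage.prompt_tokens, resp.usage.completion_tokens, resp.usage.total_tokens)

# Streaming: request usage in the final chunk via stream_options.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    stream=True,
    stream_options={"include_usage": True},
)
usage = None
for chunk in stream:
    if chunk.usage is not None:
        # The last chunk carries the usage totals and has an empty choices list.
        usage = chunk.usage
    for choice in chunk.choices:
        if choice.delta.content:
            print(choice.delta.content, end="")
print(usage)
```

With this pattern the agent layer only needs to read the usage object off the response (or final chunk) and record it, instead of re-tokenizing the conversation itself.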

Benefits

  1. Accuracy: Usage data comes directly from the model provider, so counts are exact rather than client-side approximations
  2. Simplicity: Eliminates roughly 500 lines of complex token counting code
  3. Maintainability: No need to update tokenization logic when providers change their models
  4. Streaming support: Native support for token usage in streaming responses
  5. Universal compatibility: All major LLM providers include usage data in their responses

Migration Path

  1. Update model implementations to extract and return usage data from native responses
  2. Provide a deprecation warning for BaseTokenCounter usage (see the sketch after this list)
  3. Update documentation and examples to use the new approach
  4. Remove BaseTokenCounter and related code in a future major version
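
One possible shape for step 2, assuming the warning is raised when a counter is instantiated (placement and wording are illustrative, not a final API):

```python
import warnings


class BaseTokenCounter:
    """Deprecated: prefer the usage data returned in the provider's response."""

    def __init__(self) -> None:
        warnings.warn(
            "BaseTokenCounter and its subclasses are deprecated; token usage "
            "is now read from the provider's native response and this class "
            "will be removed in a future major version.",
            DeprecationWarning,
            stacklevel=2,
        )
```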

Code References

  • Token counting implementation: camel/utils/token_counting.py:77-544
  • Usage data already captured in some models: camel/models/litellm_model.py:217
  • Streaming with usage example: examples/agents/chatagent_stream.py:44

Solution

No response

Alternatives

No response

Additional context

No response
