Labels: CAMEL 2.0, P0 (task with high level priority), call for contribution, enhancement (new feature or request)
Description
Required prerequisites
- I have searched the Issue Tracker and Discussions to confirm this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
Motivation
The current token counting implementation using BaseTokenCounter and its subclasses (OpenAITokenCounter, AnthropicTokenCounter, LiteLLMTokenCounter, MistralTokenCounter) presents several significant challenges:
- Accuracy Issues: Manual token counting via tiktoken and other tokenizers is prone to inaccuracies, especially with:
  - Different model-specific tokenization rules (GPT-3.5, GPT-4, and O1 models each have different tokens_per_message and tokens_per_name values)
  - Image token calculations for vision models requiring complex logic
  - Model-specific edge cases and special tokens
- Streaming Mode Limitations: Token counting in streaming mode is particularly problematic because:
  - The full response isn't available until streaming completes
  - Manual accumulation of streamed chunks is error-prone
  - OpenAI now supports stream_options: {"include_usage": true} to get accurate usage in the final chunk (see the sketch after this list)
- Maintenance Burden: Supporting all models requires:
  - Model-specific token counter implementations for each provider
  - Keeping up with changes in tokenization rules
  - Complex logic for different content types (text, images, structured outputs)
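
To illustrate the streaming point, here is a minimal sketch using the OpenAI Python SDK (model name and prompt are placeholders): with stream_options={"include_usage": True}, the final chunk arrives with an empty choices list and a populated usage object, so no manual counting or chunk accumulation is needed.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    if chunk.choices:
        # Regular content chunks: print the streamed delta as it arrives.
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage is not None:
        # The final chunk has an empty `choices` list and carries `usage`.
        usage = chunk.usage

print()
if usage is not None:
    print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```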
Proposed Solution
Deprecate BaseTokenCounter and its implementations in favor of the native usage data from LLM responses:
- OpenAI/Compatible APIs: Use response.usage, which provides accurate prompt_tokens, completion_tokens, and total_tokens (see the sketch after this list)
- Streaming: Leverage stream_options: {"include_usage": true} to get usage data in the final streamed chunk
- Other providers: Each provider's SDK returns usage information in its response objects
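
A minimal sketch of what this could look like for OpenAI-compatible APIs; usage_from_response is a hypothetical helper (not an existing CAMEL API) and the model name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()


def usage_from_response(response) -> dict:
    """Map the provider-reported usage onto a plain dict (no local counting)."""
    usage = response.usage
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(usage_from_response(response))
```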
Benefits
- Accuracy: Usage data comes directly from the model provider, so reported counts are exact rather than locally estimated
- Simplicity: Eliminates more than 500 lines of complex token counting code
- Maintainability: No need to update tokenization logic when providers change their models
- Streaming support: Native support for token usage in streaming responses
- Universal compatibility: All major LLM providers include usage data in their responses
Migration Path
- Update model implementations to extract and return usage data from native responses
- Provide a deprecation warning for BaseTokenCounter usage (see the sketch after this list)
- Update documentation and examples to use the new approach
- Remove BaseTokenCounter and related code in a future major version
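
A minimal sketch of the deprecation step; the warning text and its placement in BaseTokenCounter.__init__ are illustrative, not the actual CAMEL code:

```python
import warnings


class BaseTokenCounter:
    def __init__(self) -> None:
        # Emit a deprecation warning pointing users to provider-reported usage.
        warnings.warn(
            "BaseTokenCounter is deprecated; read token usage from the "
            "provider's response (e.g. `response.usage`) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
```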
Code References
- Token counting implementation: camel/utils/token_counting.py:77-544
- Usage data already captured in some models: camel/models/litellm_model.py:217
- Streaming with usage example: examples/agents/chatagent_stream.py:44
Solution
No response
Alternatives
No response
Additional context
No response