Xiuyu-Li commented Feb 20, 2025

Motivation

#2876 enforces a strict check that rejects any request where the number of input tokens plus max_new_tokens exceeds the context length. However, as @merrymercy pointed out and several reports ([1] [2]) indicate, this check can be too restrictive for users who want to set a very large fixed max_new_tokens value when the input length is hard to determine in advance or does not matter.

Modifications

This PR introduces an optional auto-truncation feature by integrating the existing allow_auto_truncate option from server_args into TokenizerManager.
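
To illustrate the intended behavior, here is a minimal standalone sketch, not the actual diff in this PR. The helper name `resolve_max_new_tokens` and its parameters are hypothetical, and it assumes that auto-truncation means clamping max_new_tokens to the remaining context budget rather than trimming the input.

```python
def resolve_max_new_tokens(
    input_len: int,
    max_new_tokens: int,
    context_len: int,
    allow_auto_truncate: bool = False,
) -> int:
    """Hypothetical helper: return the max_new_tokens value to actually use.

    Assumption: truncation clamps max_new_tokens to whatever room is left
    in the context window after the input tokens.
    """
    # Request fits in the context window: nothing to change.
    if input_len + max_new_tokens <= context_len:
        return max_new_tokens

    # Strict behavior (the check described in Motivation): reject the request.
    if not allow_auto_truncate:
        raise ValueError(
            f"input tokens ({input_len}) + max_new_tokens ({max_new_tokens}) "
            f"exceeds the context length ({context_len})"
        )

    # Lenient behavior: shrink max_new_tokens to the remaining budget,
    # never going below zero if the input alone already fills the window.
    return max(0, context_len - input_len)
```

With this sketch, a request with 100 input tokens and max_new_tokens=500 against a context length of 200 would be rejected by default, and clamped to 100 new tokens when the flag is enabled.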

Checklist

@Xiuyu-Li Xiuyu-Li requested a review from Ying1123 February 25, 2025 23:03
Xiuyu-Li commented May 1, 2025

@Ying1123 Could you take a look at this PR and let me know if there’s anything else I need to change to move it forward? It would be great to get it merged for use cases where the input lengths are unknown.
