Transfer aggregation of streaming events off the Model class #1449


Merged: 37 commits, Jun 20, 2025

Conversation

@aymeric-roucher (Collaborator) commented Jun 17, 2025

This PR changes the logic of streaming messages:

  • Previously, the logic was to accumulate streaming deltas in the Model classes, and yield objects that "contain all the generated text since start of streaming until now"
  • This PR makes the Model class directly return atomic streaming deltas, so that aggregation is handled only within the agent.
    The reason for this is that front-ends like copilotkit generally expect individual streaming deltas.

Additionally, it removes the HfApiModel class, which was deprecated and due for deletion in 1.17, and fuses Message into ChatMessage.
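Concretely, the before/after difference for streaming can be sketched as follows. All names here (StreamDelta, model_stream, agent_run) are illustrative stand-ins, not the actual smolagents API:

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class StreamDelta:
    # An atomic chunk of newly generated text (illustrative stand-in for
    # the delta events the Model now yields).
    content: str


def model_stream() -> Iterator[StreamDelta]:
    # After this PR, the Model yields each delta as-is, without accumulating
    # "all text generated since the start of streaming".
    for chunk in ["Hel", "lo ", "world"]:
        yield StreamDelta(content=chunk)


def agent_run() -> str:
    # Aggregation now happens only in the agent; the raw deltas can still be
    # forwarded untouched to a front-end that expects individual deltas.
    accumulated = ""
    for delta in model_stream():
        accumulated += delta.content  # agent-side aggregation
    return accumulated
```

With aggregation moved here, a UI layer subscribed to `model_stream()` sees exactly the per-chunk events it expects.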

"LiteLLMModel",
"LiteLLMRouterModel",
"OpenAIServerModel",
"OpenAIModel",
@aymeric-roucher (Collaborator, Author):

Adding a copy of the class with "Server" removed in the name for easier access


@@ -17,11 +17,6 @@
logger = getLogger(__name__)


class Message(TypedDict):
@aymeric-roucher (Collaborator, Author):

@albertvillanova since Message and ChatMessage were mostly interchangeable, I fused them.
One potential difficulty to consider is that the class is not a TypedDict anymore, so it can no longer be treated as a dict.
But this hasn't really created any implementation problem so far: we just handle ChatMessage objects internally, and can handle the dict conversion in Model subclasses just before sending messages to inference.
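A minimal sketch of that boundary, assuming a dataclass-based ChatMessage (the field set and the to_openai_dicts helper are hypothetical simplifications):

```python
from dataclasses import dataclass, asdict


@dataclass
class ChatMessage:
    # A plain class instead of a TypedDict: handled as an object internally,
    # so it is no longer itself a dict.
    role: str
    content: str


def to_openai_dicts(messages: list[ChatMessage]) -> list[dict]:
    # A Model subclass converts to plain dicts only at the last moment,
    # just before sending the messages to the inference backend.
    return [asdict(m) for m in messages]
```

The conversion cost is paid once per request at the inference boundary, while the rest of the codebase works with typed objects.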

@albertvillanova (Member):

OK... I thought the purpose of to_messages was to convert the steps into a format directly consumable by the model (so plain dicts instead of instance objects).

@aymeric-roucher force-pushed the richer-streaming-events branch from 2f33e4c to 8c8ebaa on June 19, 2025 16:16
@aymeric-roucher aymeric-roucher marked this pull request as ready for review June 19, 2025 16:30
@albertvillanova (Member) left a comment:

Thanks, good refactoring!

@@ -66,7 +66,7 @@ print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], st

### InferenceClientModel

- The `HfApiModel` wraps huggingface_hub's [InferenceClient](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) for the execution of the LLM. It supports all [Inference Providers](https://huggingface.co/docs/inference-providers/index) available on the Hub: Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.
+ The `InferenceClientModel` wraps huggingface_hub's [InferenceClient](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) for the execution of the LLM. It supports all [Inference Providers](https://huggingface.co/docs/inference-providers/index) available on the Hub: Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.

@albertvillanova (Member):

Good catch! I guess this was missed in:

Comment on lines -1422 to -1428
class HfApiModel(InferenceClientModel):
    def __new__(cls, *args, **kwargs):
        warnings.warn(
            "HfApiModel was renamed to InferenceClientModel in version 1.14.0 and will be removed in 1.17.0.",
            FutureWarning,
        )
        return super().__new__(cls)
@albertvillanova (Member):

Indeed, this was in my TODO list for the next release.

I kept it in the last release because, although the class was renamed in v1.14.0, I only added the deprecation warning later, in v1.16.0.

So I wanted to allow a bit more time for users to update their code before removing it entirely.

Comment on lines 1619 to 1621
class OpenAIModel(OpenAIServerModel):
    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)
@albertvillanova (Member):

Is this just an alias or are you planning to deprecate OpenAIServerModel?

If this is just an alias and both are identically valid, then I would suggest:

OpenAIModel = OpenAIServerModel

If you are planning to deprecate OpenAIServerModel, then the inheritance should be inverted:

class OpenAIServerModel(OpenAIModel):
    def __new__(cls, *args, **kwargs):
        warnings.warn(
            "OpenAIServerModel was renamed to OpenAIModel in version 1.19.0 and will be removed in 1.22.0. "
            "Please use OpenAIModel instead.",
            FutureWarning,
            stacklevel=2
        )
        return super().__new__(cls)

@aymeric-roucher (Collaborator, Author):

It's an alias, so I'll just copy it!

@aymeric-roucher (Collaborator, Author):
Thank you for your comments! Thinking again about the distinction between ChatMessage and Message: Message was just a less complete, dict-converted version of ChatMessage, hence the fusion of the two.
to_messages is a way to convert memory steps to chat messages; these messages are not particularly expected to already be dictionaries.
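A rough sketch of that reading of to_messages, where steps are converted into ChatMessage objects rather than plain dicts (ActionStep, its single field, and this standalone to_messages are hypothetical simplifications of the real memory classes):

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    role: str
    content: str


@dataclass
class ActionStep:
    # Illustrative memory step; real steps carry much more state
    # (tool calls, observations, token counts, ...).
    model_output: str


def to_messages(steps: list[ActionStep]) -> list[ChatMessage]:
    # Converts memory steps into chat messages. The result is made of
    # ChatMessage objects; any dict conversion happens later, inside the
    # Model subclass, right before inference.
    return [ChatMessage(role="assistant", content=s.model_output) for s in steps]
```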

Successfully merging this pull request may close these issues.

'NoneType' object has no attribute 'input_tokens'
3 participants