feat: centralize LLM communication tracking in chat model #15146

planger · 2025-03-07T10:06:52Z

What it does

Currently, our application has fragmented tracking of the LLM communications:

Chat model doesn't track all LLM messages (e.g. system message) or multiple LLM requests per user response
Chat agents must manually record LLM communications via CommunicationRecordingService
Tool calls are tracked in the chat model in content parts, which are rather UI focused, but tool calls missing in the
CommunicationRecordingService

This fragmentation means we lack a single source of truth for chat session information.

This PR centralizes all LLM communication tracking in the chat model by:

Introducing a new ChatLanguageModelService that encapsulates LLM request tracking
Moving related code from the abstract chat agent to this new service
Feeding the CommunicationRecordingService via a simple listener on the chat model

Complete tracking of all LLM requests, responses, and tool calls in one place
Simplified interface for chat agents to interact with LLMs
Improved data structure to capture tool calls and LLM messages in the History View (with minor UI enhancements)

How to test

Test a few chat requests with tool calls and

Make sure they end up correctly in the chat model
Make sure they show up in the history view

Follow-ups

Currently requests to the Orchestrator are only tracked because the chat model (did before and still) specifies the Orchestrator as the agent id of the request. Thus, the requests also only show up in the history view for the orchestrator.

We need to find a better solution for the delegation as the one we have now, which was very implicit and quite a special case. I suggest to make the delegation a more explicit event but didn't want to put this into this PR for now.

Breaking changes

This PR introduces breaking changes and requires careful review. If yes, the breaking changes section in the changelog has been updated.

Attribution

Contributed on behalf of STMicroelectronics

Review checklist

As an author, I have thoroughly tested my changes and carefully followed the review guidelines

Reminder for reviewers

As a reviewer, I agree to behave in accordance with the review guidelines

eneufeld · 2025-03-07T10:43:24Z

packages/ai-chat/src/common/chat-language-model-service.ts

+
+                try {
+                    for await (const chunk of originalStream) {
+                        if (chunk.tool_calls?.length) {


isn't this duplicating what we have in our stream parser and with building up response messages?
wouldn't it make more sense to hook in the monitoring there?

Yeah, kind of. The existing code is just quite entangled with the response content object merge handling and rather UI oriented, while this here is simply capturing all data. It might still make sense to consolidate this though.

I looked through the code and your PR again.
IMHO with your change we now extend the request object to contain information that we get and parse anyway but we put it in the response. This PR now adds the same information differently formatted into the request, just so that we can add it in the chat-history-entry to the 'fromRequest' method and use it in the history widget.
IMHO we can just extend the fromResponse method so that the response is an array and pass the response contents formatted in what ever way down.
Especially with the changes I proposed here: https://github.com/eclipse-theia/theia/pull/15092/files#diff-ce6934330cf08ddabaac8d16f9d9751081668d862eecdd6c49e6dd9cfe3770e2
I would claim that it makes more sense to enhance the reponse part of the history entry to just accept an array of response contents and then in the history renderer handle them accordingly.

in the simplest case do something like this:

export function fromResponse( agentId: string, request: ChatRequestModel, args: Partial<CommunicationResponseEntryParam> = {} ): CommunicationResponseEntryParam { return { agentId: agentId, sessionId: request.session.id, requestId: request.id, response: request.response.response.content.map(c => `${c.kind}: ${c.asDisplayString?.()}`), ...args, }; }

and in the history card:

{entry.response.map((response, index) => ( <pre key={`response-${index}`}>{response}</pre> ))}

Thank you very much for looking into this PR!

I understand your point and this change indeed adds some duplication (in parsing and storing), which we may want to avoid.

However the main goal of this PR isn't specifically to improve the history recording (this is actually just something that we can now simplify by having a centralized place where we have access and capture the information).

The main goal is that we have one central place that contains all information about the entire chat request/response in one place. This allows e.g. to make more powerful listeners (like the history recorder) or derive tests from a chat model by replaying the chat session against another LLM or by mocking the LLM for testing the agent -- all based in the data of the chat model. For that we need the complete and plain information.

Currently we capture parts of this information but already parse it into a structure that is optimized for the UI and not for post processing, and in this process we loose information that isn't relevant to the UI (e.g. request settings, or multiple LLM requests that were combined into one user-targeted response).

I'm completely open to challenge and reconsider where and how we capture the information, but I'd like to have one central place to capture the entire information for all subsequent use cases.

It'd probably also be good to have @sdirix in the discussion as I talked about this approach with him recently. Thanks!

Don't get me wrong, I like the centralization. And I agree that the current approach is very UI centric.
That is why I pointed to the changes of the Model in the Thinking PR.
We should align the approach and classes. I focused on the history part just because it is the biggest code change line wise in you pr 😅

JonasHelming · 2025-03-13T21:00:55Z

@planger Could you add this to the epic please: #15068

Currently, our application has fragmented tracking of the LLM communications: - Chat model doesn't track all LLM messages (e.g. system message) or multiple LLM requests per user response - Chat agents must manually record LLM communications via `CommunicationRecordingService` - Tool calls are tracked in the chat model in content parts, which are rather UI focused, but tool calls missing in the `CommunicationRecordingService` This fragmentation means we lack a single source of truth for chat session information. This PR centralizes all LLM communication tracking in the chat model by: 1. Introducing a new `ChatLanguageModelService` that encapsulates LLM request tracking 2. Moving related code from the abstract chat agent to this new service 3. Feeding the `CommunicationRecordingService` via a simple listener on the chat model - Complete tracking of all LLM requests, responses, and tool calls in one place - Simplified interface for chat agents to interact with LLMs - Improved data structure to capture tool calls and LLM messages in the History View (with minor UI enhancements) Contributed on behalf of STMicroelectronics

* Move default request settings from individual models to frontend LanguageModelService * Enhance RequestSettingsSchema with model/provider/agent scoping * Add client-side settings for tool calls and thinking messages

planger

Thank you @eneufeld for merging our two approaches! I really like how the settings are handled centrally now. Also I like that there is now less redundancy.

Regarding capturing all data centrally related to LLM requests and its details (available tools, tool calls, settings):
Am I right that you are suggesting to not track this information in the chat model and have the recording service listening to the chat service, but rather make the history recording service the place that holds all data related to a chat request?

I guess I'd be fine with that, even though I kind of liked the simplicity of having the recording service to be just something like a logger, while the chat model is the central place that captures all information, and there is a listener that just transfers updates to the logger (history service).

Currently though, I think the recording calls and the history service would need to do a bit more, like being able to capture multiple LLM request for one user request, tracking the settings, the available tools. I think we have all this information in the chat language model service, but would need to report and capture it accordingly.

Also, I'd love to be able to avoid having the recording spread across the chat agent and the chat language model service.

One approach that could work is to move the parsing to the chat language model service (as it was in my original PR) and have it emit events to which the chat agent would listen in order to translate that into chat response parts. But this would change quite a bit the current response part merging approach.

So all in all, I'm fine with continuing to go into this direction. It is certainly less data duplication and avoids duplicating the response parsing as I had in my original proposal. This is definitely an improvement with the latest commits on this PR.

But the downside is spreading the recording, losing the chat model as the complete, single source of truth containing all possible information surrounding a user chat request, and instead putting the recording service in this role of tracking all information.

@sdirix What do you think?

planger · 2025-03-17T20:33:18Z

packages/ai-chat/src/common/chat-model.ts

@@ -1097,7 +1098,7 @@ class ChatResponseImpl implements ChatResponse {
    }
 }

-class MutableChatResponseModel implements ChatResponseModel {
+export class MutableChatResponseModel implements ChatResponseModel {


Is this still needed to be exposed?

planger · 2025-03-17T20:34:43Z

packages/ai-chat/src/common/chat-service.ts

@@ -90,7 +116,7 @@ export type PinChatAgent = boolean;

 export const ChatService = Symbol('ChatService');
 export interface ChatService {
-    onActiveSessionChanged: Event<ActiveSessionChangedEvent>
+    onSessionEvent: Event<ActiveSessionChangedEvent | SessionCreatedEvent | SessionDeletedEvent>


It probably doesn't hurt, or might be even useful to future listeners, but technically we don't use this anymore in this PR as far as I can see.

planger · 2025-03-17T20:37:25Z

packages/ai-chat/src/common/chat-agents.ts

@@ -317,6 +310,15 @@ export abstract class AbstractChatAgent implements ChatAgent {
     * Subclasses may override this method to perform additional actions or keep the response open for processing further requests.
     */
    protected async onResponseComplete(request: MutableChatRequestModel): Promise<void> {
+        this.recordingService.recordResponse(


This isn't a deal breaker, but I would really have liked to avoid having the chat agent to report anything to the recording service, especially as it now only reports tool calls, but not the requests. This distributes this single concern across multiple places.

planger · 2025-03-17T20:38:33Z

packages/ai-core/src/common/language-model-service.ts

+            agentId: languageModelRequest.agentId,
+            sessionId: languageModelRequest.sessionId,
+            requestId: languageModelRequest.requestId,
+            request: languageModelRequest.messages


I think here we now miss available tools and settings to be complete.

planger · 2025-03-17T20:40:20Z

packages/ai-core/src/common/language-model-service.ts

+        languageModel: LanguageModel,
+        languageModelRequest: UserRequest
+    ): Promise<LanguageModelResponse> {
+        this.recordingService.recordRequest({


How would an agent report multiple LLM requests for a single user request? Would it be expected that this method is called for each LLM request and the recording service would merge them?

eneufeld · 2025-03-17T23:30:15Z

@planger so my main issue with the recording service 'attaching' on the chatmodel is that the recoding service should not be aware of the chatmodel in the first place. All non-chat agents must be able to also be recorded. So if they have to manually report then I don't see why chat-agents are listened to by the record service and why it should have a dependency on the chatmodel. As you can see I also let the two non chat agents (terminal and code completion) also use the language model service so that they only have to log the result.

To better get the picture of where I go with this, look at #15092, I extend the recording service to log messages which are based on the different types of messages there are (text, tool_call, tool_response, thinking) .

The alternative that I see to using the record service is to create a new base model which contains these messages and move the logic to create these messages from the llm response to the language model service. We then need to check whether we keep the current ToolResponseParts at all as they are used by the ChatUI only.

planger · 2025-03-18T07:54:17Z

@planger so my main issue with the recording service 'attaching' on the chatmodel is that the recoding service should not be aware of the chatmodel in the first place. All non-chat agents must be able to also be recorded. So if they have to manually report then I don't see why chat-agents are listened to by the record service and why it should have a dependency on the chatmodel. As you can see I also let the two non chat agents (terminal and code completion) also use the language model service so that they only have to log the result.

Well the recording service didn't have a dependency. Just the listener for the chat model that transferred the changes to the recording service had a dependency.

I really like the language service, as it is in this PR, that it can handle also non-chat agents gracefully. That's great! It might be even more convenient if we'd let it fill out defaults for this use case, like assigning new IDs if there is no session or request id specified.

The only thing I find problematic is that all clients of the language model service have to worry themselves about recording responses and tool function calls manually, while not needing to worry about recording the request. I feel, this is a bit asymmetric and unexpected, imho.

What speaks against re-introducing the monitoring stream (or just attaching another listener if it is non-streaming) into the language model service to let it track the response and tool calls? Is the duplication of the parsing such a big deal in your view?

sdirix · 2025-03-20T09:21:01Z

But the downside is spreading the recording, losing the chat model as the complete, single source of truth containing all possible information surrounding a user chat request, and instead putting the recording service in this role of tracking all information.

@sdirix What do you think?

I agree. I think we came up with a good concept for the eventual goal state of Theia AI, by defining a more generalized CommunicationModel (or some other naming) which will hold all the low level data. The ChatModel is then only a specialization, adding all kinds of chat related metadata and functionalities.

Leveraging this, the recording service is no longer needed. Instead the history UI can be build on top of the generalized model.

See #15221 for a writeup of this suggestion. Would also be good to collect some +1 and/or comments on the issue to make sure that everyone is aligned.

sdirix · 2025-03-20T09:23:56Z

The only thing I find problematic is that all clients of the language model service have to worry themselves about recording responses and tool function calls manually, while not needing to worry about recording the request. I feel, this is a bit asymmetric and unexpected, imho.

What speaks against re-introducing the monitoring stream (or just attaching another listener if it is non-streaming) into the language model service to let it track the response and tool calls? Is the duplication of the parsing such a big deal in your view?

I agree, this is rather unfortunate. Until we have #15221 in place we should unify the approach. So either everyone records everything on their own, or all recordings are automated. I would lean to the latter, however I'm fine either way.

sdirix · 2025-03-26T13:00:11Z

The content of this PR is taken over by #15092. Therefore I'll close this PR

github-project-automation bot added this to PR Backlog Mar 7, 2025

github-project-automation bot moved this to Waiting on reviewers in PR Backlog Mar 7, 2025

planger force-pushed the planger/improve-chat-recording branch from e49e96d to 7ddbe87 Compare March 7, 2025 10:07

planger mentioned this pull request Mar 7, 2025

[Theia AI] Add LLMSettings to UI #15100

Closed

planger requested a review from eneufeld March 7, 2025 10:08

eneufeld reviewed Mar 7, 2025

View reviewed changes

planger mentioned this pull request Mar 13, 2025

Theia AI - Epic - next #15068

Closed

61 tasks

planger and others added 4 commits March 14, 2025 22:13

Add LLM request to terminal agent recording

690b7bb

Add LLM requests to code completion agent

47e87ad

let all request go through LanguageModelService, mode it to ai-core

ed14861

eneufeld force-pushed the planger/improve-chat-recording branch from fdececd to ed14861 Compare March 14, 2025 21:13

centralize request settings handling

ac4ff54

* Move default request settings from individual models to frontend LanguageModelService * Enhance RequestSettingsSchema with model/provider/agent scoping * Add client-side settings for tool calls and thinking messages

eneufeld force-pushed the planger/improve-chat-recording branch from b81c09c to ac4ff54 Compare March 15, 2025 00:50

planger commented Mar 17, 2025

View reviewed changes

JonasHelming mentioned this pull request Mar 18, 2025

Endgame Plan for release 1.60 #15216

Closed

25 tasks

sdirix closed this Mar 26, 2025

sdirix deleted the planger/improve-chat-recording branch March 26, 2025 13:00

JonasHelming mentioned this pull request Apr 5, 2025

Theia AI Epic - 1.60 #15405

Closed

37 tasks

feat: centralize LLM communication tracking in chat model #15146

feat: centralize LLM communication tracking in chat model #15146

Uh oh!

Conversation

planger commented Mar 7, 2025

What it does

How to test

Follow-ups

Breaking changes

Attribution

Review checklist

Reminder for reviewers

Uh oh!

Choose a reason for hiding this comment

Uh oh!

planger Mar 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JonasHelming commented Mar 13, 2025

Uh oh!

planger left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eneufeld commented Mar 17, 2025

Uh oh!

planger commented Mar 18, 2025

Uh oh!

sdirix commented Mar 20, 2025

Uh oh!

sdirix commented Mar 20, 2025

Uh oh!

sdirix commented Mar 26, 2025

Uh oh!

Uh oh!

planger Mar 7, 2025 •

edited

Loading