Skip to content

Conversation

npepinpe
Copy link
Member

Description

This PR updates the error messages sent as rejection responses back to user when we face a processing error (i.e. an unexpected error).

Previously, there was a good chance that we would leak the complete record as part of it, which is generally something users do not understand and do not know what to do with it.

Note that the original error is still logged, so we have access to it via the logs, but the response is much more lightweight. The "actionable" part of the error is to look at logs or ask your operator to do so, with the command key, partition, value type, and intent.

Related issues

related to #35177

In the sense that this highlighted we just leak the complete record all the way through the client.

@npepinpe npepinpe self-assigned this Aug 23, 2025
@npepinpe npepinpe requested a review from a team as a code owner August 23, 2025 15:43
@npepinpe npepinpe requested review from panagiotisgts and removed request for a team August 23, 2025 15:43
@github-actions github-actions bot added the component/zeebe Related to the Zeebe component/team label Aug 23, 2025
@ThorbenLindhauer
Copy link
Member

I think it would be good if for errors that are raised due to lack of permisions, that it's somehow understandable from the error, as these are expected cases (Something like a "Sorry, you don't have permission to do this"). I'd also argue that just the HTTP status code isn't sufficient in these cases since it's also used for other things.

@npepinpe
Copy link
Member Author

npepinpe commented Aug 25, 2025

I think we're talking about different cases here. I updated the cases where we're handling unexpected errors. I hope that the case you described is actually handled as expected, and we don't randomly throw exceptions.

The fix here is that we still have (conceptually) a big try catch around engine execution, in case there are any unexpected errors. And before, we were passing back the complete exception message, which may include the complete serialized record.

So two things here:

  1. This may leak details
  2. Passing back the raw exception is unlikely to be helpful for the user. They will not only get a 500 anyway (because an unexpected exception is a bug, most likely), but it's very likely not actionable.

Does that make sense? For missing permissions, I would expect the processors themselves to send an explicit rejection, and not have it handled in the uncaught exception loop.

@ThorbenLindhauer
Copy link
Member

Thanks, yes, that makes sense. And I do recall that for permission problems we already had another format and that didn't expose the full record, so that fits your explanation :)

@npepinpe npepinpe force-pushed the npp-avoid-leaking-record-on-error branch from eaad7b6 to c183606 Compare August 25, 2025 12:01
Base automatically changed from npp-35177-sanitize-auth-data to main August 25, 2025 12:09
@npepinpe npepinpe enabled auto-merge August 26, 2025 07:37
@npepinpe npepinpe added this pull request to the merge queue Aug 26, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 26, 2025
@npepinpe npepinpe added this pull request to the merge queue Aug 26, 2025
Merged via the queue into main with commit f2b1fe9 Aug 26, 2025
171 of 173 checks passed
@npepinpe npepinpe deleted the npp-avoid-leaking-record-on-error branch August 26, 2025 10:09
@backport-action
Copy link
Collaborator

Successfully created backport PR for release-8.8.0-alpha8:

houssain-barouni added a commit that referenced this pull request Aug 28, 2025
…error responses (#37222)

# Description
Backport of #37152 to `release-8.8.0-alpha8`.

relates to #35177
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants