-
Notifications
You must be signed in to change notification settings - Fork 5.1k
access log: new 20x faster json formatter implementation #35545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/wait |
exception... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did one quick pass. The big win will be to not use a protobuf as an intermediate format, but purely use the json streamer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should also say: this is great work! And even without the suggestions I have made it's a huge improvement, though I'm not entirely clear why the old one was so slow as you didn't really change it in this PR :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few more comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/wait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/wait
Fine, most comments are addressed. There are two problems:
cc @jmarantz |
/wait |
This PR involves performance, substitution format, Json serializing, etc. and this PR has lots of technical details and background context. It's not easy to review this PR. Thanks so much that @jmarantz has provided lots of great suggestions that make the PR better, on both the code quality and performance (now the new Because these are lots of historical comments and responses to the comments. And the PR also changed a lot compared to initial version. So, I cleaned all history context. And lets re-start from here. I will add some comment about:
Hope these could help to future reviewing and other reviewers. |
This comment was marked as duplicate.
This comment was marked as duplicate.
/wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I wanted to finish this review off but I feel like we are still not aligned on converging JSON implementations.
struct JsonString { | ||
std::string value_; | ||
}; | ||
struct TmplString { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we pick a better name than TmplString? TemplateString? Add comments to both of these options indicating what they mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still i'ts not clear from the class names what they do. More descriptive names and comments would help.
struct TmplString { | ||
std::string value_; | ||
}; | ||
using FormatElement = absl::variant<JsonString, TmplString>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perf nit: I think absl::variant might be overkill since both options have the same type underneath, but maybe it optimizes well. But you could obviously do this with a string and bool, or string and enum.
Or if you were really after performance you'd make the first character of the string indicate which variant it is, and use the absl::string_view(str).substr(1,0)
to get the rest out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion. Sounds string + bool is better for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you evaluate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for calling it out. There are too much historical context. And I want re-start it, so I close all comment directly without a careful checking. Sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to re-enter comments I previously entered.
* For example given the following proto struct format configuration: | ||
* | ||
* json_format: | ||
* text: "text" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use different words in the example here instead of 'text' and 'text' so the rendering is unambiguous? Say, name: "value"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still looks the same.
It might be good to leave comments unresolved and let the reviewer resolve them?
void formatValueToFormatElements(const ProtobufWkt::Value& value); | ||
void formatValueToFormatElements(const ProtoList& list_value); | ||
|
||
const bool keep_value_type_{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the {}
-- this has to be initialized in the ctor based on the ctor args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still not done
np. We all want a best solution. I will address your comments after we get a conclusion about the exception usage. |
/wait I think a separate PR should be cut to sort out the exception issue with the sanitizer. |
Signed-off-by: wangbaiping <wangbaiping@bytedance.com>
/retest |
Signed-off-by: wangbaiping <wangbaiping@bytedance.com>
/** | ||
* Serializes a integer number. | ||
* NOTE: All numbers in JSON is float. When loading output of this serializer, the parser's | ||
* implementation decides if the full precision of big integer could be preserved or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the ref to the stackoverflow article & rfc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
…pt-json-formatter
Signed-off-by: wangbaiping <wangbaiping@bytedance.com>
…pt-json-formatter
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@wbpcode am I correct that new implementation in 1.32 ignores |
it will ignore the sort_properties, but will obey the omit_empty_value And the new implementation only be enabled by default after 1.32. :) |
I have just tested it in our custom build of 1.32 and I see null-s with this new implementation, while we have |
I will check it this weekend. Maybe there is a problem we didn't notice. |
Thank you! |
@fxposter I think I know why. We do almost every thing at configuration loading time, and only need to do the substitution command evaluation and string joining at runtime. This way bring great performance for the Json formatter. But also make it impossible to eliminate the null entry. Because the key has been serialized at configuration loading time. The only way to resolve it is keep the structural tree of configuration and iterate the tree at runtime. It's no doubt that will penalize performance. |
Is this a block point for you? I can try to find some time to test how much performance degradation that the fix will bring. If the degradation is acceptable, we can try to fix it. |
I don't think it's problematic for us in a way that we won't be able to adjust the log collector code, it just no longer matches the documentation. |
Also, probably not adding unnecessary strings and writing them to file can have benefits by itself. |
It's more complex because the serialization of the keys is done when we loading the configuration, and we can never know if the value will be null or not at that time. I will create a bug first to record this problem. |
@wbpcode Did you create a bug? |
nope. I record it in my todo but forget to create a bug |
@wbpcode Would you mind creating a bug? |
ofcourse not. I will create one when I could access my computer. But would you mind give more context about the parsing problem? The reason why the bug has no priority is because I think it's more like a minor behavior change and won't actually effect the users. |
@wbpcode Hi, thanks for the amazing performance improvement work.
We previously used the text: field with an embedded JSON string for performance reasons. However, with the introduction of the new high-performance JSON formatter, we’re evaluating whether json_format can now give us comparable throughput with improved structure. Given this configuration, are we eligible to benefit from the new logging_with_fast_json_formatter implementation? |
yeah, but please ensure your envoy version has enabled the new formater implementation by default. The new json formater should provide much better performance compared to previous version. |
And it would be super appreciated if there is any feedback about the new formatter. |
Commit Message: access log: new 20x faster json formatter implementation
Additional Description:
The core idea of the new
JsonFormatter
.The core idea of the new
JsonFormatter
is to do as much as possible work when loading configuration. Given a specificjson_format
:When loading configuration, the keys, raw values, delimiters (all parts don't contains the substitution format commands) will be sanitized and serialized as JSON string pieces (raw JSON strings) directly.
The strings that contain substitution commands will be extracted as template string and parsed as substitution formatters (objects that could extract dynamic values from request/response/stream info, etc.).
Finally, the whole proto struct configuration we be parsed to an array of raw JSON piece strings and formatters.
Basically, above configuration finally will be parsed and serialized to following pieces and formatters:
Then, at runtime, when a request is complete and the caller want to construct a log line. The above element array will be iterated in order.
The raw JSON piece string will be put into output buffer directly. And the formatters will be evaluated. The outputs of formatters will be sanitized/serialized to legal JSON values and then be put into output buffer.
At the runtime (core data paths), the logic of new JsonFormatter is almost same with the text formatter. By this way, the JsonFormatter could get comparable/same performance with the text formatter.
The performance result of the formatter.
The previous json formatter is too too slow and the new one perform much better (16x-26x faster).
Risk Level: low. guard by the runtime flag.
Testing: unit.
Docs Changes: n/a.
Release Notes: added.
Platform Specific Features: n/a.