-
Notifications
You must be signed in to change notification settings - Fork 5.1k
admin: Json nlohmann sanitizer #20637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…lasting with memset. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ze one char at a time. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…n tests. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…hat protobufs reject. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the review!
source/common/json/json_sanitizer.cc
Outdated
} | ||
} | ||
if (buffer.size() >= 2 && buffer[0] == '"' && buffer[buffer.size() - 1] == '"') { | ||
return absl::string_view(buffer).substr(1, buffer.size() - 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I was able to just remove that block.
source/common/json/json_sanitizer.cc
Outdated
if (str.size() >= 2 && str[0] == '"' && str[str.size() - 1] == '"') { | ||
str = str.substr(1, str.size() - 2); | ||
} else { | ||
IS_ENVOY_BUG(absl::StrCat("stripDoubleQuotes called on a str that lacks double-quotes: ", str)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm still not sure when to use ENVOY_BUG. I'll switch to ASSERT.
test/common/json/BUILD
Outdated
# genrule( | ||
# name = "protobuf_incompatible_unicodes", | ||
# testonly = 1, | ||
# outs = ["protobuf_incompatible_unicodes.h"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really. I thought it would be nicer to generate the invalidations all the time so no one has to think about this again, but I could not get it to work.
namespace { | ||
|
||
// Collects unicode values that cannot be handled by the protobuf json encoder. | ||
// This is not needed for correct operation of the json sanitizer, but it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done -- in comments and messages; did not change variable/function/namespace names.
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to |
This is a comment only change to the dependencies /lgtm deps |
source/common/json/json_sanitizer.h
Outdated
absl::string_view sanitize(std::string& buffer, absl::string_view str); | ||
|
||
/** | ||
* Strips double-quotes on first and last characters of str. It's an ENVOY_BUG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ENVOY_BUG/precondition ../
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// "-c opt" is recommended because otherwise it takes almost a minute to run | ||
// through all 4-byte UTF-8 sequences. | ||
// | ||
// Running in this mode causes two tests to fail, but prints two initialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid the failure if possible? What I'm worried about is when the person upgrading protobuf library sees 3 tests fail or something and is uncertain on how to proceed. I think it's fine to have a rerun on each upgrade as needed, but it should be pretty bullet proof, right now it feels a bit wing-and-a-prayer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry that "test to fail" note is obsolete and I removed it. I think it's reasonably OK given the minefield of correctness issues from the protobuf library.
I am not fully convinced it's worth fuzzing against protobuf serialization forever. That library should not be considered the golden truth. But for now it makes sense.
If eventually we decide we need the performance of the handrolled version I think it would make sense to fuzz against nlohmann.
}; | ||
|
||
bool isInvalidProtobufSerialization(const std::string& str) { | ||
return (str.size() == 2 && str[0] == '"' && str[1] == '"') || str.size() > 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow these constraints, comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comments:
// To work around https://github.com/protocolbuffers/protobuf/issues/9729
// we must examine its output for each unicode. The protobuf library's
// invalid outputs take two forms:
// "" (empty qutoed string, which happens on 3-byte utf8)
// \u1234\u5678 (two consecutive unicode escapes for a 4-byte utf8)
auto [unicode, consumed] = Utf8::decode(utf8); | ||
if (consumed != 0 && unicode == escaped_unicode) { | ||
utf8 = utf8.substr(consumed, utf8.size() - consumed); | ||
escaped = escaped.substr(6, escaped.size() - 6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic 6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made constants.
Signed-off-by: Joshua Marantz <jmarantz@google.com>
/retest |
Retrying Azure Pipelines: |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
/retest |
Retrying Azure Pipelines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! Very thorough fuzzing/testing, which I think provides the confidence to make this switch.
@jmarantz before merging, can I check that this only impacts admin and we don't need runtime protection?
@htuch by "Can I check" did you mean you or me :) ? This PR does not call Json::sanitize() from anywhere except tests. However I'll write another small PR to switch the admin stats serializer to call it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm deps
…eadouts (#20745) Commit Message: Uses the new Json::sanitizer from #20637 to generate counters, gauges, and text-readouts in response to admin endpoint `/stats?format=json`, resulting in a 5.6x speedup for AllCountersJson, and making JSON output only around 10% slower than text output. Additional Description: Before: ``` ------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------ BM_AllCountersText 454 ms 453 ms 2 BM_UsedCountersText 35.8 ms 35.8 ms 20 BM_FilteredCountersText 1806 ms 1805 ms 1 BM_AllCountersJson 2824 ms 2823 ms 1 BM_UsedCountersJson 61.8 ms 61.8 ms 11 BM_FilteredCountersJson 1798 ms 1798 ms 1 ``` after ``` ------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------ BM_AllCountersText 458 ms 458 ms 2 BM_UsedCountersText 36.1 ms 36.1 ms 19 BM_FilteredCountersText 1790 ms 1790 ms 1 BM_AllCountersJson 496 ms 496 ms 2 BM_UsedCountersJson 36.4 ms 36.4 ms 19 BM_FilteredCountersJson 1789 ms 1789 ms 1 ``` Risk Level: low -- Json::sanitizer is new, but based on nlohmann, and is differentially fuzzed vs protobuf serialization Testing: //test/server/admin/... Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com> ogle.com>
…eadouts (envoyproxy#20745) Commit Message: Uses the new Json::sanitizer from envoyproxy#20637 to generate counters, gauges, and text-readouts in response to admin endpoint `/stats?format=json`, resulting in a 5.6x speedup for AllCountersJson, and making JSON output only around 10% slower than text output. Additional Description: Before: ``` ------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------ BM_AllCountersText 454 ms 453 ms 2 BM_UsedCountersText 35.8 ms 35.8 ms 20 BM_FilteredCountersText 1806 ms 1805 ms 1 BM_AllCountersJson 2824 ms 2823 ms 1 BM_UsedCountersJson 61.8 ms 61.8 ms 11 BM_FilteredCountersJson 1798 ms 1798 ms 1 ``` after ``` ------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------ BM_AllCountersText 458 ms 458 ms 2 BM_UsedCountersText 36.1 ms 36.1 ms 19 BM_FilteredCountersText 1790 ms 1790 ms 1 BM_AllCountersJson 496 ms 496 ms 2 BM_UsedCountersJson 36.4 ms 36.4 ms 19 BM_FilteredCountersJson 1789 ms 1789 ms 1 ``` Risk Level: low -- Json::sanitizer is new, but based on nlohmann, and is differentially fuzzed vs protobuf serialization Testing: //test/server/admin/... Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com> ogle.com> Signed-off-by: Andre Vehreschild <vehre@x41-dsec.de>
Creates a JSON sanitizer that delegates to the nlohmann library when there's any escapes required, or characters with the 8th bit set, but otherwise just returns its input. This PR supersedes envoyproxy#20428 which has a faster hand-rolled sanitizer, but the speed penalty for nlohmann is not too bad, compared with the protobuf sanitizer, and will generally be rarely needed anyway. -------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------- BM_ProtoEncoderNoEscape 1445 ns 1444 ns 455727 BM_NlohmannNoEscape 9.79 ns 9.79 ns 71449511 BM_ProtoEncoderWithEscape 1521 ns 1521 ns 462697 BM_NlohmannWithEscape 215 ns 215 ns 3264218 Additional Description: This is intended to be adapted into the JSON renderer introduced in envoyproxy#19693 Risk Level: low -- this is layered on top of an already well-trusted library. Testing: //test/... Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com>
…eadouts (envoyproxy#20745) Commit Message: Uses the new Json::sanitizer from envoyproxy#20637 to generate counters, gauges, and text-readouts in response to admin endpoint `/stats?format=json`, resulting in a 5.6x speedup for AllCountersJson, and making JSON output only around 10% slower than text output. Additional Description: Before: ``` ------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------ BM_AllCountersText 454 ms 453 ms 2 BM_UsedCountersText 35.8 ms 35.8 ms 20 BM_FilteredCountersText 1806 ms 1805 ms 1 BM_AllCountersJson 2824 ms 2823 ms 1 BM_UsedCountersJson 61.8 ms 61.8 ms 11 BM_FilteredCountersJson 1798 ms 1798 ms 1 ``` after ``` ------------------------------------------------------------------ Benchmark Time CPU Iterations ------------------------------------------------------------------ BM_AllCountersText 458 ms 458 ms 2 BM_UsedCountersText 36.1 ms 36.1 ms 19 BM_FilteredCountersText 1790 ms 1790 ms 1 BM_AllCountersJson 496 ms 496 ms 2 BM_UsedCountersJson 36.4 ms 36.4 ms 19 BM_FilteredCountersJson 1789 ms 1789 ms 1 ``` Risk Level: low -- Json::sanitizer is new, but based on nlohmann, and is differentially fuzzed vs protobuf serialization Testing: //test/server/admin/... Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com> ogle.com>
Commit Message: Creates a JSON sanitizer that delegates to the nlohmann library when there's any escapes required, or characters with the 8th bit set, but otherwise just returns its input.
This PR supersedes #20428 which has a faster hand-rolled sanitizer, but the speed penalty for nlohmann is not too bad, compared with the protobuf sanitizer, and will generally be rarely needed anyway.
Additional Description: This is intended to be adapted into the JSON renderer introduced in #19693
Risk Level: low -- this is layered on top of an already well-trusted library.
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a