Skip to content

Conversation

jmarantz
Copy link
Contributor

@jmarantz jmarantz commented Apr 2, 2022

Commit Message: Creates a JSON sanitizer that delegates to the nlohmann library when there's any escapes required, or characters with the 8th bit set, but otherwise just returns its input.

This PR supersedes #20428 which has a faster hand-rolled sanitizer, but the speed penalty for nlohmann is not too bad, compared with the protobuf sanitizer, and will generally be rarely needed anyway.

--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ProtoEncoderNoEscape         1445 ns         1444 ns       455727
BM_NlohmannNoEscape             9.79 ns         9.79 ns     71449511
BM_ProtoEncoderWithEscape       1521 ns         1521 ns       462697
BM_NlohmannWithEscape            215 ns          215 ns      3264218

Additional Description: This is intended to be adapted into the JSON renderer introduced in #19693
Risk Level: low -- this is layered on top of an already well-trusted library.
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

jmarantz added 30 commits March 20, 2022 00:01
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…lasting with memset.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ze one char at a time.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…n tests.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…hat protobufs reject.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Contributor Author

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the review!

}
}
if (buffer.size() >= 2 && buffer[0] == '"' && buffer[buffer.size() - 1] == '"') {
return absl::string_view(buffer).substr(1, buffer.size() - 2);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I was able to just remove that block.

if (str.size() >= 2 && str[0] == '"' && str[str.size() - 1] == '"') {
str = str.substr(1, str.size() - 2);
} else {
IS_ENVOY_BUG(absl::StrCat("stripDoubleQuotes called on a str that lacks double-quotes: ", str));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm still not sure when to use ENVOY_BUG. I'll switch to ASSERT.

# genrule(
# name = "protobuf_incompatible_unicodes",
# testonly = 1,
# outs = ["protobuf_incompatible_unicodes.h"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really. I thought it would be nicer to generate the invalidations all the time so no one has to think about this again, but I could not get it to work.

namespace {

// Collects unicode values that cannot be handled by the protobuf json encoder.
// This is not needed for correct operation of the json sanitizer, but it is
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done -- in comments and messages; did not change variable/function/namespace names.

@repokitteh-read-only repokitteh-read-only bot added deps Approval required for changes to Envoy's external dependencies and removed waiting labels Apr 4, 2022
@repokitteh-read-only
Copy link

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).
envoyproxy/dependency-shepherds assignee is @htuch

🐱

Caused by: #20637 was synchronize by jmarantz.

see: more, trace.

@moderation
Copy link
Contributor

This is a comment only change to the dependencies

/lgtm deps

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Apr 4, 2022
absl::string_view sanitize(std::string& buffer, absl::string_view str);

/**
* Strips double-quotes on first and last characters of str. It's an ENVOY_BUG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ENVOY_BUG/precondition ../

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// "-c opt" is recommended because otherwise it takes almost a minute to run
// through all 4-byte UTF-8 sequences.
//
// Running in this mode causes two tests to fail, but prints two initialization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the failure if possible? What I'm worried about is when the person upgrading protobuf library sees 3 tests fail or something and is uncertain on how to proceed. I think it's fine to have a rerun on each upgrade as needed, but it should be pretty bullet proof, right now it feels a bit wing-and-a-prayer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry that "test to fail" note is obsolete and I removed it. I think it's reasonably OK given the minefield of correctness issues from the protobuf library.

I am not fully convinced it's worth fuzzing against protobuf serialization forever. That library should not be considered the golden truth. But for now it makes sense.

If eventually we decide we need the performance of the handrolled version I think it would make sense to fuzz against nlohmann.

};

bool isInvalidProtobufSerialization(const std::string& str) {
return (str.size() == 2 && str[0] == '"' && str[1] == '"') || str.size() > 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow these constraints, comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments:

  // To work around https://github.com/protocolbuffers/protobuf/issues/9729
  // we must examine its output for each unicode. The protobuf library's
  // invalid outputs take two forms:
  //   ""            (empty qutoed string, which happens on 3-byte utf8)
  //   \u1234\u5678  (two consecutive unicode escapes for a 4-byte utf8)

auto [unicode, consumed] = Utf8::decode(utf8);
if (consumed != 0 && unicode == escaped_unicode) {
utf8 = utf8.substr(consumed, utf8.size() - consumed);
escaped = escaped.substr(6, escaped.size() - 6);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic 6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made constants.

jmarantz added 2 commits April 5, 2022 09:18
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Contributor Author

jmarantz commented Apr 5, 2022

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #20637 (comment) was created by @jmarantz.

see: more, trace.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Apr 6, 2022
@jmarantz
Copy link
Contributor Author

jmarantz commented Apr 7, 2022

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #20637 (comment) was created by @jmarantz.

see: more, trace.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Very thorough fuzzing/testing, which I think provides the confidence to make this switch.

@jmarantz before merging, can I check that this only impacts admin and we don't need runtime protection?

@jmarantz
Copy link
Contributor Author

jmarantz commented Apr 8, 2022

@htuch by "Can I check" did you mean you or me :) ?

This PR does not call Json::sanitize() from anywhere except tests. However I'll write another small PR to switch the admin stats serializer to call it.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm deps

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Apr 8, 2022
@htuch htuch merged commit 1c66d57 into envoyproxy:main Apr 8, 2022
@jmarantz jmarantz deleted the json-nlohmann-sanitizer branch April 8, 2022 14:51
jmarantz added a commit that referenced this pull request Apr 12, 2022
…eadouts (#20745)

Commit Message: Uses the new Json::sanitizer from #20637 to generate counters, gauges, and text-readouts in response to admin endpoint `/stats?format=json`, resulting in a 5.6x speedup for AllCountersJson, and making JSON output only around 10% slower than text output.

Additional Description:
Before:
```
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_AllCountersText             454 ms          453 ms            2
BM_UsedCountersText           35.8 ms         35.8 ms           20
BM_FilteredCountersText       1806 ms         1805 ms            1
BM_AllCountersJson            2824 ms         2823 ms            1
BM_UsedCountersJson           61.8 ms         61.8 ms           11
BM_FilteredCountersJson       1798 ms         1798 ms            1
```
after
```
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_AllCountersText             458 ms          458 ms            2
BM_UsedCountersText           36.1 ms         36.1 ms           19
BM_FilteredCountersText       1790 ms         1790 ms            1
BM_AllCountersJson             496 ms          496 ms            2
BM_UsedCountersJson           36.4 ms         36.4 ms           19
BM_FilteredCountersJson       1789 ms         1789 ms            1
```

Risk Level: low -- Json::sanitizer is new, but based on nlohmann, and is differentially fuzzed vs protobuf serialization
Testing: //test/server/admin/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a


Signed-off-by: Joshua Marantz <jmarantz@google.com>
ogle.com>
vehre-x41 pushed a commit to vehre-x41/envoy that referenced this pull request Apr 19, 2022
…eadouts (envoyproxy#20745)

Commit Message: Uses the new Json::sanitizer from envoyproxy#20637 to generate counters, gauges, and text-readouts in response to admin endpoint `/stats?format=json`, resulting in a 5.6x speedup for AllCountersJson, and making JSON output only around 10% slower than text output.

Additional Description:
Before:
```
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_AllCountersText             454 ms          453 ms            2
BM_UsedCountersText           35.8 ms         35.8 ms           20
BM_FilteredCountersText       1806 ms         1805 ms            1
BM_AllCountersJson            2824 ms         2823 ms            1
BM_UsedCountersJson           61.8 ms         61.8 ms           11
BM_FilteredCountersJson       1798 ms         1798 ms            1
```
after
```
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_AllCountersText             458 ms          458 ms            2
BM_UsedCountersText           36.1 ms         36.1 ms           19
BM_FilteredCountersText       1790 ms         1790 ms            1
BM_AllCountersJson             496 ms          496 ms            2
BM_UsedCountersJson           36.4 ms         36.4 ms           19
BM_FilteredCountersJson       1789 ms         1789 ms            1
```

Risk Level: low -- Json::sanitizer is new, but based on nlohmann, and is differentially fuzzed vs protobuf serialization
Testing: //test/server/admin/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a


Signed-off-by: Joshua Marantz <jmarantz@google.com>
ogle.com>
Signed-off-by: Andre Vehreschild <vehre@x41-dsec.de>
ravenblackx pushed a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
Creates a JSON sanitizer that delegates to the nlohmann library when there's any escapes required, or characters with the 8th bit set, but otherwise just returns its input.

This PR supersedes envoyproxy#20428 which has a faster hand-rolled sanitizer, but the speed penalty for nlohmann is not too bad, compared with the protobuf sanitizer, and will generally be rarely needed anyway.

--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
BM_ProtoEncoderNoEscape         1445 ns         1444 ns       455727
BM_NlohmannNoEscape             9.79 ns         9.79 ns     71449511
BM_ProtoEncoderWithEscape       1521 ns         1521 ns       462697
BM_NlohmannWithEscape            215 ns          215 ns      3264218
Additional Description: This is intended to be adapted into the JSON renderer introduced in envoyproxy#19693
Risk Level: low -- this is layered on top of an already well-trusted library.
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
ravenblackx pushed a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
…eadouts (envoyproxy#20745)

Commit Message: Uses the new Json::sanitizer from envoyproxy#20637 to generate counters, gauges, and text-readouts in response to admin endpoint `/stats?format=json`, resulting in a 5.6x speedup for AllCountersJson, and making JSON output only around 10% slower than text output.

Additional Description:
Before:
```
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_AllCountersText             454 ms          453 ms            2
BM_UsedCountersText           35.8 ms         35.8 ms           20
BM_FilteredCountersText       1806 ms         1805 ms            1
BM_AllCountersJson            2824 ms         2823 ms            1
BM_UsedCountersJson           61.8 ms         61.8 ms           11
BM_FilteredCountersJson       1798 ms         1798 ms            1
```
after
```
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_AllCountersText             458 ms          458 ms            2
BM_UsedCountersText           36.1 ms         36.1 ms           19
BM_FilteredCountersText       1790 ms         1790 ms            1
BM_AllCountersJson             496 ms          496 ms            2
BM_UsedCountersJson           36.4 ms         36.4 ms           19
BM_FilteredCountersJson       1789 ms         1789 ms            1
```

Risk Level: low -- Json::sanitizer is new, but based on nlohmann, and is differentially fuzzed vs protobuf serialization
Testing: //test/server/admin/...
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a


Signed-off-by: Joshua Marantz <jmarantz@google.com>
ogle.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants