Skip to content

Conversation

Insomniak47
Copy link
Contributor

@Insomniak47 Insomniak47 commented May 24, 2024

Hey it's me again!

The existing implementation used .IsPunctuation + IsLetterOrDigit with an override for '+' and ' '. That left some conspicuous holes and the spec itself is just "^}" so basically anything goes in a format string. This allows for them and this does seem to have a reasonably large impact on the DefaultConsoleOutputTemplate parsing method. Everything else was within error bounds and no impact at all on memory allocs.

Some examples of oddities was that @ was allowed but $ wasn't and + was allowed but >, <, = weren't

for posterity this is based on our conversation here: messagetemplates/grammar#6

BEFORE - Allocs

Method Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
LogEmpty 2.527 ns 0.0247 ns 0.0231 ns 1.00 0.00 - - NA
LogEmptyWithEnricher 11.869 ns 0.0309 ns 0.0258 ns 4.70 0.04 - - NA
LogScalar 157.863 ns 3.0242 ns 3.6001 ns 62.26 1.72 0.0215 360 B NA
LogDictionary 1,229.274 ns 24.3282 ns 31.6336 ns 489.18 14.38 0.1297 2192 B NA
LogSequence 251.133 ns 4.8599 ns 6.3192 ns 99.78 2.60 0.0343 576 B NA
LogAnonymous 1,630.629 ns 32.4170 ns 31.8378 ns 645.87 13.55 0.1717 2896 B NA
FormatMessage 42.957 ns 0.2456 ns 0.2297 ns 17.00 0.18 - - NA

BEFORE - Parsing

Method Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
EmptyTemplate 56.11 ns 1.080 ns 1.110 ns 1.00 0.00 0.0138 232 B 1.00
DefaultConsoleOutputTemplate 534.43 ns 10.624 ns 20.468 ns 9.53 0.32 0.1030 1736 B 7.48
LiteralTextOnly 67.79 ns 1.381 ns 2.229 ns 1.21 0.04 0.0172 288 B 1.24

AFTER - Allocs

Method Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
LogEmpty 2.629 ns 0.0130 ns 0.0115 ns 1.00 0.00 - - NA
LogEmptyWithEnricher 14.431 ns 0.0369 ns 0.0345 ns 5.49 0.02 - - NA
LogScalar 152.972 ns 3.0344 ns 3.9455 ns 58.25 1.54 0.0215 360 B NA
LogDictionary 1,215.807 ns 24.0812 ns 24.7296 ns 461.82 10.03 0.1297 2192 B NA
LogSequence 236.164 ns 4.5692 ns 4.4876 ns 89.90 1.70 0.0343 576 B NA
LogAnonymous 1,636.475 ns 27.9450 ns 26.1398 ns 623.24 11.09 0.1717 2896 B NA
FormatMessage 42.094 ns 0.1604 ns 0.1340 ns 16.01 0.05 - - NA

AFTER - PARSING

Method Mean Error StdDev Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
EmptyTemplate 55.05 ns 1.122 ns 1.292 ns 1.00 0.00 0.0138 - 232 B 1.00
DefaultConsoleOutputTemplate 475.19 ns 9.331 ns 12.457 ns 8.67 0.35 0.1035 0.0005 1736 B 7.48
LiteralTextOnly 66.30 ns 1.357 ns 2.447 ns 1.22 0.05 0.0172 - 288 B 1.24

Change in allowed printables before and after:

$ - Before: False -> After True
< - Before: False -> After True
= - Before: False -> After True
> - Before: False -> After True
^ - Before: False -> After True
` - Before: False -> After True
| - Before: False -> After True
~ - Before: False -> After True

The existing implementation used .IsPunctuation + IsLetterOrDigit with
an override for '+' and ' '. That left some conspicuous holes and the
spec itself is just "^\}" so basically anything goes in a format string.
This allows for them and this does seem to have a reasonably large
impact on the DefaultConsoleOutputTemplate parsing method. Everything
else was within error bounds and no impact at all on memory allocs.
@nblumhardt
Copy link
Member

nblumhardt commented May 24, 2024

Thanks for checking this out!

I think we might end up just wanting c != '{', I don't think there's anything that accurately restricts what's valid in a format string.

Console.WriteLine(char.IsLetterOrDigit('む'));
-> True

(So previously :む was a valid format, while if we switched to the ASCII character range comparison it wouldn't be.)

@Insomniak47
Copy link
Contributor Author

Insomniak47 commented May 24, 2024

Thanks for checking this out!

I think we might end up just wanting c != '{', I don't think there's anything that accurately restricts what's valid in a format string.

Console.WriteLine(char.IsLetterOrDigit('む'));
-> True

(So previously :む was a valid format, while if we switched to the ASCII character range comparison it wouldn't be.)

That seems reasonable. Though I'm assuming control characters/non-printables/linebreaks? Should it be maybe anything not in these character classes?

UnicodeCategory.Control
UnicodeCategory.Format
UnicodeCategory.ModifierSymbol
UnicodeCategory.OtherSymbol
UnicodeCategory.OtherNotAssigned

EDIT:
The main concern I think is weirdness in the character classes but also if someone puts a newline or a NBSP in a formatting string.. That kinda feels like it's on them 😬

@nblumhardt
Copy link
Member

Yeah, I don't think it's something that requires policing by Serilog, since the rest of the message template (outside the holes) also supports arbitrary escapes/control characters.

.NET accepts just about anything in format strings, and it seems like C#'s string interpolation follows suit.

Console.WriteLine($"Hello {new object():\0}");
-> Hello System.Object

@Insomniak47
Copy link
Contributor Author

Insomniak47 commented May 25, 2024

.NET accepts just about anything in format strings, and it seems like C#'s string interpolation follows suit.

Interesting - is there a place where this is doc'd in dotnet or is it just "experiment and find out" - I went hunting for what was allowed but got lazy 😄

I'll make that swap now though!

@Insomniak47
Copy link
Contributor Author

Insomniak47 commented May 25, 2024

New benches, interestingly the conversion to bytes vs as chars saves about 1/8 the comparison. Since we don't really care about char specific comparisons to } it's a nice free win and it's still safe. I checked that it wasn't just the reduction in conditionals two chars for me reliably increases the time

Edit: After a few more attempts seems like the perf diffs with the byte cast may be smaller and this might just be noisy data - I'm not running this on a dedicated benching box I'm just running it on my desktop while I'm running other things so might not be the best dataset. That being said this does end up being about 55ns faster than baseline

New benches:

BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22631.3593)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.205
  [Host]     : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

Method Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
LogEmpty 2.548 ns 0.0278 ns 0.0260 ns 1.00 0.00 - - NA
LogEmptyWithEnricher 14.073 ns 0.1440 ns 0.1347 ns 5.52 0.08 - - NA
LogScalar 157.720 ns 3.1802 ns 5.6528 ns 61.81 2.43 0.0215 360 B NA
LogDictionary 1,242.702 ns 24.6716 ns 40.5362 ns 488.14 14.40 0.1297 2192 B NA
LogSequence 253.055 ns 6.4325 ns 17.9312 ns 113.93 2.01 0.0343 576 B NA
LogAnonymous 1,600.773 ns 28.9116 ns 27.0439 ns 628.26 11.44 0.1717 2896 B NA
FormatMessage 42.532 ns 0.2978 ns 0.2786 ns 16.69 0.18 - - NA
BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22631.3593)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.205
  [Host]     : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

Method Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
EmptyTemplate 56.26 ns 1.114 ns 1.368 ns 1.00 0.00 0.0138 232 B 1.00
DefaultConsoleOutputTemplate 481.26 ns 9.588 ns 21.836 ns 8.82 0.44 0.1030 1736 B 7.48
LiteralTextOnly 68.11 ns 1.384 ns 2.666 ns 1.23 0.05 0.0172 288 B 1.24

@Insomniak47 Insomniak47 changed the title fix: allows all printable characters as format values fix: allows all non-'}' characters as format values May 25, 2024
char.IsPunctuation(c) ||
c is ' ' or '+');
}
static bool IsValidInFormat(char c) => c != (byte)'}';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The byte conversion difference here is just noise; both versions end up emitting the same assembly (using godbolt.org for a quick check).

I'll drop out the cast and merge this so it's all ready for v4 :-) 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wicked, on 500ish ns noise ends up so impactful :P

@nblumhardt nblumhardt merged commit 4ef3a87 into serilog:dev May 27, 2024
@nblumhardt nblumhardt mentioned this pull request May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants