fix: allows all non-'}' characters as format values #2065

Insomniak47 · 2024-05-24T01:23:12Z

Hey it's me again!

The existing implementation used .IsPunctuation + IsLetterOrDigit with an override for '+' and ' '. That left some conspicuous holes and the spec itself is just "^}" so basically anything goes in a format string. This allows for them and this does seem to have a reasonably large impact on the DefaultConsoleOutputTemplate parsing method. Everything else was within error bounds and no impact at all on memory allocs.

Some examples of oddities was that @ was allowed but $ wasn't and + was allowed but >, <, = weren't

for posterity this is based on our conversation here: messagetemplates/grammar#6

BEFORE - Allocs

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Allocated	Alloc Ratio
LogEmpty	2.527 ns	0.0247 ns	0.0231 ns	1.00	0.00	-	-	NA
LogEmptyWithEnricher	11.869 ns	0.0309 ns	0.0258 ns	4.70	0.04	-	-	NA
LogScalar	157.863 ns	3.0242 ns	3.6001 ns	62.26	1.72	0.0215	360 B	NA
LogDictionary	1,229.274 ns	24.3282 ns	31.6336 ns	489.18	14.38	0.1297	2192 B	NA
LogSequence	251.133 ns	4.8599 ns	6.3192 ns	99.78	2.60	0.0343	576 B	NA
LogAnonymous	1,630.629 ns	32.4170 ns	31.8378 ns	645.87	13.55	0.1717	2896 B	NA
FormatMessage	42.957 ns	0.2456 ns	0.2297 ns	17.00	0.18	-	-	NA

BEFORE - Parsing

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Allocated	Alloc Ratio
EmptyTemplate	56.11 ns	1.080 ns	1.110 ns	1.00	0.00	0.0138	232 B	1.00
DefaultConsoleOutputTemplate	534.43 ns	10.624 ns	20.468 ns	9.53	0.32	0.1030	1736 B	7.48
LiteralTextOnly	67.79 ns	1.381 ns	2.229 ns	1.21	0.04	0.0172	288 B	1.24

AFTER - Allocs

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Allocated	Alloc Ratio
LogEmpty	2.629 ns	0.0130 ns	0.0115 ns	1.00	0.00	-	-	NA
LogEmptyWithEnricher	14.431 ns	0.0369 ns	0.0345 ns	5.49	0.02	-	-	NA
LogScalar	152.972 ns	3.0344 ns	3.9455 ns	58.25	1.54	0.0215	360 B	NA
LogDictionary	1,215.807 ns	24.0812 ns	24.7296 ns	461.82	10.03	0.1297	2192 B	NA
LogSequence	236.164 ns	4.5692 ns	4.4876 ns	89.90	1.70	0.0343	576 B	NA
LogAnonymous	1,636.475 ns	27.9450 ns	26.1398 ns	623.24	11.09	0.1717	2896 B	NA
FormatMessage	42.094 ns	0.1604 ns	0.1340 ns	16.01	0.05	-	-	NA

AFTER - PARSING

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Gen1	Allocated	Alloc Ratio
EmptyTemplate	55.05 ns	1.122 ns	1.292 ns	1.00	0.00	0.0138	-	232 B	1.00
DefaultConsoleOutputTemplate	475.19 ns	9.331 ns	12.457 ns	8.67	0.35	0.1035	0.0005	1736 B	7.48
LiteralTextOnly	66.30 ns	1.357 ns	2.447 ns	1.22	0.05	0.0172	-	288 B	1.24

Change in allowed printables before and after:

$ - Before: False -> After True
< - Before: False -> After True
= - Before: False -> After True
> - Before: False -> After True
^ - Before: False -> After True
` - Before: False -> After True
| - Before: False -> After True
~ - Before: False -> After True

The existing implementation used .IsPunctuation + IsLetterOrDigit with an override for '+' and ' '. That left some conspicuous holes and the spec itself is just "^\}" so basically anything goes in a format string. This allows for them and this does seem to have a reasonably large impact on the DefaultConsoleOutputTemplate parsing method. Everything else was within error bounds and no impact at all on memory allocs.

nblumhardt · 2024-05-24T05:02:19Z

Thanks for checking this out!

I think we might end up just wanting c != '{', I don't think there's anything that accurately restricts what's valid in a format string.

Console.WriteLine(char.IsLetterOrDigit('む'));
-> True

(So previously :む was a valid format, while if we switched to the ASCII character range comparison it wouldn't be.)

Insomniak47 · 2024-05-24T12:57:54Z

Thanks for checking this out!

I think we might end up just wanting c != '{', I don't think there's anything that accurately restricts what's valid in a format string.
Console.WriteLine(char.IsLetterOrDigit('む'));
-> True
(So previously :む was a valid format, while if we switched to the ASCII character range comparison it wouldn't be.)

That seems reasonable. Though I'm assuming control characters/non-printables/linebreaks? Should it be maybe anything not in these character classes?

UnicodeCategory.Control
UnicodeCategory.Format
UnicodeCategory.ModifierSymbol
UnicodeCategory.OtherSymbol
UnicodeCategory.OtherNotAssigned

EDIT:
The main concern I think is weirdness in the character classes but also if someone puts a newline or a NBSP in a formatting string.. That kinda feels like it's on them 😬

nblumhardt · 2024-05-24T21:08:50Z

Yeah, I don't think it's something that requires policing by Serilog, since the rest of the message template (outside the holes) also supports arbitrary escapes/control characters.

.NET accepts just about anything in format strings, and it seems like C#'s string interpolation follows suit.

Console.WriteLine($"Hello {new object():\0}");
-> Hello System.Object

Insomniak47 · 2024-05-25T13:52:12Z

.NET accepts just about anything in format strings, and it seems like C#'s string interpolation follows suit.

Interesting - is there a place where this is doc'd in dotnet or is it just "experiment and find out" - I went hunting for what was allowed but got lazy 😄

I'll make that swap now though!

Insomniak47 · 2024-05-25T14:30:48Z

New benches, interestingly the conversion to bytes vs as chars saves about 1/8 the comparison. Since we don't really care about char specific comparisons to } it's a nice free win and it's still safe. I checked that it wasn't just the reduction in conditionals two chars for me reliably increases the time

Edit: After a few more attempts seems like the perf diffs with the byte cast may be smaller and this might just be noisy data - I'm not running this on a dedicated benching box I'm just running it on my desktop while I'm running other things so might not be the best dataset. That being said this does end up being about 55ns faster than baseline

New benches:

BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22631.3593)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.205
  [Host]     : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Allocated	Alloc Ratio
LogEmpty	2.548 ns	0.0278 ns	0.0260 ns	1.00	0.00	-	-	NA
LogEmptyWithEnricher	14.073 ns	0.1440 ns	0.1347 ns	5.52	0.08	-	-	NA
LogScalar	157.720 ns	3.1802 ns	5.6528 ns	61.81	2.43	0.0215	360 B	NA
LogDictionary	1,242.702 ns	24.6716 ns	40.5362 ns	488.14	14.40	0.1297	2192 B	NA
LogSequence	253.055 ns	6.4325 ns	17.9312 ns	113.93	2.01	0.0343	576 B	NA
LogAnonymous	1,600.773 ns	28.9116 ns	27.0439 ns	628.26	11.44	0.1717	2896 B	NA
FormatMessage	42.532 ns	0.2978 ns	0.2786 ns	16.69	0.18	-	-	NA

BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22631.3593)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.205
  [Host]     : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen0	Allocated	Alloc Ratio
EmptyTemplate	56.26 ns	1.114 ns	1.368 ns	1.00	0.00	0.0138	232 B	1.00
DefaultConsoleOutputTemplate	481.26 ns	9.588 ns	21.836 ns	8.82	0.44	0.1030	1736 B	7.48
LiteralTextOnly	68.11 ns	1.384 ns	2.666 ns	1.23	0.05	0.0172	288 B	1.24

nblumhardt · 2024-05-27T05:13:06Z

src/Serilog/Parsing/MessageTemplateParser.cs

-                char.IsPunctuation(c) ||
-                c is ' ' or '+');
-    }
+    static bool IsValidInFormat(char c) => c != (byte)'}';


The byte conversion difference here is just noise; both versions end up emitting the same assembly (using godbolt.org for a quick check).

I'll drop out the cast and merge this so it's all ready for v4 :-) 👍

Wicked, on 500ish ns noise ends up so impactful :P

fix: only disallows '}' in format strings, use byte comparison

043f7c3

Insomniak47 changed the title ~~fix: allows all printable characters as format values~~ fix: allows all non-'}' characters as format values May 25, 2024

nblumhardt reviewed May 27, 2024

View reviewed changes

Remove noop byte cast

27d87c0

nblumhardt approved these changes May 27, 2024

View reviewed changes

nblumhardt merged commit 4ef3a87 into serilog:dev May 27, 2024

nblumhardt mentioned this pull request May 27, 2024

4.0.0 Release #2067

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: allows all non-'}' characters as format values #2065

fix: allows all non-'}' characters as format values #2065

Uh oh!

Insomniak47 commented May 24, 2024 •

edited

Loading

Uh oh!

nblumhardt commented May 24, 2024 •

edited

Loading

Uh oh!

Insomniak47 commented May 24, 2024 •

edited

Loading

Uh oh!

nblumhardt commented May 24, 2024

Uh oh!

Insomniak47 commented May 25, 2024 •

edited

Loading

Uh oh!

Insomniak47 commented May 25, 2024 •

edited

Loading

Uh oh!

nblumhardt May 27, 2024

Uh oh!

Insomniak47 May 28, 2024

Uh oh!

Uh oh!

fix: allows all non-'}' characters as format values #2065

fix: allows all non-'}' characters as format values #2065

Uh oh!

Conversation

Insomniak47 commented May 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

BEFORE - Allocs

BEFORE - Parsing

AFTER - Allocs

AFTER - PARSING

Uh oh!

nblumhardt commented May 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Insomniak47 commented May 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nblumhardt commented May 24, 2024

Uh oh!

Insomniak47 commented May 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Insomniak47 commented May 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nblumhardt May 27, 2024

Choose a reason for hiding this comment

Uh oh!

Insomniak47 May 28, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Insomniak47 commented May 24, 2024 •

edited

Loading

nblumhardt commented May 24, 2024 •

edited

Loading

Insomniak47 commented May 24, 2024 •

edited

Loading

Insomniak47 commented May 25, 2024 •

edited

Loading

Insomniak47 commented May 25, 2024 •

edited

Loading