Skip to content

Conversation

jhump
Copy link
Owner

@jhump jhump commented Aug 17, 2022

This is similar to the last PR (#523), but for constraints that protoc enforces for enum value names.

Admittedly, these are a little bit weirder: not only is it not case-sensitive, it also uses camel-case (i.e. strips underscores) even though the JSON format for enum values does not use camel-case. 🤷

Just like protoc, these are only warnings in proto2 syntax (since the JSON format was introduced with proto3, making these extra constraints backwards-incompatible). But they are compile errors in proto3 syntax.

Also, I found a bug in my last PR: it wouldn't actually check fields that were in messages nested inside of other messages. (The recursion failed to go that deep.) So this includes a fix for that as well as a couple of extra test cases to prove it works.

Fixes #439.

Comment on lines +1141 to +1145
for _, nmd := range md.GetNestedMessageTypes() {
if err := l.checkJsonNamesInMessage(nmd, res); err != nil {
return err
}
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. This was missing in the last PR.

Comment on lines +1020 to +1030
map[string]string{
"foo.proto": "syntax = \"proto3\";\n" +
"message Blah {\n" +
" message Foo {\n" +
" string foo = 1;\n" +
" string bar = 2 [json_name=\"foo\"];\n" +
" }\n" +
"}\n",
},
"foo.proto:5:5: field Foo.bar: custom JSON name \"foo\" conflicts with default JSON name of field foo, defined at foo.proto:4:5",
}, {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and a couple others below) are new tests to verify that messages nested in other messages are also correctly checked.

@jhump
Copy link
Owner Author

jhump commented Aug 17, 2022

Admittedly, these are a little bit weirder: not only is it not case-sensitive, it also uses camel-case (i.e. strips underscores) even though the JSON format for enum values does not use camel-case. 🤷

@amckinney, I had a total brain fart. I don't know why I though this. I've been toying with examples with protoc for a bit and could have sworn I saw this behavior. But I was apparently mistaken: protoc doesn't do the camel-case thing -- it just does the case-insensitive check (and will strip the enum name prefix if present). Doh!

Edit: I just realized why I thought it was doing camel-case -- it's because of the way it decides whether the name is prefixed with the enum name. It ignores leading underscores as well as any underscores in between the enum name and the rest of the name. But it preserves underscores after that. So in an enum named "Foo", the names BAR_BAZ and ____foo___bar_baz conflict, but they do not conflict with "_FOO_BarBaz" (since it has no underscore between the "bar" and "baz"). The behavior of protoc here is just... weird...

@jhump
Copy link
Owner Author

jhump commented Aug 17, 2022

Good grief. The actual logic is terribly confusing and doesn't really match up with the error message. The error message just says "ignoring case", but they do actually convert to camel-case (or pascal-case rather) and they don't ignore case. 🤦
https://github.com/protocolbuffers/protobuf/blob/v21.3/src/google/protobuf/descriptor.cc#L5945-L5946
https://github.com/protocolbuffers/protobuf/blob/v21.3/src/google/protobuf/descriptor.cc#L922
https://github.com/protocolbuffers/protobuf/blob/v21.3/src/google/protobuf/descriptor.cc#L887

What a pain to reproduce... and so hard to accurately describe in an error message. 😦

Comment on lines -230 to -231
if i == 0 {
js = append(js, r)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. Looks like I may have accidentally been looking at ToCamelCase where lower_first param was true, instead of at ToJsonName in the protoc codebase (reference).

Luckily, this bit was basically a no-op: it appears to be trying to skipping the capitalization of the first rune, but if the first rune of the result were to be capitalized, it means that the first rune in name is actually _, so this was basically a no-op since nextUpper could never be true when i == 0...

I've added some fields to a test proto (which gets compared to protoc output) to make sure that we are handling weird cases correctly, such as field that starts with _ and... a field whose whole name is _ (in which case the json_name attribute is the empty string! 😨).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow.. great find. I am just so sad that we have to maintain all of this nonsense.

ccName = ccName[len(lcaseEnumName):]
}
if existing, ok := seen[lcaseName]; ok && evd.GetNumber() != existing.source.GetNumber() {
name := canonicalEnumValueName(evd.GetName(), ed.GetName())
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After playing with a few extra edge cases with protoc, I realized that the algorithm was not quite what I had formulated here...

So I had to go look at the protoc C++ source code to reproduce it properly.

@jhump jhump merged commit ab4615d into master Aug 17, 2022
@jhump jhump deleted the jh/check-enum-val-name-conflicts branch August 17, 2022 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

protoparse: ParseFiles allows case insensitive duplicate enum values
2 participants