josibake
Member

@josibake josibake commented Aug 15, 2025

Opened in response to #1698 (comment)


We use tagged hashes in modules/musig, modules/schnorrsig, modules/ellswift, and the proposed modules/silentpayments. In looking for inspiration on how to add tagged hash midstate verification for #1698, it seemed like a good opportunity to DRY up the code across all of the modules.

I chose the convention used in the ellswift module as this seems the most idiomatic C. Since the tags are normally specified as strings in the BIPs, I also added a comment above each char array for convenience.

If it's deemed too invasive to refactor the existing modules in this PR, I'm happy to drop the refactor commits for the ellswift and schnorrsig modules. All I need for #1698 is the first commit, which moves the utility function out of the musig module so that the silent payments module can use it.

test_sha256_eq(&sha, &sha_optimized);
secp256k1_sha256 sha_optimized;
{
unsigned char tag[] = "secp256k1_ellswift_encode";
Member

8a17983:

fa67b67 can be relevant.

Member Author

Nice find! The commit message states "However, it requires exactly specifying the array size, which can be cumbersome," but I don't think this is true.

Using the test program:

// repro.c
#include <stdio.h>

int main() {
    char str[] = "hello world";  // No explicit size: the array gets 12 bytes, including the NUL terminator
    printf("%s\n", str);
    return 0;
}

I am able to compile with gcc14:

nix-shell --expr 'with import <nixpkgs> {}; mkShell.override { stdenv = overrideCC stdenv gcc14; }'
gcc -v
gcc -Wall -Wextra -Wpedantic -Werror repro.c -o out

and able to compile with gcc15:

nix-shell --expr 'with import <nixpkgs> {}; mkShell.override { stdenv = overrideCC stdenv gcc15; }'
gcc -v
gcc -Wall -Wextra -Wpedantic -Werror repro.c -o out

However, if I specify the array size, I can reproduce the error:

// repro.c
#include <stdio.h>

int main() {
    char str[11] = "hello world";  // This should trigger the warning
    printf("%s\n", str);
    return 0;
}

No error with:

nix-shell --expr 'with import <nixpkgs> {}; mkShell.override { stdenv = overrideCC stdenv gcc14; }'
gcc -Wall -Wextra -Wpedantic -Werror repro.c -o out

And an error with:

nix-shell --expr 'with import <nixpkgs> {}; mkShell.override { stdenv = overrideCC stdenv gcc15; }'
gcc -Wall -Wextra -Wpedantic -Werror repro.c -o out

repro.c: In function ‘main’:
repro.c:4:20: error: initializer-string for array of ‘char’ truncates NUL terminator but destination lacks ‘nonstring’ attribute (12 chars into 11 available) [-Werror=unterminated-string-initialization]
    4 |     char str[11] = "hello world";  // This should trigger the warning
      |                    ^~~~~~~~~~~~~
cc1: all warnings being treated as errors

Based on the above, I'd recommend we prefer the approach in this PR of not specifying the array size and perhaps document it as the preferred convention going forward? I find being able to specify the tag as a string to be much more reviewable than specifying the tag as an array of characters.

That being said, I'm also happy to go the other way and update the musig tests to match the other modules if that's the preferred convention, as I think the main benefit is having all of the modules follow the same convention.

Member Author

To convince myself, I also verified this with a few versions of clang, e.g.:

nix-shell --expr 'with import <nixpkgs> {}; mkShell.override { stdenv = llvmPackages_16.stdenv; }'
clang -Wall -Wextra -Wpedantic -Werror -Wmost repro.c

Contributor

@real-or-random real-or-random Aug 18, 2025

@josibake The NUL byte resulting from char str[] = "hello world" does not hurt per se, but there are two minor issues with this: First, it's conceptually the wrong thing: If we want a char array, the simplest thing to do is to define a char array instead of a NUL-terminated string. Second and probably more relevant, it changes sizeof(str) to be 12 instead of 11. (See https://godbolt.org/z/da6PExKTh for demonstration. godbolt.org is the easiest way to test toy examples on many compilers.) We could, of course, accept this and always use sizeof(str) - 1, but it's easy to miss this.

edit: Sorry, I now saw that you're aware of the - 1 thing. And I agree, the ability to grep for the string is a good argument for the NUL-terminated string. If you ask me, I prefer to forego the grepability and define the right kind of object and have sizeof correct. But there's no definitive answer in the end.
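A minimal sketch of the sizeof difference described above, reusing the "hello world" example from the thread. The variable and function names are made up for illustration and are not taken from the library.

```c
#include <stddef.h>
#include <string.h>

/* String-literal initializer: the array is sized for the 11 visible
 * characters plus the terminating NUL, so sizeof(str_literal) is 12. */
static const char str_literal[] = "hello world";

/* Brace initializer: exactly the 11 characters and no NUL, so
 * sizeof(str_array) is 11. */
static const char str_array[] = {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'};

/* Tag length taken via sizeof: the literal form needs the "- 1" to drop
 * the NUL, the brace form does not. */
size_t literal_tag_len(void) { return sizeof(str_literal) - 1; }
size_t array_tag_len(void) { return sizeof(str_array); }

/* Forgetting the "- 1" would pass one extra byte: the trailing NUL. */
size_t literal_len_without_minus_one(void) { return sizeof(str_literal); }

/* Both forms hold the same 11 characters; only the trailing NUL differs. */
int same_visible_bytes(void) {
    return memcmp(str_literal, str_array, sizeof(str_array)) == 0;
}
```

With the brace initializer, sizeof already yields the tag length; with the string literal, every call site has to remember the - 1, and forgetting it silently passes the NUL as an extra tag byte.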

Member Author

@real-or-random thanks for the context! That explains the sizeof(str) - 1 for the musig examples. So it seems the choices are:

  1. Do something conceptually wrong for something that is slightly easier to review
  2. Do the conceptually correct thing for something that is slightly harder to review

"Slightly harder/easier" is a bit hand-wavy, but the fact that we used to specify the tags as strings (and the recently added musig also adopted this convention vs staying consistent with the existing modules) indicates option 1 is the more natural option. However, it likely needs an explainer, especially for why we are using sizeof(tag) - 1. On the flipside, I'm guessing option 2 feels more natural for reviewers who review/write a majority of the time in C?

Regardless of which convention is chosen, I do think it's worth documenting in CONTRIBUTING.md. I'll add a commit for that once reviewers have weighed in on which convention they prefer.

Contributor

Maybe godbolt.org/z/eKbT6sha4?

That still generates a warning if I add -Wextra.

Member

Maybe godbolt.org/z/eKbT6sha4?

That still generates a warning if I add -Wextra.

Right.

https://godbolt.org/z/n5rf5Y7cP

Contributor

Oh, interesting, I wasn't aware of nonstring. That's another neat way.

Though when I think about it, I still prefer {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'}. Code is read much more often than it's written, so it makes sense to optimize reader (or reviewer) burden, and {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'} is immediately clear to a reviewer familiar with C. It's just a bit hard on the eyes, but there will be no need to look up macros or GNU extension attributes, etc.
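For comparison, a rough sketch of the nonstring approach. It assumes GCC 8 or later, where the nonstring attribute is available; the NONSTRING macro name is hypothetical, and on other compilers it expands to nothing, so they may still warn about the dropped NUL.

```c
#include <string.h>

/* GCC understands the nonstring attribute, which marks the array as not
 * meant to be NUL-terminated, so GCC 15's -Wunterminated-string-initialization
 * (enabled by -Wextra) does not fire. Other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#define NONSTRING __attribute__((nonstring))
#else
#define NONSTRING
#endif

/* Exactly 11 bytes: C allows the literal's NUL to be dropped when the
 * array size equals the number of visible characters. */
static const char tag_nonstring[11] NONSTRING = "hello world";

/* Checks the array holds exactly the 11 characters and nothing more. */
int nonstring_tag_ok(void) {
    return sizeof(tag_nonstring) == 11 && memcmp(tag_nonstring, "hello world", 11) == 0;
}
```

As the comment above notes, this keeps the string form readable but relies on a GNU extension that a reviewer may need to look up.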

Member

Though when I think about it, I still prefer {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'}. Code is read much more often than it's written, so it makes sense to optimize reader (or reviewer) burden, and {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'} is immediately clear to a reviewer familiar with C. It's just a bit hard on the eyes, but there will be no need to look up macros or GNU extension attributes, etc.

Agreed. That's why I raised this point in the first place.

Member Author

Sounds like 2 votes for keeping it as is, vs one vote to change it 😅 I'll update this PR tomorrow to instead convert the musig module to the existing convention, and add a note documenting the convention.

Contributor

@real-or-random real-or-random left a comment

Concept ACK, it's a good idea to make this consistent.

@theStack
Contributor

Concept ACK

At the risk of sounding heretical, wouldn't it also be an option to let sha256_tag_test_internal simply take a string and compute the tag length at run time via strlen (it's test-only code anyway...), in order to avoid having to declare char arrays and deal with specifying the correct lengths repeatedly in the first place? I'd be very surprised if future BIP authors ever broke with tradition and used tags that include NUL bytes. Happy to review either variant, of course (also, obviously feel free to just ignore, since there has been a good amount of discussion already).

@real-or-random
Contributor

At the risk of sounding heretical, wouldn't it also be an option to let sha256_tag_test_internal simply take a string and compute the tag length at run time via strlen (it's test-only code anyway...),

Hehe, I think that's also a good approach. It increases legibility at the cost of introducing the assumption that the tag contains no NUL bytes (which is most likely true even for future tags, yes). If I had to pick, I'd still go with the array initializer simply because the tag is conceptually an array.

I think we have reached a point where @josibake should just pick one of the many good options, and we'll move on with that one. 😄
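The strlen variant suggested above could look roughly like the following. Both functions are hypothetical: tag_helper stands in for a length-taking test helper and only copies the tag instead of hashing it, and tag_helper_str is the convenience wrapper that computes the length at run time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a test helper that takes the tag as bytes plus
 * an explicit length. It only copies the tag into out (no hashing) and
 * returns the number of bytes copied, or 0 if out is too small. */
size_t tag_helper(unsigned char *out, size_t outlen, const unsigned char *tag, size_t taglen) {
    if (taglen > outlen) {
        return 0;
    }
    memcpy(out, tag, taglen);
    return taglen;
}

/* Hypothetical convenience wrapper: takes an ordinary NUL-terminated string
 * and computes the length at run time with strlen. This assumes the tag
 * contains no NUL byte; anything after the first NUL would be dropped. */
size_t tag_helper_str(unsigned char *out, size_t outlen, const char *tag) {
    return tag_helper(out, outlen, (const unsigned char *)tag, strlen(tag));
}
```

The wrapper's only extra assumption is the one raised above: strlen stops at the first NUL byte, so a tag containing one would be silently truncated.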

@josibake josibake force-pushed the tagged-hash-test-util branch from 6424805 to 17af09d August 20, 2025 08:15
@josibake
Member Author

Thanks everyone for chiming in! I reworked this to update the musig tests to use static const unsigned char arrays and refactored the existing tests to use the sha256_tag_test_internal function. I think @real-or-random made some compelling arguments for this approach, namely:

{'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'} is immediately clear to a reviewer familiar with C. It's just a bit hard on the eyes, but there will be no need to look up macros or GNU extension attributes, etc.

Given that this library is written in C, it seems best to write code that is familiar to reviewers and is idiomatic C.

If I had to pick, I'd still go with the array initializer simply because the tag is conceptually an array.

Agreed. Though we can represent tags as strings, ultimately they are character arrays. Creating them as char arrays seems to cause the fewest surprises, e.g., sizeof works as expected. I still think it's nice to have a string representation of the tag in the code, so I added a comment above each char array.
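A minimal sketch of the convention described here: the tag as a brace-initialized unsigned char array, its string form in a comment above it, and sizeof giving the length directly. The tag_matches_comment helper is hypothetical; it does not hash anything and only checks that the comment and the array agree, which is the check the comment is meant to make easy for a reviewer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the tag bytes exactly match the
 * human-readable form written in the comment, 0 otherwise. */
int tag_matches_comment(const unsigned char *tag, size_t taglen, const char *comment_form) {
    return taglen == strlen(comment_form) && memcmp(tag, comment_form, taglen) == 0;
}

/* "secp256k1_ellswift_encode" */
static const unsigned char ellswift_encode_tag[] = {
    's', 'e', 'c', 'p', '2', '5', '6', 'k', '1', '_',
    'e', 'l', 'l', 's', 'w', 'i', 'f', 't', '_',
    'e', 'n', 'c', 'o', 'd', 'e'
};
```

Because the array has no trailing NUL, sizeof(ellswift_encode_tag) is the tag length (25) and can be passed as is, with no - 1.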

Lastly, I decided against adding a blurb to CONTRIBUTING.md. I think "New code should adhere to the style of existing, in particular surrounding, code.." is sufficient, and I expect new tagged hashes to be infrequent. Happy to add a documentation commit, however, if others feel it warrants a blurb in CONTRIBUTING.md.

@real-or-random
Contributor

Lastly, I decided against adding a blurb to CONTRIBUTING.md. I think "New code should adhere to the style of existing, in particular surrounding, code.." is sufficient, and I expect new tagged hashes to be infrequent. Happy to add a documentation commit, however, if others feel it warrants a blurb in CONTRIBUTING.md.

Agreed, this is too much of a niche thing to bother with in this file. Of course, it won't hurt if it's documented there, but then we could also document hundreds of other things in CONTRIBUTING.md.

Contributor

@real-or-random real-or-random left a comment

ACK mod nit, you could also squash these commits

Move the sha256_tag_test_internal function out of the musig module
into tests.c. This makes it available to other modules wishing to verify tagged
hashes without needing to duplicate the function.

Change the function signature to expect a const unsigned char and update
the tagged hash tests to use static const unsigned char character
arrays (where necessary).

Add a comment for each tag. This is done as a convenience for checking
the strings against the protocol specifications, where the tags are
normally specified as strings.

Update tests in the ellswift and schnorrsig modules to use the
sha256_tag_test_internal helper function.
@josibake josibake force-pushed the tagged-hash-test-util branch from 17af09d to 5153cf1 August 20, 2025 09:42
@josibake
Member Author

Renamed the helper function to test_sha256_tag_midstate and squashed the commits (h/t @real-or-random).

Copy link
Contributor

@real-or-random real-or-random left a comment

utACK 5153cf1 assuming CI passes

Copy link
Contributor

@theStack theStack left a comment

Code-review ACK 5153cf1

@real-or-random real-or-random merged commit f36afb8 into bitcoin-core:master Aug 21, 2025
116 checks passed
4 participants