Remove `From<hash>` for not-general-hash types #4128

tcharding · 2025-02-26T03:17:17Z

The hash_newtype macro is explicitly designed to produce a hash that is not a general purpose hash type to try and prevent users hashing arbitrary stuff with it. E.g., Txid isn't meant to be just hash arbitrary data. However we provide a From impl that will convert any instance of the inner hash type into the new type. This kind of defeats the purpose. We provide from_byte_array and to_byte_array to allow folk to 'cast' from one hash type to another if they really want to and its ugly on purpose.

Also, it is becoming apparent that we may be able to remove the hashes crate from the public API of primitives allowing us to stabalise primitives without stabalising hashes.

For both these reasons remove the From impl from the hash_newtype macro. Note that deprecating doesn't seem to work so we just delete it.

tcharding · 2025-02-26T04:35:33Z

Its not this simple. There is From in the other direction and also the Bytes associated const exposes hashes

pub const fn bitcoin_primitives::block::WitnessCommitment::as_byte_array(&self) -> &<bitcoin_hashes::sha256d::Hash as bitcoin_hashes::Hash>::Bytes

Kixunil · 2025-02-26T10:53:14Z

Concept ACK. The other direction is fine to keep.

However I was thinking recently that we need to have a way to avoid pub from_byte_array since for some types it's better to have assume in the name.

apoelstra · 2025-02-26T15:44:39Z

Concept ACK removing e.g. From<sha256d::Hash> from Txid. We should not be reflecting any relationship between the "raw hash" and the newtyped hash, because a Txid is not just a "sha256d hash" but rather a "sha256d hash computed by hashing particularly-structured data".

Though I would still like to keep From<[u8; 32]> for these types (except for those where we want assume in the name of from_byte_array; they should require more explicit code to construct from a potentially-untrusted source). Because a Txid is still a 32-byte blob which can in-principle be any value and in-practice is routinely encountered as a 32-byte blob.

Kixunil · 2025-02-26T21:13:02Z

Oh, this is about the hash type, not the array. OK, then the reverse must be removed too before 1.0

tcharding · 2025-03-20T03:36:54Z

Patch 2 removes the other direction.

Kixunil

ACK e721a41

Note: I believe that the second reason is much stronger than the first one, especially in the latter commit. :) Anyway, strong agreement from me!

apoelstra · 2025-03-20T19:22:13Z

bitcoin/src/blockdata/constants.rs

@@ -124,7 +124,7 @@ fn bitcoin_genesis_tx(params: &Params) -> Transaction {
 pub fn genesis_block(params: impl AsRef<Params>) -> Block<Checked> {
    let params = params.as_ref();
    let transactions = vec![bitcoin_genesis_tx(params)];
-    let hash: sha256d::Hash = transactions[0].compute_txid().into();
+    let hash = sha256d::Hash::from_byte_array(transactions[0].compute_txid().to_byte_array());


In e721a41:

Lol, we take a Txid, convert it to a sha256d::Hash and then convert it to a TxMerkleNode. The intermediate step really isn't needed at all, and kinda obfuscates what's happening (which is that in the single-transaction case we "cast" a txid directly into a Merkle root....this looks suspicious because it is suspicious! But that's Bitcoin for you.)

This is a simple enough change and a fairly small PR so I'd like you to make this change before I ack.

To be clear: the change I'm requesting is to compress the 2 conversion lines into one.

Oh, yes, good find! I agree it'd be better to do it in this PR. But maybe make this change in a separate preparatory commit with explanation since it was weird to begin with.

Nice, quick touch up on my merkle root calculation knowledge and added a patch upfront that is 4 lines of red one line of green - WIN.

Remove manual implementation of merkle root calculation and just use the function we already have. Refactor only, no logic change.

The `hash_newtype` macro is explicitly designed to produce a hash that is not a general purpose hash type to try and prevent users hashing arbitrary stuff with it. E.g., `Txid` isn't meant to be just hash arbitrary data. However we provide a `From` impl that will convert any instance of the inner hash type into the new type. This kind of defeats the purpose. We provide `from_byte_array` and `to_byte_array` to allow folk to 'cast' from one hash type to another if they really want to and its ugly on purpose. Also, it is becoming apparent that we may be able to remove the `hashes` crate from the public API of `primitives` allowing us to stabalise `primitives` without stabalising `hashes`. For both these reasons remove the `From` impl from the `hash_newtype` macro. Note that deprecating doesn't seem to work so we just delete it.

We provide the from/to_byte_array functions for casting between arrays. We shouldn't be supporting calls to `into` to quickly do the cast. We already removed the other direction, now remove this one.

Kixunil

ACK db9ec3b

Kixunil · 2025-03-21T06:37:40Z

bitcoin/src/blockdata/constants.rs

@@ -124,8 +122,7 @@ fn bitcoin_genesis_tx(params: &Params) -> Transaction {
 pub fn genesis_block(params: impl AsRef<Params>) -> Block<Checked> {
    let params = params.as_ref();
    let transactions = vec![bitcoin_genesis_tx(params)];
-    let hash: sha256d::Hash = transactions[0].compute_txid().into();
-    let merkle_root: crate::TxMerkleNode = hash.into();
+    let merkle_root = block::compute_merkle_root(&transactions).expect("transactions is not empty");


Oh, yes, IIRC the function has a length check upfront, so this has no perf impact and is semantically more sensible.

BTW, I've noticed a lot of projects require non-empty slice/vec/hashmap/hashset/... in the APIs. I'm becoming more and more annoyed that there's no good NonEmpty* type (crate) and I'm seriously thinking of writing one. @apoelstra how do you feel about having such thing in, possibly public, dependency of bitcoin? I obviously intend to make it stable, maybe even very soon. (I don't think there will be too much design space.)

I'd like to see it. I'm skeptical. I think that Rust's type system is too weak to do such a thing ergonomically.

Are you imagining having functions take e.g. a NonEmpty<HashSet<T>> and then users need to fallibly convert their normal HashSet<T>s into this and then deal with the error case even if they know it won't ever happen? That's too much ergonomic cost.

The ergonomic cost has to show up somewhere anyway - by having expect like in this case, by surprise runtime panic, or even undefined behavior... As someone said "complexity cannot be destroyed, it can only be moved to a more appropriate place".

There are various cases where it's possible to return NonEmpty<Whatever> knowing that it's definitely correct which may actually make the API more ergonomic.

One example is where you have a single item (as the initial value, before adding more or simply you only happen to need one - like here). You can have pub fn with_first(item: T) -> Self. It's also possible to implement conversions on arrays - either by simply choosing a reasonable range (up to 64 elements or whatever) or by implementing it for all arrays but post-mono compile erroring. And by accepting Into<NonEmpty<T>> and such you can allow arrays being passed in directly with no runtime checks (explicit or implicit). On the other side, you can have first, split_first, last, split_last... methods return non-Option which then simplifies the internal code.

Also while iterators are not compatible, you can still have .map() method that's equivalent to .into_iter().map().collect(). You can also have push and even pop (that simply refuses to pop the single element). And if you let the types bubble up you often find the end at deserialization - where it reminds you of handling it properly. (TBF I have some crazy code that has to deal with ~4 errors during deserialization of several different objects and "empty" is one of them but I can't just ignore it - I need to reject it upfront.)

It might not be appropriate in non-security-critical applications but bitcoin is quite security-critical. And for some odd reason I'm drawn into writing security-critical stuff.

Also the latest issue with invalid expect has shown me that we've become complacent. We no longer question code that panics because we write it so much it's getting normal to see it. I want to do something about it. Part of it is "does every panicking thing have a test?" but part of it is removing panics at type level.

This comment says it's just a refactor, although a surface level inspection indicates this can now panic whereas before I don't see that as true.

@yancyribbens the called function only returns None if the input is empty and it's not.

Fair point. Could have just said that in the expect message :P

Oh, but your question demonstrates a new point - is what I said above actually correct? The only way to know for sure is to read the code of compute_merkle_tree. If there is another reason to return None that expect would become invalid. And the function could accidentally do so. By moving the requirement into arguments and not returning Option we move this into very clear NonEmpty::ftry_from (or, in this case, no fallibility anywhere since you can just pass an array).

Without getting too into the weeds of what the compiler might be able to do so that a useless expect() isn't needed, it would be nice maybe for readability to agree on something like expect() but with a message for the developer why this should never be needed. Expect messages I guess are more for bubbling message when the code is run then it is for the person reading the code. Of course there's always the risk that code refactor makes the expect actually panic someday.

The expect message is better than nothing but even if you put the literal Coq proof there, there's no mechanism to enforce it's not out-of-sync. Imagine this code:

// Computes the arithmetic average if the numbers, returns `None` if empty. fn compute_average(items: &[usize]) -> Option<usize> { items.iter().copied().sum::<usize>().checked_div(items.len()) }

The someone somewhere calls it like this:

compute_average(&[a, b, c, d]).expect("the function returns None if the input is empty and the input is certainly not empty - as you can see we're passing in an array of four items")

Looks great, doesn't it?

Except one day someone realizes: "oh, the sum could overflow, we need checked sum" and rewrites compute_average to also return None in that case. And now the consumer code has a DoS vector because while the function call will not panic because of empty input it will panic if the sum is too large.

One very annoying way to address that is to return Result where the error is an enum and is either exhaustive - in which case the caller has to match all variants (could be just one variant) and the callee will have to break the API alerting the caller of the problem or the enum is non-exhaustive in which case the caller will theoretically know that calling expect on it is wrong. (Well, I think a clippy lint warning against this would be nice.)

The expect message is better than nothing but even if you put the literal Coq proof there

Hah. I remember studying formal language verification a bit at the university, although I never did study Coq specifically. It's a very interesting area of research though. I think one of the things that stuck with me is Tony Hoare's billion dollar mistake with the invention of the null pointer which increased my interest in Rust, not C.

Except one day someone realizes: "oh, the sum could overflow, we need checked sum" and rewrites compute_average to also return None in that case..

I agree. I had also mentioned above that the use of expect() here is fine now but not robust against refactors. I think it would have been nice to just write in the commit message though why expect is safe but just nit picking.

write in the commit message

It's better in code I think but we really want better tools to handle this.

apoelstra

ACK db9ec3b; successfully ran local tests

yancyribbens · 2025-03-21T14:38:23Z

bitcoin/src/p2p/message.rs

@@ -737,16 +738,20 @@ mod test {
                45,
                Address::new(&([123, 255, 000, 100], 833).into(), ServiceFlags::NETWORK),
            )]),
-            NetworkMessage::Inv(vec![Inventory::Block(hash([8u8; 32]).into())]),
-            NetworkMessage::GetData(vec![Inventory::Transaction(hash([45u8; 32]).into())]),
+            NetworkMessage::Inv(vec![Inventory::Block(BlockHash::from_byte_array(hash([8u8; 32]).to_byte_array()))]),


This comment looks like another refactor commit. Would be helpful to call it out as such.

doh. commit.

tcharding · 2025-03-21T20:24:23Z

One more thing I noticed while writing the expect was that it makes the codebase less maintainable because if we later patch block::compute_merkle_root to return None for some other reason then this unreachable panic may become reachable and I don't see how anyone would be expected to notice that.

apoelstra · 2025-03-21T20:33:11Z

There is no conceivable change to compute_merkle_root that would cause it to refuse to compute the merkle root of the genesis block.

tcharding · 2025-03-21T20:51:51Z

I thought as much. My point was just that I've written expect a bunch lately and everyone of those places creates this same thing. The assertion that this won't panic (the comment string) is far away from the logic (the function code) and this makes the code less maintainable because its easier to break the assertion accidentally. I don't think its a problem we can fix, its just an interesting engineering observation - well I thought it was interesting.

apoelstra · 2025-03-21T20:59:59Z

No, I agree that it's interesting, and that it's a danger in our codebase.

Rust makes this unreasonably common by its lack of specialization, dependent types, range types, nonempty containers, any consistency requirements on its standard traits, etc. Not much we can do about it.

Kixunil · 2025-03-22T09:45:25Z

@tcharding yes, I realized the exact same thing yesterday!

I don't think its a problem we can fix, its just an interesting engineering observation - well I thought it was interesting.

We have various options:

Move the checks into stronger-typed arguments as we've discussed above
Return custom result types that have methods like unwrap_arg_was_nonempty()
Make sure to run kani or other model checkers/provers on offending pieces of code
Move repeating patterns into designated functions so at least they no longer repeat (like what I did with try_into().expect())

apoelstra · 2025-03-22T13:28:06Z

BTW, if there's a way to use stronger types without polluting the public API I'm 100% for it. Or maybe we even expose both an API where you've gotta use expect and an API where you've gotta use some sort of refinement-type crate. (BTW, I'm sure that such a thing exists, and one that's more general than just nonempty containers...may be worth asking on Discourse. Though it may be that all of them do crazy stuff with Peano arithmetic and they'd blow up our compile times and/or compile errors more than they're worth.)

Maybe we should try doing both APIs, but initially have the strong-type one be only private (and supporting types exposed from internals so it's at least possible to play with them elsewhere), so that we can get a feel for it and iterate a bit.

Kixunil · 2025-03-22T13:37:30Z

without polluting the public API

Stuff like AsRef<NonEmpty<[T]>> can get us pretty far.

No problem with experimenting in private module at first.

github-actions bot added C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate labels Feb 26, 2025

tcharding marked this pull request as draft February 26, 2025 04:34

tcharding force-pushed the 02-26-rm-From branch from b2b0a57 to e721a41 Compare March 20, 2025 03:36

tcharding marked this pull request as ready for review March 20, 2025 03:36

github-actions bot added the C-primitives label Mar 20, 2025

Kixunil previously approved these changes Mar 20, 2025

View reviewed changes

apoelstra reviewed Mar 20, 2025

View reviewed changes

Use compute_merkle_root

200ff47

Remove manual implementation of merkle root calculation and just use the function we already have. Refactor only, no logic change.

tcharding dismissed Kixunil’s stale review via c08c8f4 March 20, 2025 22:09

tcharding force-pushed the 02-26-rm-From branch from e721a41 to c08c8f4 Compare March 20, 2025 22:09

tcharding added 2 commits March 21, 2025 09:12

Remove From<newtype> for $hash

db9ec3b

We provide the from/to_byte_array functions for casting between arrays. We shouldn't be supporting calls to `into` to quickly do the cast. We already removed the other direction, now remove this one.

tcharding force-pushed the 02-26-rm-From branch from c08c8f4 to db9ec3b Compare March 20, 2025 22:12

Kixunil approved these changes Mar 21, 2025

View reviewed changes

apoelstra approved these changes Mar 21, 2025

View reviewed changes

yancyribbens reviewed Mar 21, 2025

View reviewed changes

apoelstra merged commit 294a58c into rust-bitcoin:master Mar 21, 2025
24 checks passed

tcharding deleted the 02-26-rm-From branch April 14, 2025 04:28

Remove From<hash> for not-general-hash types #4128

Remove From<hash> for not-general-hash types #4128

Uh oh!

Conversation

tcharding commented Feb 26, 2025

Uh oh!

tcharding commented Feb 26, 2025

Uh oh!

Kixunil commented Feb 26, 2025

Uh oh!

apoelstra commented Feb 26, 2025

Uh oh!

Kixunil commented Feb 26, 2025

Uh oh!

tcharding commented Mar 20, 2025

Uh oh!

Kixunil left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Kixunil left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yancyribbens Mar 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

apoelstra left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tcharding commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

apoelstra commented Mar 21, 2025

Uh oh!

tcharding commented Mar 21, 2025

Uh oh!

apoelstra commented Mar 21, 2025

Uh oh!

Kixunil commented Mar 22, 2025

Uh oh!

apoelstra commented Mar 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kixunil commented Mar 22, 2025

Uh oh!

Uh oh!

Remove `From<hash>` for not-general-hash types #4128

Remove `From<hash>` for not-general-hash types #4128

yancyribbens Mar 23, 2025 •

edited

Loading

tcharding commented Mar 21, 2025 •

edited

Loading

apoelstra commented Mar 22, 2025 •

edited

Loading