Skip to content

Conversation

tcharding
Copy link
Member

The hash_newtype macro is explicitly designed to produce a hash that is not a general purpose hash type to try and prevent users hashing arbitrary stuff with it. E.g., Txid isn't meant to be just hash arbitrary data. However we provide a From impl that will convert any instance of the inner hash type into the new type. This kind of defeats the purpose. We provide from_byte_array and to_byte_array to allow folk to 'cast' from one hash type to another if they really want to and its ugly on purpose.

Also, it is becoming apparent that we may be able to remove the hashes crate from the public API of primitives allowing us to stabalise primitives without stabalising hashes.

For both these reasons remove the From impl from the hash_newtype macro. Note that deprecating doesn't seem to work so we just delete it.

@github-actions github-actions bot added C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate labels Feb 26, 2025
@tcharding tcharding marked this pull request as draft February 26, 2025 04:34
@tcharding
Copy link
Member Author

Its not this simple. There is From in the other direction and also the Bytes associated const exposes hashes

pub const fn bitcoin_primitives::block::WitnessCommitment::as_byte_array(&self) -> &<bitcoin_hashes::sha256d::Hash as bitcoin_hashes::Hash>::Bytes

@Kixunil
Copy link
Collaborator

Kixunil commented Feb 26, 2025

Concept ACK. The other direction is fine to keep.

However I was thinking recently that we need to have a way to avoid pub from_byte_array since for some types it's better to have assume in the name.

@apoelstra
Copy link
Member

Concept ACK removing e.g. From<sha256d::Hash> from Txid. We should not be reflecting any relationship between the "raw hash" and the newtyped hash, because a Txid is not just a "sha256d hash" but rather a "sha256d hash computed by hashing particularly-structured data".

Though I would still like to keep From<[u8; 32]> for these types (except for those where we want assume in the name of from_byte_array; they should require more explicit code to construct from a potentially-untrusted source). Because a Txid is still a 32-byte blob which can in-principle be any value and in-practice is routinely encountered as a 32-byte blob.

@Kixunil
Copy link
Collaborator

Kixunil commented Feb 26, 2025

Oh, this is about the hash type, not the array. OK, then the reverse must be removed too before 1.0

@tcharding tcharding marked this pull request as ready for review March 20, 2025 03:36
@tcharding
Copy link
Member Author

Patch 2 removes the other direction.

Kixunil
Kixunil previously approved these changes Mar 20, 2025
Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK e721a41

Note: I believe that the second reason is much stronger than the first one, especially in the latter commit. :) Anyway, strong agreement from me!

@@ -124,7 +124,7 @@ fn bitcoin_genesis_tx(params: &Params) -> Transaction {
pub fn genesis_block(params: impl AsRef<Params>) -> Block<Checked> {
let params = params.as_ref();
let transactions = vec![bitcoin_genesis_tx(params)];
let hash: sha256d::Hash = transactions[0].compute_txid().into();
let hash = sha256d::Hash::from_byte_array(transactions[0].compute_txid().to_byte_array());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In e721a41:

Lol, we take a Txid, convert it to a sha256d::Hash and then convert it to a TxMerkleNode. The intermediate step really isn't needed at all, and kinda obfuscates what's happening (which is that in the single-transaction case we "cast" a txid directly into a Merkle root....this looks suspicious because it is suspicious! But that's Bitcoin for you.)

This is a simple enough change and a fairly small PR so I'd like you to make this change before I ack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: the change I'm requesting is to compress the 2 conversion lines into one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, good find! I agree it'd be better to do it in this PR. But maybe make this change in a separate preparatory commit with explanation since it was weird to begin with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, quick touch up on my merkle root calculation knowledge and added a patch upfront that is 4 lines of red one line of green - WIN.

Remove manual implementation of merkle root calculation and just use the
function we already have.

Refactor only, no logic change.
The `hash_newtype` macro is explicitly designed to produce a hash that
is not a general purpose hash type to try and prevent users hashing
arbitrary stuff with it. E.g., `Txid` isn't meant to be just hash
arbitrary data. However we provide a `From` impl that will convert any
instance of the inner hash type into the new type. This kind of defeats
the purpose. We provide `from_byte_array` and `to_byte_array` to allow
folk to 'cast' from one hash type to another if they really want to and
its ugly on purpose.

Also, it is becoming apparent that we may be able to remove the `hashes`
crate from the public API of `primitives` allowing us to stabalise
`primitives` without stabalising `hashes`.

For both these reasons remove the `From` impl from the `hash_newtype`
macro. Note that deprecating doesn't seem to work so we just delete it.
We provide the from/to_byte_array functions for casting between arrays.
We shouldn't be supporting calls to `into` to quickly do the cast.

We already removed the other direction, now remove this one.
Copy link
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK db9ec3b

@@ -124,8 +122,7 @@ fn bitcoin_genesis_tx(params: &Params) -> Transaction {
pub fn genesis_block(params: impl AsRef<Params>) -> Block<Checked> {
let params = params.as_ref();
let transactions = vec![bitcoin_genesis_tx(params)];
let hash: sha256d::Hash = transactions[0].compute_txid().into();
let merkle_root: crate::TxMerkleNode = hash.into();
let merkle_root = block::compute_merkle_root(&transactions).expect("transactions is not empty");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, IIRC the function has a length check upfront, so this has no perf impact and is semantically more sensible.

BTW, I've noticed a lot of projects require non-empty slice/vec/hashmap/hashset/... in the APIs. I'm becoming more and more annoyed that there's no good NonEmpty* type (crate) and I'm seriously thinking of writing one. @apoelstra how do you feel about having such thing in, possibly public, dependency of bitcoin? I obviously intend to make it stable, maybe even very soon. (I don't think there will be too much design space.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see it. I'm skeptical. I think that Rust's type system is too weak to do such a thing ergonomically.

Are you imagining having functions take e.g. a NonEmpty<HashSet<T>> and then users need to fallibly convert their normal HashSet<T>s into this and then deal with the error case even if they know it won't ever happen? That's too much ergonomic cost.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ergonomic cost has to show up somewhere anyway - by having expect like in this case, by surprise runtime panic, or even undefined behavior... As someone said "complexity cannot be destroyed, it can only be moved to a more appropriate place".

There are various cases where it's possible to return NonEmpty<Whatever> knowing that it's definitely correct which may actually make the API more ergonomic.

One example is where you have a single item (as the initial value, before adding more or simply you only happen to need one - like here). You can have pub fn with_first(item: T) -> Self. It's also possible to implement conversions on arrays - either by simply choosing a reasonable range (up to 64 elements or whatever) or by implementing it for all arrays but post-mono compile erroring. And by accepting Into<NonEmpty<T>> and such you can allow arrays being passed in directly with no runtime checks (explicit or implicit). On the other side, you can have first, split_first, last, split_last... methods return non-Option which then simplifies the internal code.

Also while iterators are not compatible, you can still have .map() method that's equivalent to .into_iter().map().collect(). You can also have push and even pop (that simply refuses to pop the single element). And if you let the types bubble up you often find the end at deserialization - where it reminds you of handling it properly. (TBF I have some crazy code that has to deal with ~4 errors during deserialization of several different objects and "empty" is one of them but I can't just ignore it - I need to reject it upfront.)

It might not be appropriate in non-security-critical applications but bitcoin is quite security-critical. And for some odd reason I'm drawn into writing security-critical stuff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the latest issue with invalid expect has shown me that we've become complacent. We no longer question code that panics because we write it so much it's getting normal to see it. I want to do something about it. Part of it is "does every panicking thing have a test?" but part of it is removing panics at type level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says it's just a refactor, although a surface level inspection indicates this can now panic whereas before I don't see that as true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yancyribbens the called function only returns None if the input is empty and it's not.

Fair point. Could have just said that in the expect message :P

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but your question demonstrates a new point - is what I said above actually correct? The only way to know for sure is to read the code of compute_merkle_tree. If there is another reason to return None that expect would become invalid. And the function could accidentally do so. By moving the requirement into arguments and not returning Option we move this into very clear NonEmpty::ftry_from (or, in this case, no fallibility anywhere since you can just pass an array).

Without getting too into the weeds of what the compiler might be able to do so that a useless expect() isn't needed, it would be nice maybe for readability to agree on something like expect() but with a message for the developer why this should never be needed. Expect messages I guess are more for bubbling message when the code is run then it is for the person reading the code. Of course there's always the risk that code refactor makes the expect actually panic someday.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expect message is better than nothing but even if you put the literal Coq proof there, there's no mechanism to enforce it's not out-of-sync. Imagine this code:

// Computes the arithmetic average if the numbers, returns `None` if empty.
fn compute_average(items: &[usize]) -> Option<usize> {
    items.iter().copied().sum::<usize>().checked_div(items.len())
}

The someone somewhere calls it like this:

compute_average(&[a, b, c, d]).expect("the function returns None if the input is empty and the input is certainly not empty - as you can see we're passing in an array of four items")

Looks great, doesn't it?

Except one day someone realizes: "oh, the sum could overflow, we need checked sum" and rewrites compute_average to also return None in that case. And now the consumer code has a DoS vector because while the function call will not panic because of empty input it will panic if the sum is too large.

One very annoying way to address that is to return Result where the error is an enum and is either exhaustive - in which case the caller has to match all variants (could be just one variant) and the callee will have to break the API alerting the caller of the problem or the enum is non-exhaustive in which case the caller will theoretically know that calling expect on it is wrong. (Well, I think a clippy lint warning against this would be nice.)

Copy link
Contributor

@yancyribbens yancyribbens Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expect message is better than nothing but even if you put the literal Coq proof there

Hah. I remember studying formal language verification a bit at the university, although I never did study Coq specifically. It's a very interesting area of research though. I think one of the things that stuck with me is Tony Hoare's billion dollar mistake with the invention of the null pointer which increased my interest in Rust, not C.

Except one day someone realizes: "oh, the sum could overflow, we need checked sum" and rewrites compute_average to also return None in that case..

I agree. I had also mentioned above that the use of expect() here is fine now but not robust against refactors. I think it would have been nice to just write in the commit message though why expect is safe but just nit picking.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write in the commit message

It's better in code I think but we really want better tools to handle this.

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK db9ec3b; successfully ran local tests

@@ -737,16 +738,20 @@ mod test {
45,
Address::new(&([123, 255, 000, 100], 833).into(), ServiceFlags::NETWORK),
)]),
NetworkMessage::Inv(vec![Inventory::Block(hash([8u8; 32]).into())]),
NetworkMessage::GetData(vec![Inventory::Transaction(hash([45u8; 32]).into())]),
NetworkMessage::Inv(vec![Inventory::Block(BlockHash::from_byte_array(hash([8u8; 32]).to_byte_array()))]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks like another refactor commit. Would be helpful to call it out as such.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doh. commit.

@apoelstra apoelstra merged commit 294a58c into rust-bitcoin:master Mar 21, 2025
24 checks passed
@tcharding
Copy link
Member Author

tcharding commented Mar 21, 2025

One more thing I noticed while writing the expect was that it makes the codebase less maintainable because if we later patch block::compute_merkle_root to return None for some other reason then this unreachable panic may become reachable and I don't see how anyone would be expected to notice that.

@apoelstra
Copy link
Member

There is no conceivable change to compute_merkle_root that would cause it to refuse to compute the merkle root of the genesis block.

@tcharding
Copy link
Member Author

I thought as much. My point was just that I've written expect a bunch lately and everyone of those places creates this same thing. The assertion that this won't panic (the comment string) is far away from the logic (the function code) and this makes the code less maintainable because its easier to break the assertion accidentally. I don't think its a problem we can fix, its just an interesting engineering observation - well I thought it was interesting.

@apoelstra
Copy link
Member

No, I agree that it's interesting, and that it's a danger in our codebase.

Rust makes this unreasonably common by its lack of specialization, dependent types, range types, nonempty containers, any consistency requirements on its standard traits, etc. Not much we can do about it.

@Kixunil
Copy link
Collaborator

Kixunil commented Mar 22, 2025

@tcharding yes, I realized the exact same thing yesterday!

I don't think its a problem we can fix, its just an interesting engineering observation - well I thought it was interesting.

We have various options:

  • Move the checks into stronger-typed arguments as we've discussed above
  • Return custom result types that have methods like unwrap_arg_was_nonempty()
  • Make sure to run kani or other model checkers/provers on offending pieces of code
  • Move repeating patterns into designated functions so at least they no longer repeat (like what I did with try_into().expect())

@apoelstra
Copy link
Member

apoelstra commented Mar 22, 2025

BTW, if there's a way to use stronger types without polluting the public API I'm 100% for it. Or maybe we even expose both an API where you've gotta use expect and an API where you've gotta use some sort of refinement-type crate. (BTW, I'm sure that such a thing exists, and one that's more general than just nonempty containers...may be worth asking on Discourse. Though it may be that all of them do crazy stuff with Peano arithmetic and they'd blow up our compile times and/or compile errors more than they're worth.)

Maybe we should try doing both APIs, but initially have the strong-type one be only private (and supporting types exposed from internals so it's at least possible to play with them elsewhere), so that we can get a feel for it and iterate a bit.

@Kixunil
Copy link
Collaborator

Kixunil commented Mar 22, 2025

without polluting the public API

Stuff like AsRef<NonEmpty<[T]>> can get us pretty far.

No problem with experimenting in private module at first.

@tcharding tcharding deleted the 02-26-rm-From branch April 14, 2025 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate C-primitives
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants