-
Notifications
You must be signed in to change notification settings - Fork 877
Remove From<hash>
for not-general-hash types
#4128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Its not this simple. There is pub const fn bitcoin_primitives::block::WitnessCommitment::as_byte_array(&self) -> &<bitcoin_hashes::sha256d::Hash as bitcoin_hashes::Hash>::Bytes |
Concept ACK. The other direction is fine to keep. However I was thinking recently that we need to have a way to avoid |
Concept ACK removing e.g. Though I would still like to keep |
Oh, this is about the hash type, not the array. OK, then the reverse must be removed too before 1.0 |
b2b0a57
to
e721a41
Compare
Patch 2 removes the other direction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK e721a41
Note: I believe that the second reason is much stronger than the first one, especially in the latter commit. :) Anyway, strong agreement from me!
bitcoin/src/blockdata/constants.rs
Outdated
@@ -124,7 +124,7 @@ fn bitcoin_genesis_tx(params: &Params) -> Transaction { | |||
pub fn genesis_block(params: impl AsRef<Params>) -> Block<Checked> { | |||
let params = params.as_ref(); | |||
let transactions = vec![bitcoin_genesis_tx(params)]; | |||
let hash: sha256d::Hash = transactions[0].compute_txid().into(); | |||
let hash = sha256d::Hash::from_byte_array(transactions[0].compute_txid().to_byte_array()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In e721a41:
Lol, we take a Txid
, convert it to a sha256d::Hash
and then convert it to a TxMerkleNode
. The intermediate step really isn't needed at all, and kinda obfuscates what's happening (which is that in the single-transaction case we "cast" a txid directly into a Merkle root....this looks suspicious because it is suspicious! But that's Bitcoin for you.)
This is a simple enough change and a fairly small PR so I'd like you to make this change before I ack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: the change I'm requesting is to compress the 2 conversion lines into one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, good find! I agree it'd be better to do it in this PR. But maybe make this change in a separate preparatory commit with explanation since it was weird to begin with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, quick touch up on my merkle root calculation knowledge and added a patch upfront that is 4 lines of red one line of green - WIN.
Remove manual implementation of merkle root calculation and just use the function we already have. Refactor only, no logic change.
e721a41
to
c08c8f4
Compare
The `hash_newtype` macro is explicitly designed to produce a hash that is not a general purpose hash type to try and prevent users hashing arbitrary stuff with it. E.g., `Txid` isn't meant to be just hash arbitrary data. However we provide a `From` impl that will convert any instance of the inner hash type into the new type. This kind of defeats the purpose. We provide `from_byte_array` and `to_byte_array` to allow folk to 'cast' from one hash type to another if they really want to and its ugly on purpose. Also, it is becoming apparent that we may be able to remove the `hashes` crate from the public API of `primitives` allowing us to stabalise `primitives` without stabalising `hashes`. For both these reasons remove the `From` impl from the `hash_newtype` macro. Note that deprecating doesn't seem to work so we just delete it.
We provide the from/to_byte_array functions for casting between arrays. We shouldn't be supporting calls to `into` to quickly do the cast. We already removed the other direction, now remove this one.
c08c8f4
to
db9ec3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK db9ec3b
@@ -124,8 +122,7 @@ fn bitcoin_genesis_tx(params: &Params) -> Transaction { | |||
pub fn genesis_block(params: impl AsRef<Params>) -> Block<Checked> { | |||
let params = params.as_ref(); | |||
let transactions = vec![bitcoin_genesis_tx(params)]; | |||
let hash: sha256d::Hash = transactions[0].compute_txid().into(); | |||
let merkle_root: crate::TxMerkleNode = hash.into(); | |||
let merkle_root = block::compute_merkle_root(&transactions).expect("transactions is not empty"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes, IIRC the function has a length check upfront, so this has no perf impact and is semantically more sensible.
BTW, I've noticed a lot of projects require non-empty slice/vec/hashmap/hashset/... in the APIs. I'm becoming more and more annoyed that there's no good NonEmpty*
type (crate) and I'm seriously thinking of writing one. @apoelstra how do you feel about having such thing in, possibly public, dependency of bitcoin
? I obviously intend to make it stable, maybe even very soon. (I don't think there will be too much design space.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see it. I'm skeptical. I think that Rust's type system is too weak to do such a thing ergonomically.
Are you imagining having functions take e.g. a NonEmpty<HashSet<T>>
and then users need to fallibly convert their normal HashSet<T>
s into this and then deal with the error case even if they know it won't ever happen? That's too much ergonomic cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ergonomic cost has to show up somewhere anyway - by having expect
like in this case, by surprise runtime panic, or even undefined behavior... As someone said "complexity cannot be destroyed, it can only be moved to a more appropriate place".
There are various cases where it's possible to return NonEmpty<Whatever>
knowing that it's definitely correct which may actually make the API more ergonomic.
One example is where you have a single item (as the initial value, before adding more or simply you only happen to need one - like here). You can have pub fn with_first(item: T) -> Self
. It's also possible to implement conversions on arrays - either by simply choosing a reasonable range (up to 64 elements or whatever) or by implementing it for all arrays but post-mono compile erroring. And by accepting Into<NonEmpty<T>>
and such you can allow arrays being passed in directly with no runtime checks (explicit or implicit). On the other side, you can have first
, split_first
, last
, split_last
... methods return non-Option
which then simplifies the internal code.
Also while iterators are not compatible, you can still have .map()
method that's equivalent to .into_iter().map().collect()
. You can also have push
and even pop
(that simply refuses to pop the single element). And if you let the types bubble up you often find the end at deserialization - where it reminds you of handling it properly. (TBF I have some crazy code that has to deal with ~4 errors during deserialization of several different objects and "empty" is one of them but I can't just ignore it - I need to reject it upfront.)
It might not be appropriate in non-security-critical applications but bitcoin
is quite security-critical. And for some odd reason I'm drawn into writing security-critical stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the latest issue with invalid expect
has shown me that we've become complacent. We no longer question code that panics because we write it so much it's getting normal to see it. I want to do something about it. Part of it is "does every panicking thing have a test?" but part of it is removing panics at type level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment says it's just a refactor, although a surface level inspection indicates this can now panic whereas before I don't see that as true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yancyribbens the called function only returns None if the input is empty and it's not.
Fair point. Could have just said that in the expect message :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, but your question demonstrates a new point - is what I said above actually correct? The only way to know for sure is to read the code of compute_merkle_tree. If there is another reason to return None that expect would become invalid. And the function could accidentally do so. By moving the requirement into arguments and not returning Option we move this into very clear NonEmpty::ftry_from (or, in this case, no fallibility anywhere since you can just pass an array).
Without getting too into the weeds of what the compiler might be able to do so that a useless expect()
isn't needed, it would be nice maybe for readability to agree on something like expect()
but with a message for the developer why this should never be needed. Expect messages I guess are more for bubbling message when the code is run then it is for the person reading the code. Of course there's always the risk that code refactor makes the expect actually panic someday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expect message is better than nothing but even if you put the literal Coq proof there, there's no mechanism to enforce it's not out-of-sync. Imagine this code:
// Computes the arithmetic average if the numbers, returns `None` if empty.
fn compute_average(items: &[usize]) -> Option<usize> {
items.iter().copied().sum::<usize>().checked_div(items.len())
}
The someone somewhere calls it like this:
compute_average(&[a, b, c, d]).expect("the function returns None if the input is empty and the input is certainly not empty - as you can see we're passing in an array of four items")
Looks great, doesn't it?
Except one day someone realizes: "oh, the sum could overflow, we need checked sum" and rewrites compute_average
to also return None
in that case. And now the consumer code has a DoS vector because while the function call will not panic because of empty input it will panic if the sum is too large.
One very annoying way to address that is to return Result
where the error is an enum and is either exhaustive - in which case the caller has to match all variants (could be just one variant) and the callee will have to break the API alerting the caller of the problem or the enum is non-exhaustive in which case the caller will theoretically know that calling expect on it is wrong. (Well, I think a clippy lint warning against this would be nice.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expect message is better than nothing but even if you put the literal Coq proof there
Hah. I remember studying formal language verification a bit at the university, although I never did study Coq specifically. It's a very interesting area of research though. I think one of the things that stuck with me is Tony Hoare's billion dollar mistake with the invention of the null pointer which increased my interest in Rust, not C.
Except one day someone realizes: "oh, the sum could overflow, we need checked sum" and rewrites compute_average to also return None in that case..
I agree. I had also mentioned above that the use of expect()
here is fine now but not robust against refactors. I think it would have been nice to just write in the commit message though why expect
is safe but just nit picking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
write in the commit message
It's better in code I think but we really want better tools to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK db9ec3b; successfully ran local tests
@@ -737,16 +738,20 @@ mod test { | |||
45, | |||
Address::new(&([123, 255, 000, 100], 833).into(), ServiceFlags::NETWORK), | |||
)]), | |||
NetworkMessage::Inv(vec![Inventory::Block(hash([8u8; 32]).into())]), | |||
NetworkMessage::GetData(vec![Inventory::Transaction(hash([45u8; 32]).into())]), | |||
NetworkMessage::Inv(vec![Inventory::Block(BlockHash::from_byte_array(hash([8u8; 32]).to_byte_array()))]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment looks like another refactor commit. Would be helpful to call it out as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doh. commit.
One more thing I noticed while writing the |
There is no conceivable change to |
I thought as much. My point was just that I've written |
No, I agree that it's interesting, and that it's a danger in our codebase. Rust makes this unreasonably common by its lack of specialization, dependent types, range types, nonempty containers, any consistency requirements on its standard traits, etc. Not much we can do about it. |
@tcharding yes, I realized the exact same thing yesterday!
We have various options:
|
BTW, if there's a way to use stronger types without polluting the public API I'm 100% for it. Or maybe we even expose both an API where you've gotta use Maybe we should try doing both APIs, but initially have the strong-type one be only private (and supporting types exposed from |
Stuff like No problem with experimenting in private module at first. |
The
hash_newtype
macro is explicitly designed to produce a hash that is not a general purpose hash type to try and prevent users hashing arbitrary stuff with it. E.g.,Txid
isn't meant to be just hash arbitrary data. However we provide aFrom
impl that will convert any instance of the inner hash type into the new type. This kind of defeats the purpose. We providefrom_byte_array
andto_byte_array
to allow folk to 'cast' from one hash type to another if they really want to and its ugly on purpose.Also, it is becoming apparent that we may be able to remove the
hashes
crate from the public API ofprimitives
allowing us to stabaliseprimitives
without stabalisinghashes
.For both these reasons remove the
From
impl from thehash_newtype
macro. Note that deprecating doesn't seem to work so we just delete it.