Skip to content

Conversation

theuni
Copy link
Member

@theuni theuni commented Sep 25, 2014

As an ongoing effort to break out script verification. I hope this doesn't conflict with @sipa's or @jtimon's next steps.

This work moves the logic for writing custom core classes to scripts out of CScript. Those classes can describe how their data should be written to a script, and CScript does so without further knowledge.

Please ignore the entire PR's diffstat while reviewing as it won't be at all helpful. Individual commits will make much more sense.

Also, CScriptBinaryData is a pretty terrible name but I couldn't come up with anything better. Suggestions welcome.

// Copyright (c) 2009-2014 The Bitcoin developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
#include "script/binarydata.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can you add a newline above.

@Diapolo
Copy link

Diapolo commented Sep 25, 2014

@theuni Sorry for hitting you with that bunch of nits, but they should be fixed IMHO.

if (!vch.size())
return "";
std::string rv;
static const char hexmap[16] = { '0', '1', '2', '3', '4', '5', '6', '7',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't implement hexadecimal conversion code here. We've spent a lot of time going from duplicated code like this left around all over the place to the current state.
My point of moving the string encoding/decoding to a separate utility file without dependencies (utilstrencodings) was that we can use those in places like here.

@Diapolo
Copy link

Diapolo commented Sep 25, 2014

@theuni After the last few of my style nits are fixed you've got my style ACK. No, really thanks for updating this pull and taking care of the style stuff :)!

@theuni
Copy link
Member Author

theuni commented Sep 25, 2014

@Diapolo I'd put a rollseyes emoticon in the commit message if I could ;)
Keeping things uniform is nice though, so thanks for reviewing.

@laanwj What do you suggest? It'd defeat the purpose to depend on Core for the parsing. This seems like a reasonable place to make an exception, but I understand the case against as well.

utilstrencodings.h itself isn't a terrible dependency, as it only depends on tinyformat. I suppose we could call that part of our "shipped" script lib, but it'd be a shame for such a small thing.

The alternative would be using HexStr(script.begin(),script.end()) everywhere in Core that currently uses script.ToString(), but that would be really ugly.

@laanwj
Copy link
Member

laanwj commented Sep 26, 2014

@theuni Ok if a hexstr is really, truly, the only thing you need in the script lib, I'm fine with making an exception. This is trivially simple code.

One other remark though. The old implementation of that function is:

if (vch.size() <= 4)
    return strprintf("%d", CScriptNum(vch).getint());
else
    return HexStr(vch);

This makes sure that normal scriptnums are printed as numbers instead of hex strings, which people have come to expect. The new one always shows a hex string.

At least move the implementation to the implementation file instead of the header (then the header doesn't have to depend on tinyformat/strencodings/... even if you use it, and the library doesn't have to export those headers).

@theuni
Copy link
Member Author

theuni commented Sep 26, 2014

@laanwj That all sounds perfectly reasonable. How about this, then: Since there's still a good bit of work left before we can break out verification entirely, I'll leave the include for CScript for now. If, at the end, it's still the only thing needed from core, we'll dupe it here as you've specified.

Sound good?

@sipa
Copy link
Member

sipa commented Sep 26, 2014

I disagree with including script.h in key; that makes no sense - script is higher level.

I see why you're doing that - the problem is CScriptID being declared in key.h in the first place. I think it should move to script/standard.h instead.

@theuni
Copy link
Member Author

theuni commented Sep 26, 2014

@sipa: I was also unhappy with that include and tried moving CScriptID to a few places, but never found a better home. I'll give that a shot.

@theuni
Copy link
Member Author

theuni commented Sep 26, 2014

Moved CScriptID to standard.h as @sipa requested.

Re-added the string processing deps as described above. When the library can stand on its own, we can evaluate whether it's worth it to dupe the core functions there.


#include <boost/variant.hpp>
Allows arbitrary structures to be written to a script without CScript
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary to be virtual and overridable? Just having a templated method/constructor that takes any type, and copied the vector from x.begin() to x.end() seems like it would suffice. This is really just convenience - the user of CScript can always use his own code to first convert to a vector.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it was done that way so that a user could make a distinction between input types and define his own serializers. If you'd rather avoid the overhead, I'm fine with a templated ctor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand, I just don't think it's worth it for the cases where we need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @sipa. Let's avoid virtuals here. There is no need for external parties to add anything here.

if (size)
{
assert(size <= UCHAR_MAX);
m_vch.resize(size+1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a readability point I'd greatly prefer:

m_vch.reserve(size + 1);
m_vch.push_back(size);
m_vch.insert(m_vch.end(), in.begin(), in.end());

(edit: oeps)

@theuni
Copy link
Member Author

theuni commented Oct 14, 2014

Nit addressed. Will squash when ready.

size_t size = in.size();
if (size)
{
assert(size <= UCHAR_MAX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is not strict enough; the size needs to be at most 75 for this to work.

An alternative is not serializing the size in here, and doing that inside CScript::operator<<, which selects based on the size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And when combined with BIP62, the data pushed should be at least 2 bytes too.

@sipa
Copy link
Member

sipa commented Oct 15, 2014

utACK. I'm fine with squashing now.

@theuni
Copy link
Member Author

theuni commented Oct 15, 2014

squashed and rebased.

@theuni
Copy link
Member Author

theuni commented Oct 15, 2014

made the const ref change as requested by @sipa and squashed it down.

{
insert(end(), sizeof(b));
insert(end(), (unsigned char*)&b, (unsigned char*)&b + sizeof(b));
assert(data.size() >= 2 && data.size() <= MAX_SCRIPT_ELEMENT_SIZE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the only point of using a CScriptBinaryData over just using vectors is to assert that they are at least 2 bytes in size?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and that they're prefixed with the size?

Edit: right? I don't see the size being added here, but it used to be the case in the last implementation of CScriptBinaryData

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no size added here. The next line just calls this << std::vector, which will prefix the size properly...I dont see why we need CScriptBinaryData?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the size is added somewhere else.

AFAIK CScriptBinaryData is necessary to convert the argument (which is usually a uint160 or derived type) into a std::vector. It could be a function instead of a class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this has been through many iterations now. Originally it was polymorphic so that eventual users could add their own serialization functionality. Then that was dropped and serialization was done on a per-type basis. Then that was dropped, and it was templated so that it's all done the same way.

So I suppose this should go one of two ways, rather than the hybrid that it is now

  • keep the class, which allows for implicit conversion from anything with begin/end iterators. That would mean that we could drop nearly all of b25d5f0dd531aaf5618f077c646afa556f6277ad, which should've been done when it was templatized.
  • Make it a function, and use as in b25d5f0dd531aaf5618f077c646afa556f6277ad.

I really don't care either way. Opinions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See-also comment thread on #5091 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So #5091 is a change I've wanted to do for a while - the part about replacing the pushers for uint160/CPubKey isn't important there. As this PR already has more work towards have some wrapping, it's probably easier to use something like the suggested ToByteVector here; I can rebase #5091.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still really don't mind either way. I see the merit in being explicit, and I see the merit in letting CScript handle whatever comes its way.

Sounds like everyone's ok with making the change to ToByteVector here, then discussing #5091 as a follow-up. It's fine by me if ToByteVector is essentially reverted there, as long as progress is being made.

I'll push that change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not planning to revert ToByteVector in #5091 - I was just trying to avoid touching too much code there, but as it's exactly code you're already touching, that's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still really don't mind either way. I see the merit in being explicit, and I see the merit in letting CScript handle whatever comes its way.

I see merit in it, and I wouldn't care either way if this was, for example, GUI code. But for consensus-critical code it's too dangerous to just "handle everything that comes its way in some way". I like to be explicit and clear in what gets handled how and what the representation is. And to have compile-time errors when something is fishy. Remember the fun a while ago with std::vector<unsigned char>(a.begin,e.end()) with a being a std::vector<int>() "just working" by truncating each value?

@sipa
Copy link
Member

sipa commented Oct 15, 2014

We could of course just have a CScript::operator<<(T begin, T end) { std::vector vch(begin, end); *this << vch; } ?

@theuni
Copy link
Member Author

theuni commented Oct 16, 2014

Updated to use ToByteVector, squashed, and rebased.

ToByteVector() is currently in script.h, which is not a good place for it, but I don't see a better one that wouldn't drag in the deps we're working to remove here. I'm hoping we can at least agree to leave it there long enough to get this merged in, then move it around later if desired.

Also, I didn't add any extra checks or conditions to the operator<<(vector), since #5091 reworks that.

@sipa
Copy link
Member

sipa commented Oct 16, 2014

utACK

@sipa
Copy link
Member

sipa commented Oct 16, 2014

Agree with @jtimon; the first group of 3 commits and the second 3 commits make much more sense when squashed together, for example.

@TheBlueMatt
Copy link
Contributor

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

utACK commithash c558f43500d5bd6fcad74a7254dd509b928c9183
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJUQFraAAoJEIm7uGY+LmXO2iEP/R/Pc0345ZTn6mdQb2qGZP3q
ZVs4hsK9xIexPY44GfpEV+oF+CHSeb7zqEZKb9V3U6Wv4Q/AjvjOkfCWxIvJLZ++
bXQm8acqywW1Rp4UmMap+UfiOJGyWnhZIZmdMTBKbbMPpyNZlop1IVJgw+vn78jW
lyNQM5Sf5PW2K1CntTRNHX+lB9oLgG1OnVsmhexvsgnrVYsJfOB3d9SgH29KW2xr
vfRMtiasFQnLsYnBdTvfml/pFL9jcFL4cG0UjaThcyfXIb5Cx1pkmZHRlpB6fjFg
6H7BbMK2YnIO0Gtbyfg4n7tcQemB9BDXtKcqxmKel/7UepF27f/DOR27iCc6QL5G
+5zO9yfBaNq+so+lw/Y40lvEZ6ZzCJOVBkTGhJzYBn7RU2D/jR5/C5b8ma71UR5c
C61ahnMeBqukZ/sKkXlWtsb8WUaWPsalgxkpQ2+d8tVoeFe5FYEzqtuVAuN8FrRa
4CjrFF1euVAQV3KkyPXqkD4c/1QkGGrwn7tuG+6oT1zzQeXehip2ZzK0EWb/KXKK
G2nYLtEGwqZ5N/Zi3vZxtzg5hwXU7KlSouM4mBhNH7oonZzTKC3+EzG6JFt2Uhkx
bwcjpK72aVj9rl5Zt9obmy9O22t2f7bW3HtgvEoNK6l979o/58IfJTC9njr6c+5o
KEciQREeTXDRaMF5F16X
=b86A
-----END PGP SIGNATURE-----

@theuni
Copy link
Member Author

theuni commented Oct 17, 2014

Ok. I find it easier to review/digest that way, but I'll keep em more grouped together.
Will squash.

… from CScripts

This allows for a reversal of the current behavior.

This:
CScript foo;
CScriptID bar(foo.GetID());

Becomes:
CScript foo;
CScriptID bar(foo);

This way, CScript is no longer dependent on CScriptID or Hash();
This should move to a util header once their dependencies are cleaned up.
…pt.h

Lots of files ended up with indirect includes from script.h.
@theuni
Copy link
Member Author

theuni commented Oct 17, 2014

Squished down deeper. The header removal commit remains on top though, that was the purpose of this series. Not re-rebased against master, so this is identical code-wise to previously ACKed changes.

@gavinandresen
Copy link
Contributor

utACK

@laanwj laanwj merged commit 85c579e into bitcoin:master Oct 22, 2014
laanwj added a commit that referenced this pull request Oct 22, 2014
85c579e script: add a slew of includes all around and drop includes from script.h (Cory Fields)
db8eb54 script: move ToString and ValueString out of the header (Cory Fields)
e9ca428 script: add ToByteVector() for converting anything with begin/end (Cory Fields)
066e2a1 script: move CScriptID to standard.h and add a ctor for creating them from CScripts (Cory Fields)
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants