Skip to content

Conversation

ken2812221
Copy link
Contributor

@ken2812221 ken2812221 commented Apr 24, 2018

The cron job that runs every day would fail because of git checkout a single commit, not a branch.

#12708 introduce a method to check whether merges are clean.
However, there are four merges are not clean.
So, I add a list of merges that are dirty and ignore them.

Also, I modify the current shell script to python, it makes the script speed up a lot.
The python code tree_sha512sum was copied from github-merge.py

I've re-designed this. Now we verify all the things by default.

  • Add --disable-tree-check option, not to check SHA-512 tree
  • Add --clean-merge NUMBER option, only verify commits after <NUMBER> days ago

Travis running time:

option time
verify-commits.py 25m47.02s(1547.02s)
verify-commits.py --disable-tree-check 19m10.08s(1150.08s)
verify-commits.py --clean-merge 30 9m18.18s(558.18s)
verify-commits.py --disable-tree-check --clean-merge 30 1m16.51s(76.51s)

Since the cron job always fail, I've created a respository to verify this daily.
Build Status

@ken2812221 ken2812221 changed the title Make travis to run verify-commits Make travis run verify-commits Apr 24, 2018
@fanquake fanquake added the Tests label Apr 24, 2018
@maflcko
Copy link
Member

maflcko commented Apr 24, 2018

The cron job that runs every day would fail because of git checkout a single commit, not a branch.

Mind to elaborate why it wouldn't work on a "single commit"?

@ken2812221
Copy link
Contributor Author

ken2812221 commented Apr 24, 2018

git rev-parse --abbrev-ref HEAD does not work with commit, it always returns HEAD.
And git would not return to latest commit after the script.
But I don't know if this is the exact reason. I just test on my PC, the script failed to check signature at random commit.

@laanwj
Copy link
Member

laanwj commented Apr 24, 2018

Also, I modify the current shell script to python, it makes the script speed up a lot.
The python code tree_sha512sum was copied from github-merge.py

Thanks! Yes, the current implementation has been slow especially the treehash512 checking, my implementation from github-merge.py only uses a single call to ls-tree which makes it more efficient.

Concept ACK.

@maflcko
Copy link
Member

maflcko commented Apr 24, 2018

I think we should remove the shell script after this has been converted to python. Objections, @TheBlueMatt ?

PREV_COMMIT = ""
INITIAL_COMMIT = CURRENT_COMMIT

BRANCH = subprocess.check_output([GIT,'rev-parse','--abbrev-ref','HEAD'], universal_newlines=True).splitlines()[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be INITIAL_COMMIT instead of HEAD?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a direct transcription from the shell script at least:

BRANCH="$(git rev-parse --abbrev-ref HEAD)"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is a string 'HEAD', it would return a branch name. If you put a commit id, it returns nothing.

Copy link
Member

@maflcko maflcko Apr 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'HEAD', it would return a branch name. If you put a commit id, it returns nothing.

It always returns a commit id for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the git version might be different. I'm using git 2.1.4

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that's it. To me this seems like a thing to fix (The script shouldn't depends on what version of git you are using)

Also, I still think that the correct thing to use here is INITIAL_COMMIT. I.e. git log --format="%H" -1 "${INITIAL_COMMIT}". Otherwise a comment might be appropriate to say why HEAD is used here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use it just because the original shell script, so I change it to git show -s --format="%H" ${INITIAL_COMMIT} now, and looks like we don't need to change the travis script.

@sipa
Copy link
Member

sipa commented Apr 24, 2018

All the unclean merge commits are from 2016 or earlier. Perhaps instead there can just be a threshold that skips the validation below a certain age (say, 1 month)?

@ken2812221
Copy link
Contributor Author

ken2812221 commented Apr 25, 2018

All the unclean merge commits are from 2016 or earlier. Perhaps instead there can just be a threshold that skips the validation below a certain age (say, 1 month)?

Or do we need a trusted-clean-merge-root? This can be when that merge policy start.

@ken2812221
Copy link
Contributor Author

ken2812221 commented Apr 25, 2018

Test for only checking commits in one month: https://travis-ci.org/ken2812221/bitcoin/jobs/371267929

@ken2812221
Copy link
Contributor Author

ken2812221 commented Apr 25, 2018

Squashed and delete shell script.

We can run the run the script in about 80 seconds, so can we revert fa6f12a?

@jonasschnelli
Copy link
Contributor

I think the PR title should be "Migrate verify-commits script to python, run in travis" (or similar)

@ken2812221 ken2812221 changed the title Make travis run verify-commits Migrate verify-commits script to python, run in travis Apr 26, 2018
@ken2812221
Copy link
Contributor Author

@jonasschnelli Thanks

@laanwj
Copy link
Member

laanwj commented Apr 27, 2018

I did a timing comparison locally, this is an impressive win (given that the checks are correct):

contrib/verify-commits/verify-commits.py 
(all the way to dc0305d15aa02819cd4763e1efe3876d674faea7, after verifying 2627)
real    5m36.844s
user    3m52.696s
sys     1m1.720s

contrib/verify-commits/verify-commits.sh 
(and it wasn't even completely finished yet, only up to commit 6052d509105790a26b3ad5df43dd61e7f1b24a12, after verifying 1940)
real    32m1.084s
user    25m57.068s
sys     4m16.484s

@laanwj
Copy link
Member

laanwj commented Apr 27, 2018

IRC discussion:

<wumpus> ken2812221: so to be clear: the only expected difference in behavior between your python script and the shell script is that your script only verifies merges up to a certain depth?
<wumpus> (which is a sensible change, as the shell script throws an error there, not allowing it to verify the gpg signatures deeper either)
<ken2812221> Yes.

As far as I see this is the case, so utACK 058d18bdf302976b6d59752ab816463c89f70404.

@laanwj laanwj requested a review from TheBlueMatt April 27, 2018 15:29
@TheBlueMatt
Copy link
Contributor

I'm fine with migrating this to python, but I certainly dont know python well enough to verify something that is intended to be a secure way of checking our git tree, nor maintain it in the future. If this change means I dont have to and others will, however, thats fine.

@TheBlueMatt
Copy link
Contributor

NACK changing the script to only verify up to N commits instead of up to a trusted root by default. Its fine to do that on travis but that obviously defeats the point of "checking the git commits come from a trusted source and don't just have a different timestamp on them".

@ken2812221
Copy link
Contributor Author

I have two ways to do this. The first, add allow-unclean-merge-commits. The list is following.

6052d509105790a26b3ad5df43dd61e7f1b24a12
3798e5de334c3deb5f71302b782f6b8fbd5087f1
326ffed09bfcc209a2efd6a2ebc69edf6bd200b5
97d83739db0631be5d4ba86af3616014652c00ec

The second way is add trusted-unclean-merge-commit-root. It would be

6052d509105790a26b3ad5df43dd61e7f1b24a12

@TheBlueMatt Which way would you prefer?

@maflcko
Copy link
Member

maflcko commented Apr 27, 2018

Given that the speedup is just based on skipping verification, I'd slightly prefer to stick around with the bash script for now and do the python transcribe in a follow up pr.

@ken2812221
Copy link
Contributor Author

@MarcoFalke If we do the entire verification in python, it still just take 18 mins on travis.

@ken2812221
Copy link
Contributor Author

ken2812221 commented May 1, 2018

I've re-designed this. Now we verify all the things by default.

  • Add --disable-tree-check option, not to check SHA-512 tree
  • Add --clean-merge NUMBER option, only verify commits after <NUMBER> days ago

Travis running time:

option time
verify-commits.py 25m47.02s(1547.02s)
verify-commits.py --disable-tree-check 19m10.08s(1150.08s)
verify-commits.py --clean-merge 30 9m18.18s(558.18s)
verify-commits.py --disable-tree-check --clean-merge 30 1m16.51s(76.51s)

Since the cron job always fail, I've created a respository to verify this daily.
Build Status

Copy link
Contributor

@jnewbery jnewbery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 3ef8327.

This looks generally good to me. I started adding some nits but decided to just go ahead and implement the changes myself here: https://github.com/jnewbery/bitcoin/tree/pr13066.1 . Feel free to take as much of that as you want. It's mostly nits and style - adding comments, fixing flake8 warnings, etc.

I'm not a shell expert so something subtle may have escaped me, but it looks functionally the same as verify-commits.sh.

I haven't verified the incorrect sha 512 commits and unclean merge commits.

import subprocess
import sys
import time
from subprocess import PIPE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're already importing subprocess. Suggest you just use subprocess.PIPE throughout.

Same for sys.stderr below.

@@ -0,0 +1,131 @@
#!/usr/bin/env python3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include a copyright notice and module docstring, eg:

# Copyright (c) 2018 The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
"""Verify commits against a trusted keys list."""

GIT = os.getenv('GIT','git')

def tree_sha512sum(commit='HEAD'):
# request metadata for entire tree, recursively
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: update the comment to say that this is copied from github-merge.py

raise IOError('Non-zero return value executing git cat-file')
return overall.hexdigest()

def main():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few line breaks would be helpful here!

@ken2812221
Copy link
Contributor Author

@jnewbery Thanks for your review! I would like to take your nits.

@jnewbery
Copy link
Contributor

I would like to take your nits.

Great! Feel free to squash the changes into your commit.

I've now tested this and verified that:

  • f8feaa4636260b599294c7285bcf1c8b7737f74e and 8040ae6fc576e9504186f2ae3ff2c8125de1095c do have missing/incorrect Tree_SHA512 digests in the git logs, and do not contain malicious code.
  • 6052d509105790a26b3ad5df43dd61e7f1b24a12, 3798e5de334c3deb5f71302b782f6b8fbd5087f1, 326ffed09bfcc209a2efd6a2ebc69edf6bd200b5 and 97d83739db0631be5d4ba86af3616014652c00ec are not clean merges and do not contain malicious code.

Tested ACK 8c1515e.

@ken2812221
Copy link
Contributor Author

Squashed

@TheBlueMatt
Copy link
Contributor

I'm still wondering why migrate to python? It seems like most of the new code just does the same stuff that the old code did but instead of bash pipes it does a bunch of stdin/out reading, which I find way less readable.

Definitely Concept ACK the allow-unclean-merge-commits file and just making travis only go back a ways instead of doing a full check.

@ken2812221
Copy link
Contributor Author

ken2812221 commented Jun 12, 2018

@TheBlueMatt The major reason I do this PR is that bash script is too slow. It took about three hours to verify commits to trusted git root and unclean merges with bash and took about one hour with python on my PC.

Also, current bash script only check Tree-SHA512 with commit HEAD, it would take even longer if we check it every commit.

@laanwj laanwj merged commit e5b2cd8 into bitcoin:master Jun 12, 2018
laanwj added a commit that referenced this pull request Jun 12, 2018
e5b2cd8 Use python instead of slow shell script on verify-commits (Chun Kuan Lee)

Pull request description:

  The cron job that runs every day would fail because of git checkout a single commit, not a branch.

  #12708 introduce a method to check whether merges are clean.
  However, there are four merges are not clean.
  So, I add a list of merges that are dirty and ignore them.

  Also, I modify the current shell script to python, it makes the script speed up a lot.
  The python code `tree_sha512sum` was copied from `github-merge.py`

  I've re-designed this. Now we verify all the things by default.
  - Add `--disable-tree-check` option, not to check SHA-512 tree
  - Add `--clean-merge NUMBER` option, only verify commits after &lt;NUMBER> days ago

  Travis running time:

  |option|time|
  |-|-|
  |verify-commits.py|[25m47.02s(1547.02s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|
  |verify-commits.py --disable-tree-check|[19m10.08s(1150.08s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|
  |verify-commits.py --clean-merge 30|[9m18.18s(558.18s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|
  |verify-commits.py --disable-tree-check --clean-merge 30|[1m16.51s(76.51s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|

  Since the cron job always fail, I've created a respository to verify this daily.
   [![Build Status](https://travis-ci.org/ken2812221/bitcoin-verify-commits.svg?branch=master)](https://travis-ci.org/ken2812221/bitcoin-verify-commits)

Tree-SHA512: 476bcf707d92ed3d431ca5642e013036df1506120d3dd2aa718f74240063ce856abd78f4c948336c2a6230dfe5c60c6f2d52d19bdb52d647a1c5f838eaa02e3b
@ken2812221 ken2812221 deleted the verify-commits branch June 12, 2018 15:26
overall.update(" ".encode("utf-8"))
overall.update(f)
overall.update("\n".encode("utf-8"))
p.stdin.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems possible this code could hang depending on internals of git or the operating system, if reads from stdin are buffered. I think moving this stdin.close() line up to where the stdin.flush() is could avoid this. Another option is to use https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate, which avoids deadlocks of this type.

raise IOError('Premature EOF reading git cat-file output')
ptr += bs
dig = intern.hexdigest()
assert p.stdout.read(1) == b'\n' # ignore LF that follows blob data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to move code that has side effects outside of assert statements, because assert statements are stripped out with python's -O option.

maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Jun 13, 2018
…g commits

51ed05a travis: Increase travis_wait time while verifying commits (Chun Kuan Lee)

Pull request description:

  From https://travis-ci.org/ken2812221/bitcoin-verify-commits/builds
  I have run vecify-commits.py nightly on travis, as you can see that 30 minutes is not enough, it took 30-50 minutes to run the script with no extra options. So change it to 50 minutes would be better. Forgot to change this at bitcoin#13066

Tree-SHA512: fced346edcb7dc626fa6e714d7db9a34d3e9f283ada64e19c30565ed214f72e26a57a905b4eb71ab24a16c97a1bd35e57e9a83dbb7dc1fff2db26505b810e61e
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request May 5, 2021
…travis

e5b2cd8 Use python instead of slow shell script on verify-commits (Chun Kuan Lee)

Pull request description:

  The cron job that runs every day would fail because of git checkout a single commit, not a branch.

  bitcoin#12708 introduce a method to check whether merges are clean.
  However, there are four merges are not clean.
  So, I add a list of merges that are dirty and ignore them.

  Also, I modify the current shell script to python, it makes the script speed up a lot.
  The python code `tree_sha512sum` was copied from `github-merge.py`

  I've re-designed this. Now we verify all the things by default.
  - Add `--disable-tree-check` option, not to check SHA-512 tree
  - Add `--clean-merge NUMBER` option, only verify commits after &lt;NUMBER> days ago

  Travis running time:

  |option|time|
  |-|-|
  |verify-commits.py|[25m47.02s(1547.02s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|
  |verify-commits.py --disable-tree-check|[19m10.08s(1150.08s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|
  |verify-commits.py --clean-merge 30|[9m18.18s(558.18s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|
  |verify-commits.py --disable-tree-check --clean-merge 30|[1m16.51s(76.51s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|

  Since the cron job always fail, I've created a respository to verify this daily.
   [![Build Status](https://travis-ci.org/ken2812221/bitcoin-verify-commits.svg?branch=master)](https://travis-ci.org/ken2812221/bitcoin-verify-commits)

Tree-SHA512: 476bcf707d92ed3d431ca5642e013036df1506120d3dd2aa718f74240063ce856abd78f4c948336c2a6230dfe5c60c6f2d52d19bdb52d647a1c5f838eaa02e3b
UdjinM6 pushed a commit to UdjinM6/dash that referenced this pull request May 6, 2021
…travis

e5b2cd8 Use python instead of slow shell script on verify-commits (Chun Kuan Lee)

Pull request description:

  The cron job that runs every day would fail because of git checkout a single commit, not a branch.

  bitcoin#12708 introduce a method to check whether merges are clean.
  However, there are four merges are not clean.
  So, I add a list of merges that are dirty and ignore them.

  Also, I modify the current shell script to python, it makes the script speed up a lot.
  The python code `tree_sha512sum` was copied from `github-merge.py`

  I've re-designed this. Now we verify all the things by default.
  - Add `--disable-tree-check` option, not to check SHA-512 tree
  - Add `--clean-merge NUMBER` option, only verify commits after &lt;NUMBER> days ago

  Travis running time:

  |option|time|
  |-|-|
  |verify-commits.py|[25m47.02s(1547.02s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|
  |verify-commits.py --disable-tree-check|[19m10.08s(1150.08s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|
  |verify-commits.py --clean-merge 30|[9m18.18s(558.18s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|
  |verify-commits.py --disable-tree-check --clean-merge 30|[1m16.51s(76.51s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)|

  Since the cron job always fail, I've created a respository to verify this daily.
   [![Build Status](https://travis-ci.org/ken2812221/bitcoin-verify-commits.svg?branch=master)](https://travis-ci.org/ken2812221/bitcoin-verify-commits)

Tree-SHA512: 476bcf707d92ed3d431ca5642e013036df1506120d3dd2aa718f74240063ce856abd78f4c948336c2a6230dfe5c60c6f2d52d19bdb52d647a1c5f838eaa02e3b
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants