-
Notifications
You must be signed in to change notification settings - Fork 37.8k
Migrate verify-commits script to python, run in travis #13066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Mind to elaborate why it wouldn't work on a "single commit"? |
|
Thanks! Yes, the current implementation has been slow especially the treehash512 checking, my implementation from Concept ACK. |
I think we should remove the shell script after this has been converted to python. Objections, @TheBlueMatt ? |
PREV_COMMIT = "" | ||
INITIAL_COMMIT = CURRENT_COMMIT | ||
|
||
BRANCH = subprocess.check_output([GIT,'rev-parse','--abbrev-ref','HEAD'], universal_newlines=True).splitlines()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be INITIAL_COMMIT
instead of HEAD
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a direct transcription from the shell script at least:
BRANCH="$(git rev-parse --abbrev-ref HEAD)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is a string 'HEAD', it would return a branch name. If you put a commit id, it returns nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'HEAD', it would return a branch name. If you put a commit id, it returns nothing.
It always returns a commit id for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the git version might be different. I'm using git 2.1.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, that's it. To me this seems like a thing to fix (The script shouldn't depends on what version of git you are using)
Also, I still think that the correct thing to use here is INITIAL_COMMIT
. I.e. git log --format="%H" -1 "${INITIAL_COMMIT}"
. Otherwise a comment might be appropriate to say why HEAD
is used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use it just because the original shell script, so I change it to git show -s --format="%H" ${INITIAL_COMMIT}
now, and looks like we don't need to change the travis script.
All the unclean merge commits are from 2016 or earlier. Perhaps instead there can just be a threshold that skips the validation below a certain age (say, 1 month)? |
Or do we need a trusted-clean-merge-root? This can be when that merge policy start. |
Test for only checking commits in one month: https://travis-ci.org/ken2812221/bitcoin/jobs/371267929 |
Squashed and delete shell script. We can run the run the script in about 80 seconds, so can we revert fa6f12a? |
I think the PR title should be "Migrate verify-commits script to python, run in travis" (or similar) |
@jonasschnelli Thanks |
I did a timing comparison locally, this is an impressive win (given that the checks are correct):
|
IRC discussion:
As far as I see this is the case, so utACK 058d18bdf302976b6d59752ab816463c89f70404. |
I'm fine with migrating this to python, but I certainly dont know python well enough to verify something that is intended to be a secure way of checking our git tree, nor maintain it in the future. If this change means I dont have to and others will, however, thats fine. |
NACK changing the script to only verify up to N commits instead of up to a trusted root by default. Its fine to do that on travis but that obviously defeats the point of "checking the git commits come from a trusted source and don't just have a different timestamp on them". |
I have two ways to do this. The first, add allow-unclean-merge-commits. The list is following.
The second way is add trusted-unclean-merge-commit-root. It would be
@TheBlueMatt Which way would you prefer? |
Given that the speedup is just based on skipping verification, I'd slightly prefer to stick around with the bash script for now and do the python transcribe in a follow up pr. |
@MarcoFalke If we do the entire verification in python, it still just take 18 mins on travis. |
I've re-designed this. Now we verify all the things by default.
Travis running time:
Since the cron job always fail, I've created a respository to verify this daily. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 3ef8327.
This looks generally good to me. I started adding some nits but decided to just go ahead and implement the changes myself here: https://github.com/jnewbery/bitcoin/tree/pr13066.1 . Feel free to take as much of that as you want. It's mostly nits and style - adding comments, fixing flake8 warnings, etc.
I'm not a shell expert so something subtle may have escaped me, but it looks functionally the same as verify-commits.sh.
I haven't verified the incorrect sha 512 commits and unclean merge commits.
import subprocess | ||
import sys | ||
import time | ||
from subprocess import PIPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're already importing subprocess. Suggest you just use subprocess.PIPE
throughout.
Same for sys.stderr
below.
@@ -0,0 +1,131 @@ | |||
#!/usr/bin/env python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include a copyright notice and module docstring, eg:
# Copyright (c) 2018 The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
"""Verify commits against a trusted keys list."""
GIT = os.getenv('GIT','git') | ||
|
||
def tree_sha512sum(commit='HEAD'): | ||
# request metadata for entire tree, recursively |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: update the comment to say that this is copied from github-merge.py
raise IOError('Non-zero return value executing git cat-file') | ||
return overall.hexdigest() | ||
|
||
def main(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few line breaks would be helpful here!
@jnewbery Thanks for your review! I would like to take your nits. |
Great! Feel free to squash the changes into your commit. I've now tested this and verified that:
Tested ACK 8c1515e. |
Squashed |
I'm still wondering why migrate to python? It seems like most of the new code just does the same stuff that the old code did but instead of bash pipes it does a bunch of stdin/out reading, which I find way less readable. Definitely Concept ACK the allow-unclean-merge-commits file and just making travis only go back a ways instead of doing a full check. |
@TheBlueMatt The major reason I do this PR is that bash script is too slow. It took about three hours to verify commits to trusted git root and unclean merges with bash and took about one hour with python on my PC. Also, current bash script only check Tree-SHA512 with commit HEAD, it would take even longer if we check it every commit. |
e5b2cd8 Use python instead of slow shell script on verify-commits (Chun Kuan Lee) Pull request description: The cron job that runs every day would fail because of git checkout a single commit, not a branch. #12708 introduce a method to check whether merges are clean. However, there are four merges are not clean. So, I add a list of merges that are dirty and ignore them. Also, I modify the current shell script to python, it makes the script speed up a lot. The python code `tree_sha512sum` was copied from `github-merge.py` I've re-designed this. Now we verify all the things by default. - Add `--disable-tree-check` option, not to check SHA-512 tree - Add `--clean-merge NUMBER` option, only verify commits after <NUMBER> days ago Travis running time: |option|time| |-|-| |verify-commits.py|[25m47.02s(1547.02s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| |verify-commits.py --disable-tree-check|[19m10.08s(1150.08s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| |verify-commits.py --clean-merge 30|[9m18.18s(558.18s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| |verify-commits.py --disable-tree-check --clean-merge 30|[1m16.51s(76.51s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| Since the cron job always fail, I've created a respository to verify this daily. [](https://travis-ci.org/ken2812221/bitcoin-verify-commits) Tree-SHA512: 476bcf707d92ed3d431ca5642e013036df1506120d3dd2aa718f74240063ce856abd78f4c948336c2a6230dfe5c60c6f2d52d19bdb52d647a1c5f838eaa02e3b
overall.update(" ".encode("utf-8")) | ||
overall.update(f) | ||
overall.update("\n".encode("utf-8")) | ||
p.stdin.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems possible this code could hang depending on internals of git or the operating system, if reads from stdin are buffered. I think moving this stdin.close() line up to where the stdin.flush() is could avoid this. Another option is to use https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate, which avoids deadlocks of this type.
raise IOError('Premature EOF reading git cat-file output') | ||
ptr += bs | ||
dig = intern.hexdigest() | ||
assert p.stdout.read(1) == b'\n' # ignore LF that follows blob data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to move code that has side effects outside of assert statements, because assert statements are stripped out with python's -O
option.
…g commits 51ed05a travis: Increase travis_wait time while verifying commits (Chun Kuan Lee) Pull request description: From https://travis-ci.org/ken2812221/bitcoin-verify-commits/builds I have run vecify-commits.py nightly on travis, as you can see that 30 minutes is not enough, it took 30-50 minutes to run the script with no extra options. So change it to 50 minutes would be better. Forgot to change this at bitcoin#13066 Tree-SHA512: fced346edcb7dc626fa6e714d7db9a34d3e9f283ada64e19c30565ed214f72e26a57a905b4eb71ab24a16c97a1bd35e57e9a83dbb7dc1fff2db26505b810e61e
…travis e5b2cd8 Use python instead of slow shell script on verify-commits (Chun Kuan Lee) Pull request description: The cron job that runs every day would fail because of git checkout a single commit, not a branch. bitcoin#12708 introduce a method to check whether merges are clean. However, there are four merges are not clean. So, I add a list of merges that are dirty and ignore them. Also, I modify the current shell script to python, it makes the script speed up a lot. The python code `tree_sha512sum` was copied from `github-merge.py` I've re-designed this. Now we verify all the things by default. - Add `--disable-tree-check` option, not to check SHA-512 tree - Add `--clean-merge NUMBER` option, only verify commits after <NUMBER> days ago Travis running time: |option|time| |-|-| |verify-commits.py|[25m47.02s(1547.02s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| |verify-commits.py --disable-tree-check|[19m10.08s(1150.08s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| |verify-commits.py --clean-merge 30|[9m18.18s(558.18s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| |verify-commits.py --disable-tree-check --clean-merge 30|[1m16.51s(76.51s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| Since the cron job always fail, I've created a respository to verify this daily. [](https://travis-ci.org/ken2812221/bitcoin-verify-commits) Tree-SHA512: 476bcf707d92ed3d431ca5642e013036df1506120d3dd2aa718f74240063ce856abd78f4c948336c2a6230dfe5c60c6f2d52d19bdb52d647a1c5f838eaa02e3b
…travis e5b2cd8 Use python instead of slow shell script on verify-commits (Chun Kuan Lee) Pull request description: The cron job that runs every day would fail because of git checkout a single commit, not a branch. bitcoin#12708 introduce a method to check whether merges are clean. However, there are four merges are not clean. So, I add a list of merges that are dirty and ignore them. Also, I modify the current shell script to python, it makes the script speed up a lot. The python code `tree_sha512sum` was copied from `github-merge.py` I've re-designed this. Now we verify all the things by default. - Add `--disable-tree-check` option, not to check SHA-512 tree - Add `--clean-merge NUMBER` option, only verify commits after <NUMBER> days ago Travis running time: |option|time| |-|-| |verify-commits.py|[25m47.02s(1547.02s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| |verify-commits.py --disable-tree-check|[19m10.08s(1150.08s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| |verify-commits.py --clean-merge 30|[9m18.18s(558.18s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| |verify-commits.py --disable-tree-check --clean-merge 30|[1m16.51s(76.51s)](https://travis-ci.org/ken2812221/bitcoin/jobs/373321423)| Since the cron job always fail, I've created a respository to verify this daily. [](https://travis-ci.org/ken2812221/bitcoin-verify-commits) Tree-SHA512: 476bcf707d92ed3d431ca5642e013036df1506120d3dd2aa718f74240063ce856abd78f4c948336c2a6230dfe5c60c6f2d52d19bdb52d647a1c5f838eaa02e3b
The cron job that runs every day would fail because of git checkout a single commit, not a branch.
#12708 introduce a method to check whether merges are clean.
However, there are four merges are not clean.
So, I add a list of merges that are dirty and ignore them.
Also, I modify the current shell script to python, it makes the script speed up a lot.
The python code
tree_sha512sum
was copied fromgithub-merge.py
I've re-designed this. Now we verify all the things by default.
--disable-tree-check
option, not to check SHA-512 tree--clean-merge NUMBER
option, only verify commits after <NUMBER> days agoTravis running time:
Since the cron job always fail, I've created a respository to verify this daily.
