Description
Describe the bug
Running Scorecard with the Binary-Artifacts and/or Pinned-Dependencies checks on a repo with large files gets the whole process killed (signal: killed) before any results are reported.
Reproduction steps
I stumbled on this while trying to run Scorecard on a local clone of a HuggingFace model repository.
Steps to reproduce the behavior:
# Requires LFS support; ~6GB download!
$ git clone https://huggingface.co/microsoft/phi-2
# same outcome with each check individually, too
$ scorecard --local ~/phi-2 --checks Binary-Artifacts,Pinned-Dependencies
Starting [Binary-Artifacts]
Starting [Pinned-Dependencies]
signal: killed
After deleting the very large files (including the .git folder), the checks pass. (There may be other checks that would also fail; I only tested those that run with --local.)
Expected behavior
The checks should work even with large files.
As described below, Binary-Artifacts doesn't need to load the entire file, and it's unlikely an actual script will ever be big enough to be a problem.
Additional context
I believe I understand why these checks are failing: both have at least one function (BinaryArtifacts and collectShellScriptInsecureDownloads) that runs fileparser.OnMatchingFileContentDo with Pattern: "*" (i.e. all files).
As the function name implies, this function sequentially opens and loads all matching files. I assume one of the files was simply too large.
This should be fixable, though:
- BinaryArtifacts uses fileparser.OnMatchingFileContentDo to call checkBinaryFileContent. That loads the file and then uses https://github.com/h2non/filetype to determine the file's type. This can be replaced by:
  - Simply trusting a file's extension, and only checking its contents if it has no extension (since it could be anything)
  - Replacing fileparser.OnMatchingFileContentDo with a function that only loads the first 262 bytes h2non/filetype needs
- Pinned-Deps' collectShellScriptInsecureDownloads could be set to only run on files with common script extensions (i.e. .sh, .bash, .ps, and no extension).
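To make the two suggestions above concrete, here is a rough sketch of what such helpers might look like. This is not Scorecard's code: readHeader and looksLikeScript are hypothetical names, the 262-byte figure is taken from the description above, and the extension list is just the one proposed in this issue. It uses only the standard library; the resulting header bytes would then be handed to whatever type detection the check already uses.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// headerSize is the number of leading bytes that, per this issue,
// h2non/filetype needs to identify a file type.
const headerSize = 262

// readHeader returns at most headerSize bytes from the start of the file,
// so memory use stays constant regardless of file size.
// Hypothetical helper, not part of Scorecard's fileparser package.
func readHeader(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	buf := make([]byte, headerSize)
	n, err := io.ReadFull(f, buf)
	// Files shorter than headerSize (or empty) are not an error here;
	// we simply return the bytes that were available.
	if err != nil && !errors.Is(err, io.EOF) && !errors.Is(err, io.ErrUnexpectedEOF) {
		return nil, err
	}
	return buf[:n], nil
}

// looksLikeScript reports whether a file name has one of the script
// extensions suggested in this issue, or no extension at all.
// Hypothetical helper; the extension list is an assumption.
func looksLikeScript(name string) bool {
	switch strings.ToLower(filepath.Ext(name)) {
	case "", ".sh", ".bash", ".ps":
		return true
	}
	return false
}

func main() {
	// Write a throwaway file larger than headerSize to show that only
	// the header is read back.
	tmp, err := os.CreateTemp("", "large-*.bin")
	if err != nil {
		fmt.Println("create temp file:", err)
		return
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(make([]byte, 4096)); err != nil {
		fmt.Println("write temp file:", err)
		return
	}
	tmp.Close()

	header, err := readHeader(tmp.Name())
	if err != nil {
		fmt.Println("read header:", err)
		return
	}
	fmt.Println("bytes read:", len(header))
	fmt.Println("install.sh is a script candidate:", looksLikeScript("install.sh"))
	fmt.Println("model.safetensors is a script candidate:", looksLikeScript("model.safetensors"))
}
```

With something like this, a multi-gigabyte model file would cost one small read for Binary-Artifacts and would be skipped outright by the shell-script scan, since its extension is not in the list.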