BUG: Binary-Artifact and Pinned-Dependencies kill Scorecard in a repo with large files #3831

@pnacht

Describe the bug
Running Scorecard with the Binary-Artifacts and/or Pinned-Dependencies checks on a repo with large files gets the whole process killed.

Reproduction steps
I stumbled on this while trying to run Scorecard on a local clone of a HuggingFace model repository.

Steps to reproduce the behavior:

# Requires LFS support; ~6GB download!
$ git clone https://huggingface.co/microsoft/phi-2

# same outcome with each check individually, too
$ scorecard --local ~/phi-2 --checks Binary-Artifacts,Pinned-Dependencies
Starting [Binary-Artifacts]
Starting [Pinned-Dependencies]
signal: killed

After deleting the very large files (including the .git folder), the checks pass. (Other checks may fail in the same way; I only tested those that run with --local.)

Expected behavior
The checks should work even with large files.

As described below, Binary-Artifacts doesn't need to load the entire file, and it's unlikely an actual script will ever be big enough to be a problem.

Additional context
I believe I understand why these checks are failing: both have at least one function (BinaryArtifacts and collectShellScriptInsecureDownloads) that runs fileparser.OnMatchingFileContentDo with Pattern: "*" (i.e. all files).

As the function name implies, this function sequentially opens each matching file and loads its content. I assume one of the files was simply too large to hold in memory, so the process was killed.

This should be fixable, though:

  • BinaryArtifacts uses fileparser.OnMatchingFileContentDo to call checkBinaryFileContent. That loads the file and then uses https://github.com/h2non/filetype to determine the file's type. This can be replaced by:
    • Simply trusting a file's extension, and only checking its contents if it has no extension (since it could be anything)
    • Replacing fileparser.OnMatchingFileContentDo with a function that only loads the first 262 bytes h2non/filetype needs
  • Pinned-Deps' collectShellScriptInsecureDownloads could be set to only run on files with common script extensions (e.g. .sh, .bash, .ps1, and no extension).
