Skip to content

Conversation

recreator66
Copy link
Contributor

@recreator66 recreator66 commented Feb 11, 2025

Description:

Closes #1727

Checklist:

  • Does your PR pass tests?
  • Have you written new tests for your changes?
  • Have you lint your code locally prior to submission?

@recreator66 recreator66 marked this pull request as draft February 11, 2025 20:43
@recreator66
Copy link
Contributor Author

PR is currently WIP. There is still some testing necessary if the changes have any impact on the core functionalities (e.g. passing config file path, gitleaks-ignore-path etc.)

I am happy for any volunteer that reviews the changes and tests Detector functionalities.

Copy link
Contributor

@rgmz rgmz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good base.

I have a few suggestions to streamline the approach a bit. You can apply this with git apply patch.txt: patch.txt

cmd/directory.go Outdated
var allFindings []report.Finding

start := time.Now()
findingsMap := make(map[string]report.Finding)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this map? Wouldn't it technically change the behaviour of how things are reported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view the behavior itself would not change. The only reason for creating the map is to remove redundant findings by overriding them with the same fingerprint. Otherwise the report would not be accurate anymore when e.g. scanning a root directory with a child directory.

You can test this by scanning the ./testdata directory with ./testdata/repos/nogit as additional argument.

cmd/directory.go Outdated
for _, finding := range findings {
findingsMap[finding.Fingerprint] = finding
}
detector = det
Copy link
Contributor

@rgmz rgmz Feb 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unfortunate consequence of configuring report output under Detector. I don't know if there's a better way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. In general, more testing is required from my end as new Detector structs will be initialized for each directory. This basically has an impact on all cli flag that you pass in.

Will think about a better way, ideally there is only one Detector initialized that stores all flags with e.g. absolute paths for config files etc

@rgmz
Copy link
Contributor

rgmz commented Mar 11, 2025

Can you elaborate? It sounds like the results are the same?

Result of gitleaks detect --no-git testdata/expected/git -v and gitleaks detect --no-git . -v is the same.
...
I would have expected that the result ... is equal..

@recreator66
Copy link
Contributor Author

Update

Test Case 1

  1. Compile both gitleaks binaries from source and target branch of this MR.
  2. Execute following command with gitleaks binary from master branch
    • gitleaks dir testdata/repos/staged - reports 2 leaks
    • gitleaks dir testdata/repos/nogit - reports 1 leak
  3. Execute following command with gitleaks binary from recreator66:feat/dir-multiple-args branch
    • gitleaks dir testdata/repos/staged testdata/repos/nogit - reports 3 leaks

Test Case 2

  1. Compile both gitleaks binaries from source and target branch of this MR.
  2. Create a custom configuration toml file e.g. gitleaks-toml.zip
  3. Execute following command with gitleaks binary from master branch
    • gitleaks dir testdata/config -c gitleaks-test.toml - reports 1 leak
    • gitleaks dir testdata/repos -c gitleaks-test.toml - reports 1 leak
  4. Execute following command with gitleaks binary from recreator66:feat/dir-multiple-args branch
    • gitleaks dir testdata/config testdata/repos -c gitleaks-test.toml - reports 2 leaks

Even with passing a custom toml I haven't noticed any issues with the changes.
The detailed reports can be compared with verbose mode/ flag.

Since the value of config flag is the only one used to setup the gitleaks config I assume that the changes made within this MR do not have an impact on how the detector is initialized.

To-Do

Test and adjust the logging while initializing new detectors. You can see e.g. in debug mode that gitleaks is logging the construction of additional detectors from L39. This logging needs to be suppressed.

Output from scanning 2 directories:

gitleaks dir testdata/config testdata/repos -c gitleaks-test.toml -l debug

    ○
    │╲
    │ ○
    ○ ░
    ░    gitleaks

10:38PM DBG using github.com/wasilibs/go-re2 regex engine
10:38PM DBG using gitleaks config gitleaks-test.toml from `--config`
10:38PM DBG found .gitleaksignore file: .gitleaksignore
10:38PM DBG using github.com/wasilibs/go-re2 regex engine
10:38PM DBG using gitleaks config gitleaks-test.toml from `--config`
10:38PM DBG found .gitleaksignore file: .gitleaksignore
10:38PM DBG Skipping symlink path=testdata/repos/symlinks/file_symlink/symlinked_id_ed25519
10:38PM INF scanned ~17007 bytes (17.01 KB) in 52.9ms
10:38PM WRN leaks found: 2

@recreator66 recreator66 marked this pull request as ready for review March 17, 2025 21:35
@recreator66
Copy link
Contributor Author

recreator66 commented Mar 17, 2025

@rgmz I have adopted your approach to initialize the config once while iterating over the args (= directories). In addition, I enhanced the logging within the gitleaks directory command.

PR is ready for review

@recreator66
Copy link
Contributor Author

@rgmz is this PR still relevant?

@UnknownPlatypus
Copy link

Does the current state of the pr allows doing gitleaks dir file1 file2 ... ?

If so it's still very relevant to me to be able to fix the pre-commit integration (see #1409 (comment))

pre-commit run tools on a list of files (usually the one your are trying to commit) so I really need this PR behavior to make the pre-commit integration work properly.

@recreator66
Copy link
Contributor Author

@UnknownPlatypus yes the current state allows doing gitleaks dir file1 file2 ... and reports unique findings.

@Holzhaus
Copy link

Holzhaus commented Jul 22, 2025

@recreator66 Can you resolve the merge conflicts? Thanks!

@rgmz Anything else missing here?

I really would like to get #1409 fixed, and both this PR as well as #1918 are needed to fix the issue without introducing a major performance regression.

@recreator66
Copy link
Contributor Author

@Holzhaus this PR was already in a ready state back in March, but I haven’t received any feedback or review since then. I'm open to resolving the merge conflicts, but I’d prefer to avoid spending more time on it unless there's a clear intent to review and potentially merge it.

Let me know @rgmz

Copy link

@Holzhaus Holzhaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

logging.Err(scanErr).
Str("path", arg).
Msg("failed scan path")
err = scanErr

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If multiple paths raise an error, only the last one will be kept in err, correct? If so, I guess it's not a problem because we still have them logged anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your assumption is correct. Only the last error will be detected and passed to findingSummaryAndExit.

Anyway, since we log it the output looks like this (without verbose mode)

./gitleaks dir testdata/config123 testdata123 testdata/repos -c .gitleaks-test.toml    

    ○
    │╲
    │ ○
    ○ ░
    ░    gitleaks

9:09PM WRN skipping error="lstat testdata/config123: no such file or directory" path=testdata/config123
9:09PM WRN skipping error="lstat testdata123: no such file or directory" path=testdata123
9:09PM INF scanned ~0 bytes (0) in 27.3ms
9:09PM WRN leaks found: 1

@Holzhaus
Copy link

@recreator66 Maybe you can consider using the "Request review" function in the upper right corner to request a review from @rgmz and/or @zricethezav.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Passing multiple arguments to dir causes a fill scan of the current directory
4 participants