Archive support #1872
(Benchmark results are only added if tools are installed)

Example output:

```
========================================================================
generating profile data
------------------------------------------------------------------------
- mode: dir
  benchmark:
    tool: hyperfine
    path: profile/1748019612/dir/benchmark.json
    results:
      mean: 1.12852930166
      stddev: 0.12163775966037399
      median: 1.07242928416
      user: 3.9699382799999996
      system: 0.04382341999999999
      min: 1.03207673016
      max: 1.3913475181600001
  profile:
    - mode: cpu
      path: profile/1748019612/dir/cpu.pprof
      view: go tool pprof -http=localhost: ./gitleaks profile/1748019612/dir/cpu.pprof
    - mode: mem
      path: profile/1748019612/dir/mem.pprof
      view: go tool pprof -http=localhost: ./gitleaks profile/1748019612/dir/mem.pprof
    - mode: trace
      path: profile/1748019612/dir/trace.out
      view: go tool trace profile/1748019612/dir/trace.out
- mode: git
...snip...
```
Okay, I'm pretty happy with backwards compatibility when |
Stats

Scanning the kubernetes repo:

./archives git \
--config gitleaks.toml --exit-code=0 \
--max-archive-depth=8 --max-decode-depth=8 \
repos/git/kubernetes
./master git \
--config gitleaks.toml --exit-code=0 \
--max-decode-depth=8 \
repos/git/kubernetes

Scanning the gitlab repo:

./archives git \
--config gitleaks.toml --exit-code=0 \
--max-archive-depth=8 --max-decode-depth=8 \
repos/git/gitlab
./master git \
--config gitleaks.toml --exit-code=0 \
--max-decode-depth=8 \
repos/git/gitlab
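
The `--max-archive-depth` flag in the commands above bounds how far nested archives are unpacked. The sketch below illustrates that idea only; it is not the PR's implementation (which relies on mholt's archives package). It uses the standard library's `archive/zip`, and the names `scan`, `makeZip`, and `visitedPaths`, the `!` path separator, and the ".zip suffix" detection are all assumptions made for illustration. It also assumes a depth of 0 means archives are treated as opaque files.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// makeZip builds an in-memory zip holding a single entry (illustration only).
func makeZip(name string, content []byte) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name)
	if err != nil {
		panic(err)
	}
	if _, err := w.Write(content); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// scan calls visit for every leaf it reaches. Zip files are opened only
// while depth < maxDepth; past that limit they are visited as opaque bytes.
func scan(path string, data []byte, depth, maxDepth int, visit func(string, []byte)) error {
	if !strings.HasSuffix(path, ".zip") || depth >= maxDepth {
		visit(path, data)
		return nil
	}
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return err
	}
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return err
		}
		inner, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return err
		}
		if err := scan(path+"!"+f.Name, inner, depth+1, maxDepth, visit); err != nil {
			return err
		}
	}
	return nil
}

// visitedPaths returns the leaf paths reached for a given depth limit.
func visitedPaths(data []byte, maxDepth int) ([]string, error) {
	var paths []string
	err := scan("outer.zip", data, 0, maxDepth, func(p string, _ []byte) {
		paths = append(paths, p)
	})
	return paths, err
}

func main() {
	// outer.zip contains inner.zip, which contains secret.txt.
	outer := makeZip("inner.zip", makeZip("secret.txt", []byte("token=abc")))
	for depth := 0; depth <= 2; depth++ {
		paths, err := visitedPaths(outer, depth)
		fmt.Println(depth, paths, err)
	}
}
```

With this layout the secret inside `secret.txt` is only reachable when the limit is at least 2, which is why a higher depth can surface findings that a lower depth misses.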
@bplaxco so it looks like the |
@zricethezav I'm not sure, would you be opposed to me tweaking the diagnostic feature to use net/http/pprof so you can sample the pprof data as needed during the run before it writes out the final dump at the end? I might be able to take some of those and the pprof diff flags to see where the savings are happening. My gut says it's prob yield + not scanning as many commits, but that's just a guess. |
If you're up for it! |
Some additional comments and observations, while I wait for this to chew through a >10GB repository...
- Update blobReader.Close() to discard the buffer
- Misc logger issues & uses
- Tweak .golangci.yaml to default to none
- Discard remaining blobReader data on close
- Undo a De Morgan's Law suggestion (and disable QF1001)
gitleaks --diagnostics=http ...

-----

From the net/http/pprof docs:

Use the pprof tool to look at the heap profile:

```
go tool pprof http://localhost:6060/debug/pprof/heap
```

Or to look at a 30-second CPU profile:

```
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
```

Or to look at the goroutine blocking profile, after calling runtime.SetBlockProfileRate in your program:

```
go tool pprof http://localhost:6060/debug/pprof/block
```

Or to look at the holders of contended mutexes, after calling runtime.SetMutexProfileFraction in your program:

```
go tool pprof http://localhost:6060/debug/pprof/mutex
```

(For more info see https://pkg.go.dev/net/http/pprof)
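
For readers unfamiliar with `net/http/pprof`, here is a minimal sketch of how a Go program can expose those endpoints while it runs. It is not the code from this PR: `startDiagnostics` and `indexStatus` are hypothetical names, and the `localhost:6060` address is taken from the pprof docs quoted above rather than from gitleaks.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

// startDiagnostics serves the pprof handlers on addr in the background and
// returns the address actually bound (useful when addr ends in ":0").
func startDiagnostics(addr string) (string, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", err
	}
	go func() {
		// A nil handler means http.DefaultServeMux, where pprof registered itself.
		_ = http.Serve(ln, nil)
	}()
	return ln.Addr().String(), nil
}

// indexStatus fetches the pprof index page and reports the HTTP status code.
func indexStatus(addr string) (int, error) {
	resp, err := http.Get("http://" + addr + "/debug/pprof/")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	addr, err := startDiagnostics("localhost:6060")
	if err != nil {
		fmt.Println("diagnostics unavailable:", err)
		return
	}
	// A real program would do its long-running work here; while it runs,
	// the `go tool pprof http://...` commands can sample it.
	status, err := indexStatus(addr)
	fmt.Println("pprof index:", status, err)
}
```

Because the server runs for the lifetime of the process, profiles can be sampled at any point during a long scan instead of only from a dump written at exit.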
Ended up not being too bad ^_^ 2886f77 |
@@ -118,7 +118,7 @@ jobs:
 - id: gitleaks
 ```

-for a [native execution of GitLeaks](https://github.com/gitleaks/gitleaks/releases) or use the [`gitleaks-docker` pre-commit ID](https://github.com/gitleaks/gitleaks/blob/master/.pre-commit-hooks.yaml) for executing GitLeaks using the [official Docker images](#docker)
+for a [native execution of gitleaks](https://github.com/gitleaks/gitleaks/releases) or use the [`gitleaks-docker` pre-commit ID](https://github.com/gitleaks/gitleaks/blob/master/.pre-commit-hooks.yaml) for executing gitleaks using the [official Docker images](#docker)
Let's leave these capitalized
The [compression](https://github.com/mholt/archives?tab=readme-ov-file#supported-compression-formats)
and [archive](https://github.com/mholt/archives?tab=readme-ov-file#supported-archive-formats)
formats supported by mholt's [archives package](https://github.com/mholt/archives)
@mholt, thanks for the excellent archives package! 🙇🏻
You're welcome!! Glad you find it useful.
Amazing work @bplaxco! (per usual). @rgmz thank you for the help in the review 🙇🏻. I think this PR is in fantastic shape. All the testing, profiling, and manually running gives me confidence this likely won't break anything. If it does, we'll fix'er up real quick. Not only do we get archive scanning out of this but it's also a huge improvement to the source package. Sources generate fragments. Simple as. Gonna hit the merge button |
Description:
Related to #1841
Talked with @zricethezav to see if I could take a go at doing it in-memory and with a change to how sources are handled.
The idea is that sources could have a `Fragments(FragmentFunc) error` function where `type FragmentFunc func(Fragment, error) error`. The behavior should be like `WalkDir`, where the callback determines how an error should be handled, and if the callback returns an error, fragment traversal stops and the function returns that error.

Tasks:
Source.Fragments
File
Git
source and update the DetectGit
related functions and cmds
Git
--max-archive-depth= (default 0)
flag && update docs (thank you @alayne222!)

[1]: I want to make sure I completely resolved the issue I was seeing before, where the secret counts weren't lining up properly. Some of it was a problem related to me using a buffered reader in one spot and then using the reader again. But then there were some weird cases where I wasn't seeing the same number of secrets in every place.
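
The `Fragments(FragmentFunc) error` contract from the description can be sketched roughly as follows. Only the two signatures come from the description; the `Fragment` fields, `sliceSource`, `collect`, and `errStop` are hypothetical stand-ins used to show the `WalkDir`-style behavior, not the types in the gitleaks source package.

```go
package main

import (
	"errors"
	"fmt"
)

// Fragment is a stand-in for a chunk of scannable content (fields are assumed).
type Fragment struct {
	Path string
	Raw  string
}

// FragmentFunc has the signature given in the PR description.
type FragmentFunc func(Fragment, error) error

// Source has the method proposed in the PR description.
type Source interface {
	Fragments(FragmentFunc) error
}

// sliceSource is a hypothetical in-memory source used only for illustration.
type sliceSource struct {
	items []Fragment
}

// Fragments passes each fragment (and any per-fragment error) to yield.
// If yield returns an error, traversal stops and that error is returned.
func (s sliceSource) Fragments(yield FragmentFunc) error {
	for _, f := range s.items {
		var readErr error
		if f.Raw == "" {
			readErr = fmt.Errorf("%s: empty fragment", f.Path)
		}
		if err := yield(f, readErr); err != nil {
			return err
		}
	}
	return nil
}

var errStop = errors.New("stop")

// collect records fragment paths, tolerates per-fragment errors, and halts
// the traversal when it reaches stopAt.
func collect(src Source, stopAt string) ([]string, error) {
	var seen []string
	err := src.Fragments(func(f Fragment, err error) error {
		if err != nil {
			return nil // the callback decides: skip this fragment and continue
		}
		if f.Path == stopAt {
			return errStop // returning an error ends the traversal
		}
		seen = append(seen, f.Path)
		return nil
	})
	return seen, err
}

func main() {
	src := sliceSource{items: []Fragment{
		{Path: "a.txt", Raw: "token=abc"},
		{Path: "b.txt", Raw: ""},
		{Path: "c.txt", Raw: "key=123"},
		{Path: "d.txt", Raw: "never collected"},
	}}
	seen, err := collect(src, "d.txt")
	fmt.Println(seen, err)
}
```

Putting the error policy in the callback means one source implementation can serve both a strict caller (abort on the first error) and a lenient one (log and continue) without any extra options on the source itself.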
Checklist: