Skip to content

Conversation

rgmz
Copy link
Contributor

@rgmz rgmz commented Sep 7, 2024

Description:

This is an experimental implementation of two things I described here:

  • Support Multiple Allowlist Targets
    Right now you can only chose a single regexTarget per allowlist (global or rule). This makes it awkward when you want to ignore multiple patterns, where one is the secret (e.g., {{ secrets.MY_SECRET}}) and another is the match or line (e.g., passwordFile: ... or token_id: ...). The ability to definite separate regexes for secret and match or line would be a nice quality-of-life improvement.

  • Support Allowlist matching path && regex
    Presently, rule.path + rule.regex and rule.allowlist.paths + rule.allowlist.regexes behave differently. Unlike rule which matches path && regex, the allowlist will match path || regex which can lead to unexpected results.

With this change, a rule can have broad regexes AND ones specific to a certain path. This is a bit of a pain point for more complex configurations, at least in my experience.

A contrived example for the sake of demonstration:

[[rules]]
id = "sumologic-access-id"
description = "Discovered a SumoLogic Access ID, potentially compromising log management services and data analytics integrity."
regex = '''(?i:(?:sumo)(?:[0-9a-z\-_\t .]{0,20})(?:[\s|']|[\s|"]){0,3})(?:=|>|:{1,3}=|\|\|:|<=|=>|:|\?=)(?:'|\"|\s|=|\x60){0,5}(su[a-zA-Z0-9]{12})(?:['|\"|\n|\r|\s|\x60|;]|$)'''
entropy = 3
keywords = [
    "sumo",
]
# Allow `sumOf` anywhere
[[rules.allowlists]]
regexTarget = "line"
regexes = [
    "sumOf",
]
# Allow `abcdefg` when found under `node_modules/@sumologic/opentelemetry-node`
[[rules.allowlists]]
condition = "AND" # default condition is "OR".
regexes = [ 
 '''abcdefg''',
]
paths = [
    "node_modules/@sumologic/opentelemetry-node/.+$",
]

This closes #1512.

Checklist:

  • Does your PR pass tests?
  • Have you written new tests for your changes?
  • Have you lint your code locally prior to submission?

@rgmz rgmz force-pushed the experiment/multiple-allowlist branch 2 times, most recently from c526694 to 364eeed Compare September 28, 2024 02:23
@rgmz rgmz force-pushed the experiment/multiple-allowlist branch 11 times, most recently from 815a074 to 111227b Compare October 12, 2024 16:04
@@ -269,11 +309,9 @@ func (c *Config) extend(extensionConfig Config) {
}
baseRule.Tags = append(baseRule.Tags, currentRule.Tags...)
baseRule.Keywords = append(baseRule.Keywords, currentRule.Keywords...)
baseRule.Allowlist.Commits = append(baseRule.Allowlist.Commits, currentRule.Allowlist.Commits...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this block getting deleted? The extend feature for a singular rule allowlist should function the same as before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this preserves the prior functionality, while allowing the new stuff to function as expected. (Correct me if I'm wrong.)

Consider the following config:

[extend]
useDefault = true

[[rules]]
id = "generic-api-key"
[rules.allowlist]
paths = ['''Some_Files\.xaml''']
stopwords = ["foo"]

With the append logic this would get translated as the following (pseudo-code):

allowlist{
  paths = ["some_files.xaml"]
  stopwords = ["foo", DefaultStopWords... ]
}

With the new logic, it would get translated as the following. Despite the difference in structure, the logic is still the same: paths == "some_files.xaml" || stopwords == "foo" || stopwords == DefaultStopWords....

allowlist{ // extended
  stopwords = [ DefaultStopWords... ]
},
allowlist{ // our config
  paths = ["some_files.xaml"]
  stopwords = ["foo" ]
}

As for the "new stuff", if either your or the extended rule's has an allowlist with AND then appending commits/paths/regexes/stopwords will break that.

It would be possible to carve out special handling for [rules.allowlist] vs. [[rules.allowlist]], or append for OR and not AND, but I figured this way is the most straightforward. I can add more tests to be certain.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new logic, it would get translated as the following. Despite the difference in structure, the logic is still the same: paths == "some_files.xaml" || stopwords == "foo" || stopwords == DefaultStopWords....

I gotcha, that makes sense. Boy this would be much easier to test if I made that config command already

@rgmz rgmz force-pushed the experiment/multiple-allowlist branch from 111227b to 9f26aa7 Compare October 13, 2024 00:51
@bplaxco
Copy link
Contributor

bplaxco commented Oct 13, 2024

At first blush it wasn't 100% clear to me that

[[rules.allowlists]]
condition = "AND" # default condition is "OR".

applied to the different checks in this allowlist and not across the different allowlists associated with the rule. I don't know that I have a great suggestion at this moment. It wasn't hard to figure out after looking closer, but just trying to think of something that someone could catch at a glance if they weren't familiar with the behavior.

Maybe: allowRequires = "ALL/ANY" or something that highlights the checks are what the condition applies to.

Thoughts?

@rgmz rgmz force-pushed the experiment/multiple-allowlist branch from 9f26aa7 to fd084de Compare October 13, 2024 02:34
@rgmz rgmz changed the title Experiment: define multiple allowlists per rule Define multiple allowlists per rule Oct 13, 2024
@rgmz rgmz force-pushed the experiment/multiple-allowlist branch from fd084de to b9f7fda Compare October 13, 2024 17:39
@zricethezav
Copy link
Collaborator

At first blush it wasn't 100% clear to me that
[[rules.allowlists]]
condition = "AND" # default condition is "OR".
applied to the different checks in this allowlist and not across the different allowlists associated with the rule. I don't know that I have a great suggestion at this moment. It wasn't hard to figure out after looking closer, but just trying to think of something that someone could catch at a glance if they weren't familiar with the behavior.

We can make it extra clear in the readme that the condition field is for checks in the singular allowlist.

@rgmz rgmz force-pushed the experiment/multiple-allowlist branch from b9f7fda to 0862fcd Compare October 14, 2024 14:56
@rgmz
Copy link
Contributor Author

rgmz commented Oct 14, 2024

We can make it extra clear in the readme that the condition field is for checks in the singular allowlist.

Do you have anything in mind for wording?

@rgmz rgmz force-pushed the experiment/multiple-allowlist branch 3 times, most recently from a461259 to 68d8d5a Compare October 14, 2024 15:26
@rgmz
Copy link
Contributor Author

rgmz commented Oct 14, 2024

I discovered an 'interesting' interaction while testing. While it's outside the scope of this PR, I figured it was worth mentioning.

Generic matches are ignored by the filter method if they were found by a specific rule.:

11:33AM TRC skipping generic-api-key finding (token="REDACTED"), github-pat rule takes precedence (REDACTED)
11:33AM TRC skipping generic-api-key finding (token = 'REDACTED'), github-pat rule takes precedence (REDACTED)
...
11:33AM INF scan completed in 3.46ms
11:33AM WRN leaks found: 2

However, if you create an allowlist entry for the rule, the secret will now be reported as a generic-api-key match. It feels suboptimal to potentially have to create two allowlist entries for rule-specific finding you want to ignore.

11:10AM TRC Skipping finding due to rule allowlist condition=OR contains-stopword=true regex-allowed=false rule-id=github-pat
11:10AM TRC skipping generic-api-key finding (token = 'REDACTED'), github-pat rule takes precedence (REDACTED)
Finding:     gh_token = 'ghp_...
Secret:      ghp_...
RuleID:      github-pat
Entropy:     4.921928
File:        ../test-secrets/test/github_ignore.txt
Line:        2
Fingerprint: ../test-secrets/test/github_ignore.txt:github-pat:2

Finding:     gh_token="ghp_..."
Secret:      ghp_...
RuleID:      generic-api-key
Entropy:     4.921928
File:        ../test-secrets/test/github_ignore.txt
Line:        1
Fingerprint: ../test-secrets/test/github_ignore.txt:generic-api-key:1

11:10AM INF scan completed in 4.56ms
11:10AM WRN leaks found: 2

@zricethezav
Copy link
Collaborator

However, if you create an allowlist entry for the rule, the secret will now be reported as a generic-api-key match. It feels suboptimal to potentially have to create two allowlist entries for rule-specific finding you want to ignore.

I agree with that. Let's make a follow up issue to track this

@rgmz rgmz force-pushed the experiment/multiple-allowlist branch from 68d8d5a to 48ba63f Compare October 14, 2024 18:07
@rgmz rgmz mentioned this pull request Oct 14, 2024
3 tasks
@zricethezav
Copy link
Collaborator

🥳 nice

@zricethezav zricethezav merged commit aabe381 into gitleaks:master Oct 15, 2024
1 check passed
@rgmz rgmz deleted the experiment/multiple-allowlist branch October 15, 2024 00:18
@rgmz rgmz mentioned this pull request Oct 15, 2024
3 tasks
@rgmz rgmz mentioned this pull request Mar 1, 2025
3 tasks
alayne222 pushed a commit to alayne222/gitleaks that referenced this pull request May 28, 2025
* feat: multiple rule allowlists

* feat(config): add allowlist 'condition'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Log allowlist exclusions at trace level
3 participants