lint: add WARN for regex features that contain unescaped dot #2635

Merged
merged 6 commits into from
Mar 18, 2025

Conversation

mike-hunhoff
Collaborator

See mandiant/capa-rules#1026 for an example of how an unescaped . can result in false positives (FPs). This PR adds a WARN for regular expressions that contain an unescaped . character.
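
To illustrate the kind of false positive described above, here is a minimal sketch using Python's standard `re` module. The patterns and test strings are hypothetical and are not taken from mandiant/capa-rules#1026; the helper name `matches` is also an assumption made for this example.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the regex matches anywhere in the text."""
    return re.search(pattern, text) is not None


# Hypothetical patterns, chosen only for illustration.
UNESCAPED = r"evil.example.com"    # each "." matches any character
ESCAPED = r"evil\.example\.com"    # each "\." matches a literal period only

for text in ("evil.example.com", "evilXexample-com"):
    print(text, matches(UNESCAPED, text), matches(ESCAPED, text))
```

Both patterns match the intended string, but only the unescaped one also matches `evilXexample-com`, which is the unintended hit the new WARN is meant to flag.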

@mike-hunhoff mike-hunhoff requested a review from a team March 18, 2025 20:06
@gemini-code-assist gemini-code-assist bot left a comment

Hello @mike-hunhoff, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a new linting rule to detect regular expressions that contain unescaped periods. The goal is to help prevent accidental misinterpretations of regular expressions where a literal period was intended but could be interpreted as a wildcard. The linting rule warns developers about such cases, suggesting they escape the period unless the wildcard behavior is intended.

Highlights

  • New Linting Rule: A new linting rule, FeatureRegexContainsUnescapedPeriod, has been added to scripts/lint.py to check for regular expressions with unescaped periods.
  • Warning for Unescaped Periods: The new linting rule generates a warning when it finds an unescaped period in a regular expression, recommending that the period be escaped unless it's meant to be a regex dot operator.
  • Integration with Existing Linting Framework: The new linting rule is integrated into the existing linting framework by adding it to the FEATURE_LINTS tuple.

Changelog

  • scripts/lint.py
    • Imported the Regex feature from capa.features.common.
    • Added a new linting class FeatureRegexContainsUnescapedPeriod to detect unescaped periods in regular expressions.
    • The FeatureRegexContainsUnescapedPeriod class checks for unescaped periods in regex features and provides a warning with a recommendation to escape the period.
    • Added the new linting rule FeatureRegexContainsUnescapedPeriod to the FEATURE_LINTS tuple to include it in the linting process.
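
The changes summarized above could look roughly like the following sketch. It is self-contained and does not import capa: `RegexFeature`, `UnescapedPeriodLint`, `run_lints`, the `name` string, and the recommendation text are all hypothetical stand-ins, and the real `FeatureRegexContainsUnescapedPeriod` class in scripts/lint.py may differ in every detail. The escaping test here is deliberately naive (a period not directly preceded by a backslash).

```python
import re
from dataclasses import dataclass

# Naive check: a "." that is not immediately preceded by a backslash.
UNESCAPED_PERIOD = re.compile(r"(?<!\\)\.")


@dataclass
class RegexFeature:
    """Hypothetical stand-in for capa's Regex feature."""

    value: str


class UnescapedPeriodLint:
    """Hypothetical lint; mirrors the summary above, not the merged code."""

    name = "feature regex contains unescaped period"
    level = "WARN"

    def __init__(self) -> None:
        self.recommendation = ""

    def check_features(self, features) -> bool:
        """Return True (lint violated) if any regex feature has an unescaped period."""
        self.recommendation = ""
        for feature in features:
            if not isinstance(feature, RegexFeature):
                continue
            if UNESCAPED_PERIOD.search(feature.value):
                self.recommendation = (
                    f'escape the period in "{feature.value}" unless it is '
                    "meant to be a regex dot operator"
                )
                return True
        return False


# Registering the lint in a tuple, analogous to FEATURE_LINTS in the summary.
FEATURE_LINTS = (UnescapedPeriodLint(),)


def run_lints(features):
    """Collect (level, name, recommendation) for every violated lint."""
    return [
        (lint.level, lint.name, lint.recommendation)
        for lint in FEATURE_LINTS
        if lint.check_features(features)
    ]


print(run_lints([RegexFeature("foo.bar")]))
print(run_lints([RegexFeature(r"foo\.bar")]))
```

Emitting a warning rather than a failure fits the summary's point that an unescaped period is sometimes intentional, so the rule author makes the final call.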
Trivia time!

What is the origin of the term 'regex'?

The term 'regex' is short for 'regular expression', which comes from the mathematical theory called 'regular language'. This theory was developed in the 1950s by Stephen Cole Kleene, who is considered one of the founders of theoretical computer science.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new lint check to warn about regular expressions containing unescaped periods. This is a good addition to help prevent false positives in capa rules. The implementation seems reasonable, but there are a few areas that could be improved for clarity and correctness.

Summary of Findings

  • Regex pattern extraction: The logic for extracting the regex pattern from the feature value could be simplified and made more robust.
  • Unnecessary return: The check_features method returns False early when no unescaped period is found, but this is unnecessary and can be removed.
  • Inconsistent escaping check: The logic for checking escaped periods is not consistent and may lead to missed cases or false positives.
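
The review does not spell out which cases it has in mind, so the following is only a sketch of what a more careful pattern extraction and escaping check might involve. It assumes regex feature values are written as `/pattern/` or `/pattern/i`; the function names `strip_delimiters` and `unescaped_dot_offsets` are hypothetical and do not come from scripts/lint.py. The scanner skips escaped characters in pairs, so `\\.` (an escaped backslash followed by a wildcard dot) is still reported, and it ignores periods inside a character class such as `[.]`, where a period is already literal.

```python
def strip_delimiters(value: str) -> str:
    """Return the pattern body of a /pattern/ or /pattern/i style string."""
    if value.startswith("/"):
        if value.endswith("/i") and len(value) >= 3:
            return value[1:-2]
        if value.endswith("/") and len(value) >= 2:
            return value[1:-1]
    return value


def unescaped_dot_offsets(pattern: str) -> list[int]:
    """Return offsets of periods that act as the regex dot operator."""
    offsets = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes.
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        elif ch == "." and not in_class:
            offsets.append(i)
        i += 1
    return offsets


for value in ("/foo.bar/i", r"/foo\.bar/", r"/foo\\.bar/", "/foo[.]bar/"):
    print(value, unescaped_dot_offsets(strip_delimiters(value)))
```

This sketch does not handle every corner of regex syntax (for example, a `]` placed first inside a character class), so it should be read as an illustration of the concern rather than a complete parser.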

Merge Readiness

The pull request introduces a valuable lint check. However, there are some issues that need to be addressed before merging. Specifically, the regex pattern extraction and the escaping check logic should be improved. I am unable to approve this pull request, and recommend that it not be merged until the high severity issues are addressed. Users should have others review and approve this code before merging.

@github-actions github-actions bot dismissed their stale review March 18, 2025 20:08

CHANGELOG updated or no update needed, thanks! 😄

@mike-hunhoff
Collaborator Author

see mandiant/capa-rules#1028 for results

@williballenthin
Collaborator

sidebar: how was the AI review? helpful or spam?

Collaborator

@williballenthin williballenthin left a comment

nice work

mike-hunhoff and others added 2 commits March 18, 2025 14:50
Co-authored-by: Willi Ballenthin <wballenthin@google.com>
Co-authored-by: Willi Ballenthin <wballenthin@google.com>
@mike-hunhoff
Collaborator Author

> sidebar: how was the AI review? helpful or spam?

I found the code review portion to be useful and I vote we keep it around for now. The initial feedback gets you thinking and it was 50/50 on suggestions (1 was helpful, 1 was not).

@mike-hunhoff mike-hunhoff merged commit 1572dd8 into master Mar 18, 2025
23 checks passed
@mike-hunhoff mike-hunhoff deleted the lint/check/unescaped/dot branch March 18, 2025 21:05