This repository was archived by the owner on Apr 24, 2025. It is now read-only.

Make character classes atomic with flag i #350

@slevithan


As discussed in #290, character classes with flag i can unpredictably (from the user's perspective) cause performance problems, triggered by seemingly unrelated things (again, from the user's perspective) such as whether the character class is repeated with a quantifier. This happens because Unicode case-folds some characters to multiple characters, and as a result, choices for elements other than the character class are generated after the character class.
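To make the multi-character case folding concrete, here is a small Python sketch. It uses Python's built-in `str.casefold()`, which applies Unicode full case folding; it only illustrates the Unicode behavior and says nothing about how this library implements flag i. The helper name `multi_char_folds` is made up for this example.

```python
def multi_char_folds(chars: str) -> dict[str, str]:
    """Return the characters whose full case fold is longer than one character."""
    return {ch: ch.casefold() for ch in chars if len(ch.casefold()) > 1}


# U+00DF (sharp s) folds to "ss"; U+FB01 (fi ligature) folds to "fi".
# Both are non-whitespace, so a set like \S contains them.
sample = "a\u00DF\uFB01Z"
folds = multi_char_folds(sample)
print(folds)
```

With case-insensitive matching that honors these folds, a single class element can stand for a two-character sequence, which is where the extra match choices come from.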

All of the things that come together to trigger the resulting performance problem (flag i, character classes, sets like \S that contain affected characters, and quantifiers) are extremely commonly used, so this seems like it could be a relatively common issue.

But it also seems like there is a simple solution: Make character classes atomic. This is already done for \R and \X, which are very similar in concept.

This would of course be a breaking change in some edge cases. But since it would avoid a potentially serious issue, maybe that's worth it?
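A toy model of the kind of edge case that would change, written in Python. This is not the library's matcher: the function `match_here`, the representation of a class as a list of alternative strings, and the order of those alternatives are all assumptions made for illustration. It models `[sß]s` under flag i, where the class is assumed to be able to match "ss", "s", or "ß".

```python
from typing import Sequence


def match_here(elements: Sequence[Sequence[str]], text: str, pos: int, atomic: bool) -> bool:
    """Match every element in order from pos and require that all of text is consumed."""
    if not elements:
        return pos == len(text)
    first, rest = elements[0], elements[1:]
    for alt in first:
        if text.startswith(alt, pos):
            if match_here(rest, text, pos + len(alt), atomic):
                return True
            if atomic:
                # Atomic: commit to the first alternative that matched here
                # and never retry a different one.
                return False
    return False


# Hypothetical expansion of [sß] with flag i: the multi-character fold comes first.
CLASS_ALTS = ["ss", "s", "\u00DF"]
PATTERN = [CLASS_ALTS, ["s"]]  # models the pattern [sß]s

backtracking_result = match_here(PATTERN, "ss", 0, atomic=False)
atomic_result = match_here(PATTERN, "ss", 0, atomic=True)
print(backtracking_result, atomic_result)
```

In this model the backtracking version matches "ss" by falling back to the single-character alternative, while the atomic version commits to the two-character alternative and fails. Atomicity removes the retry, which is what limits the work done per class but also what changes results in cases like this one.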

Note: My preferred solution would not be to make character classes atomic, but rather to remove the current case expansion to multiple characters, as discussed in #351. If that behavior were moved behind its own flag, I'm not sure whether it would still make sense to make character classes atomic, since, despite the hazards, backtracking for character classes does produce more intuitive match results when the class contains multi-character elements.
