This repository was archived by the owner on Apr 24, 2025. It is now read-only.

Pattern-terminating \x and \u are handled as identity escapes #343

@slevithan

  • If used at the very end of a pattern, \x matches a literal x rather than a ␀ character like it does at other positions.
  • If used at the very end of a pattern, \u matches a literal u rather than throwing an error like it does at other positions.

These look like bugs: I can't see a reason for either escape to behave differently only at the very end of a pattern.

Disclosure: I'm testing using Oniguruma 6.9.8 via vscode-oniguruma. However, the release notes for subsequent versions don't mention any related changes.


More details:

Current behavior for \x: It's an escape for the ␀ character (equivalent to \0, \x00, etc.) if it's not followed by { or a hexadecimal digit.

  • \x is an error if followed by a { that's followed by a hexadecimal digit but doesn't form a valid \x{…} code point escape. Ex: \x{F and \x{0,2} are errors.
  • \x is an identity escape (matching a literal x) if followed by a { that isn't followed by a hexadecimal digit. Ex: \x{ matches x{, \x{G matches x{G, \x{} matches x{}, and \x{,2} matches 0–2 x characters, since {,2} is a quantifier with an implicit 0 min.
  • \x is treated as an identity escape (matching a literal x) if it appears at the very end of a pattern.
    • This feels like a bug.
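
The \x rules listed above can be summarized as a small decision procedure. The following is only a minimal Python sketch of the behavior as described in this issue, not Oniguruma's actual parser: the function name `classify_x_escape` and the result labels are hypothetical, and a "valid \x{…} code point escape" is simplified here to one or more hexadecimal digits followed by `}`.

```python
import re

HEX_DIGITS = "0123456789abcdefABCDEF"


def classify_x_escape(pattern: str, i: int = 0) -> str:
    """Classify the \\x that starts at index i of pattern.

    Hypothetical model of the behavior described in this issue,
    not a port of Oniguruma's parser.
    """
    if pattern[i:i + 2] != "\\x":
        raise ValueError("expected \\x at index i")
    rest = pattern[i + 2:]
    if rest == "":
        # Pattern-terminating \x: literal "x" (the suspected bug).
        return "identity"
    if rest[0] in HEX_DIGITS:
        # \xH or \xHH hex escape.
        return "hex"
    if rest[0] == "{":
        if len(rest) > 1 and rest[1] in HEX_DIGITS:
            # Assumption: valid means hex digits followed by "}".
            if re.match(r"\{[0-9A-Fa-f]+\}", rest):
                return "codepoint"
            return "error"
        # "{" not followed by a hex digit: literal "x", then "{" is
        # parsed on its own (as a literal or a quantifier).
        return "identity"
    # Anything else: \x is an escape for the NUL character.
    return "nul"


if __name__ == "__main__":
    for p in [r"\x", r"\xz", r"\x{F", r"\x{0,2}", r"\x{", r"\x{,2}"]:
        print(p, "->", classify_x_escape(p))
```

In this model, the pattern-terminating case is the only one where a \x that isn't followed by `{` or a hexadecimal digit is not treated as a NUL escape.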

Current behavior for \u:

  • Normally, any incomplete \uHHHH (including bare \u) throws an error.
  • \u is treated as an identity escape (matching a literal u) if it appears at the very end of a pattern.
    • This feels like a bug.
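
The \u rules are simpler and can be sketched the same way. Again, this is only a hypothetical Python model of the behavior described above (the name `classify_u_escape` and the result labels are made up for illustration), not code taken from Oniguruma.

```python
import re


def classify_u_escape(pattern: str, i: int = 0) -> str:
    """Classify the \\u that starts at index i of pattern.

    Hypothetical model of the behavior described in this issue,
    not a port of Oniguruma's parser.
    """
    if pattern[i:i + 2] != "\\u":
        raise ValueError("expected \\u at index i")
    rest = pattern[i + 2:]
    if rest == "":
        # Pattern-terminating \u: literal "u" (the suspected bug).
        return "identity"
    if re.match(r"[0-9A-Fa-f]{4}", rest):
        # Complete \uHHHH escape.
        return "codepoint"
    # Any other incomplete \uHHHH is an error.
    return "error"


if __name__ == "__main__":
    for p in [r"\u", r"\u0041", r"\u00", r"\uz"]:
        print(p, "->", classify_u_escape(p))
```

Here the pattern-terminating case is the only input for which an incomplete \u does not produce an error.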

Note: PCRE2 v10.45 recently removed the ability for \x to match a ␀ character, and recommends using \x00 instead.

* (Minor pattern syntax change) Parsing of the \x escape is stricter, and is
no longer parsed as an escape for the NUL character if not followed by '{' or
a hexadecimal digit. Use \x00 instead.
