Tests for regular expression modifiers #3756

@cjtenny

Explainer: https://github.com/tc39/proposal-regexp-modifiers
Spec text: https://tc39.es/proposal-regexp-modifiers/

regexp-modifiers testing plan

  • Syntax errors: thrown both when parsing a literal and when constructing with new RegExp("...")
    • Basic regular expression flags (n.b. "source text" here means the text matched by the "regular expression flags" production of the grammar)
      • Source text contains code points other than i, m, s
      • Source text contains combining code points alongside i, m, s
      • Source text contains other non-display code points alongside i, m, s
      • Source text contains ZWNJ, ZWJ, ZWNBSP alongside i, m, s (I think this is right? https://tc39.es/ecma262/#sec-unicode-format-control-characters)
      • Source text contains i, m, and/or s more than once
      • Source text in a case-ignoring context contains code points that case fold to i, m, s e.g. I, M, S
      • Source text contains code points outside the Basic Latin range that a Unicode case-folding regex would canonicalize to i, m, or s (e.g. ſ (U+017F) would map to s, U+0130 to i) (ref. https://www.unicode.org/Public/12.1.0/ucd/CaseFolding.txt)
        • (e.g. /foo(?\u{017F}:bar)/u is a syntax error, /foo(?s:bar)/u is not)
    • Arithmetic regular expression flags
      • First or second source text exhibits any of the 'basic regular expression flags' errors
      • Both source texts are empty
      • Code point matched by first flags is also contained in source text matched by second flags
      • Various forms of (?ims-ims) - no colon - is a syntax error
      • Source text cannot use unicode escape sequences to express code points i, m, s
  • Valid syntax
    • Basic regular expression flags parse correctly
    • Source text with any valid combination of flags or arithmetic flags - reasonable to enumerate
  • Behavior
    • Disabling flag in subexpression behaves correctly when corresponding top-level flag is and isn't already set
    • Enabling flag in subexpression behaves correctly when corresponding top-level flag is and isn't already set
    • Constructing a RegExp from a literal while overriding its flags via the RegExp constructor's flags argument changes (or leaves unchanged, as appropriate) the behavior of a subexpression that enables or removes flags.
    • i
      • Ignore case applies appropriately inside subexpression, but not outside; when turned on, off, and when nested inside a subexpression that has previously modified behavior
      • Behavior as normal when other flags modified but i flag not modified
      • Callers of Canonicalize:
        • Backreferences ignore case in captures
        • Individual characters ignore case
        • Character sets ignore case
        • Character escapes ignore case
        • Character class escapes ignore case
        • \w class, \b, \B all ignore case
    • m
      • ^ and $ apply appropriately inside subexpression, but not outside; when turned on, off, and when nested inside a subexpression that has previously modified behavior
    • s
      • . applies appropriately inside subexpression, but not outside; when turned on, off, and when nested inside a subexpression that has previously modified behavior
    • Subexpressions with flags set do not cause RegExp()....flags or /.../.flags to have the flags set, e.g. (new RegExp("(?i:a)")).flags does not include i.
    • Same as the previous item for RegExp.prototype.dotAll, .multiline, and .ignoreCase
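
A rough sketch of what a few of the checks in the plan above could look like, written as plain JavaScript rather than against the test262 harness. The helpers `supportsModifiers`, `throwsSyntaxError`, and `check` are hypothetical names made up for this illustration, and the patterns are only a small sample of each category, not the full enumeration. Everything goes through `new RegExp(...)` so the file still parses on engines without modifier support; in that case no checks are recorded.

```javascript
// Hypothetical helpers for illustration; not part of the test262 harness.

// True when the engine accepts the (?flags:...) modifier syntax at all.
function supportsModifiers() {
  try {
    new RegExp("(?i:a)");
    return true;
  } catch (e) {
    return false;
  }
}

// True when constructing the pattern throws a SyntaxError.
function throwsSyntaxError(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

const results = [];
function check(name, ok) {
  results.push({ name, ok });
}

if (supportsModifiers()) {
  // Syntax errors: basic flags
  check("code point other than i, m, s", throwsSyntaxError("(?g:a)"));
  check("flag repeated", throwsSyntaxError("(?ii:a)"));
  check("uppercase I under the i flag", throwsSyntaxError("(?I:a)", "i"));
  // The string below contains the literal character U+017F, not an escape.
  check("U+017F is not s", throwsSyntaxError("foo(?\u017F:bar)", "u"));

  // Syntax errors: arithmetic flags
  check("both flag sets empty", throwsSyntaxError("(?-:a)"));
  check("same flag added and removed", throwsSyntaxError("(?i-i:a)"));
  check("no colon", throwsSyntaxError("(?i-m)"));
  // The pattern text here is (?\u0069:a), i.e. an escape sequence for i.
  check("escaped flag", throwsSyntaxError("(?\\u0069:a)"));

  // Behavior: enabling i when the top-level flag is not set
  const add = new RegExp("a(?i:b)c");
  check("i applies inside the group", add.test("aBc"));
  check("i does not apply outside the group", !add.test("ABC"));

  // Behavior: disabling i when the top-level flag is set
  const remove = new RegExp("a(?-i:b)c", "i");
  check("-i applies inside the group", remove.test("AbC") && !remove.test("ABC"));

  // Behavior: flags changed through the RegExp constructor argument
  const rebuilt = new RegExp(new RegExp("a(?-i:b)c"), "i");
  check("constructor flags respect -i", rebuilt.test("AbC") && !rebuilt.test("ABC"));

  // Behavior: m and s
  check("m affects ^ inside the group",
    new RegExp("(?m:^b)").test("a\nb") && !new RegExp("^b").test("a\nb"));
  check("s affects . inside the group",
    new RegExp("(?s:a.b)").test("a\nb") && !new RegExp("a.b").test("a\nb"));

  // Modifiers do not show up in flags or the flag accessors
  check("flags stays empty", add.flags === "");
  check("accessors stay false",
    !add.ignoreCase && !add.multiline && !add.dotAll);
}
```

In the actual test262 tests these would presumably be split into one file per case and use the harness assertions instead of a results array; the sketch only shows the shape of each check.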
