Newline characters in grammars can produce illegal escape codes when targeting Java #2281

@timmc

A grammar containing Unicode code point references for newlines (carriage return and line feed) may produce invalid Java code.

Sample grammar:

grammar Demo;
linebreak: LF
         | CR
         ;
CR : '\u000D';
LF : '\u000A';

Snippet from resulting DemoLexer.java:

	private static final String[] _LITERAL_NAMES = {
		null, "'\u000D'", "'\u000A'"
	};
	private static final String[] _SYMBOLIC_NAMES = {
		null, "CR", "LF"
	};

This fails to compile. The Java compiler translates Unicode escapes before it tokenizes the source, so "\u000D" inside a string literal is equivalent to a raw carriage return in the middle of the string, which is not allowed. Instead of \u000D and \u000A, ANTLR should emit \r and \n.
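For illustration, here is a minimal sketch of the kind of escaping suggested above. The class `LiteralEscapeSketch` and the helper `escapeForJavaString` are hypothetical names made up for this example; this is not ANTLR's actual code generator, just one way the output could be made safe. The comments write code points as U+XXXX on purpose, since the compiler would translate a raw Unicode escape even inside a comment.

```java
public class LiteralEscapeSketch {

    // Hypothetical helper (not part of ANTLR): returns text that is safe to
    // place between double quotes in generated Java source.
    static String escapeForJavaString(int codePoint) {
        switch (codePoint) {
            // Line terminators must use the backslash-letter escapes; a Unicode
            // escape for U+000A or U+000D would become a raw line break.
            case '\n': return "\\n";
            case '\r': return "\\r";
            case '\t': return "\\t";
            // A Unicode escape for U+0022 or U+005C would likewise turn into a
            // raw quote or backslash, so these also need the short forms.
            case '"':  return "\\\"";
            case '\\': return "\\\\";
            default:
                break;
        }
        // Printable ASCII can be emitted unchanged.
        if (codePoint >= 0x20 && codePoint < 0x7F) {
            return String.valueOf((char) codePoint);
        }
        // Everything else: one Unicode escape per UTF-16 code unit, so
        // supplementary code points come out as a surrogate pair.
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Entries as they might appear in _LITERAL_NAMES for the Demo grammar.
        System.out.println("\"'" + escapeForJavaString(0x0D) + "'\""); // prints "'\r'"
        System.out.println("\"'" + escapeForJavaString(0x0A) + "'\""); // prints "'\n'"
    }
}
```

With escaping along these lines, the generated array would read `null, "'\r'", "'\n'"`, which the compiler accepts.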
