-
Notifications
You must be signed in to change notification settings - Fork 18
Description
I just applied a local build of my spotless antlr4 formatter on antlr/grammars-v4 several times in a row and everytime the sources changed. To me It looks like the formatter is removing stuff. So I did the same using the standalone JAR to check if it is a bug in my spotless integration, but the result was the same.
Here's what I did:
- checkout https://github.com/antlr/grammars-v4
- downloaded the released standalone formatter
- move the jar into checkout directory
- apply formatter by calling
java -jar antlr4-formatter-standalone-1.1.0.jar -dir .
- commit the changes
- apply formatter again
java -jar antlr4-formatter-standalone-1.1.0.jar -dir .
- expect to have no modified files, as the sources are already formatted
But afterwards the following files where modified:
- antlr4/ANTLRv4LexerPythonTarget.g4
- antlr4/examples/twocomments.g4
- apex/apex.g4
- c/C.g4
- csharp/CSharpPreprocessorParser.g4
- csharp/CSharpSharwell/CSharpPreprocessorParser.g4
- java8-js/Java8.JavaScriptTarget.g4
- java8-ts/Java8.TypeScriptTarget.g4
- java8/Java8.g4
- java9/Java9.g4
- javascript/JavaScriptParser.g4
- php/Python/PhpLexer.g4
- plsql/PlSqlParser.g4
- python2-js/Python2.g4
- python2/Python2.g4
- python3-py/Python3.g4
- rexx/RexxLexer.g4
- swift3/Swift3.g4
- system_verilog/SysVerilogHDL.g4
- tsql/TSqlParser.g4
- ucb-logo/UCBLogo.g4
- z/ZLexer.g4
Taking a look at java8/Java8.g4
the first formatting removes some comments (which I think it shouldn't)
Shortened Diff:
// §3.10.6 Escape Sequences for Character and String Literals
-fragment
-EscapeSequence
- : '\\' [btnfr"'\\]
- | OctalEscape
- | UnicodeEscape // This is not in the spec but prevents having to preprocess the input
- ;
-
-fragment
-OctalEscape
- : '\\' OctalDigit
- | '\\' OctalDigit OctalDigit
- | '\\' ZeroToThree OctalDigit OctalDigit
- ;
-
-fragment
-ZeroToThree
- : [0-3]
- ;
+fragment EscapeSequence
+ : '\\' [btnfr"'\\] | OctalEscape | UnicodeEscape
+ ;
+
+
+fragment OctalEscape
+ : '\\' OctalDigit | '\\' OctalDigit OctalDigit | '\\' ZeroToThree OctalDigit OctalDigit
+ ;
+
+
+fragment ZeroToThree
+ : [0-3]
+ ;
but the second formatting removes parts of the grammar:
Complete Diff:
diff --git a/java8/Java8.g4 b/java8/Java8.g4
index 0596456..e16ba95 100644
--- a/java8/Java8.g4
+++ b/java8/Java8.g4
@@ -2047,20 +2047,13 @@ fragment JavaLetter
// covers all characters above 0x7F which are not a surrogate ~ [\u0000-\u007F\uD800-\uDBFF]{Character.isJavaIdentifierStart(_input.LA(-1))}?|
// covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF[\uD800-\uDBFF][\uDC00-\uDFFF]{Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?;
-
fragment JavaLetterOrDigit
: [a-zA-Z0-9$_] |
// covers all characters above 0x7F which are not a surrogate ~ [\u0000-\u007F\uD800-\uDBFF]{Character.isJavaIdentifierPart(_input.LA(-1))}?|
// covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF[\uD800-\uDBFF][\uDC00-\uDFFF]{Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?;
-
//
// Additional symbols not defined in the lexical specification
- //
-
- AT
- : '@'
- ;
-
+ // AT
ELLIPSIS
: '...'
I think we should extract the non working grammar parts and write unit tests for each. This way we are testing Antlr4ParseTreeListenerImpl.java
which is anyway very complex and could use some refactoring afterwards.