Formatter removes parts of the grammar and is not idempotent

I just applied a local build of [my spotless antlr4 formatter ](https://github.com/matthiasbalke/spotless/tree/issue-326-antlr4) on [antlr/grammars-v4](https://github.com/antlr/grammars-v4) several times in a row and everytime the sources changed. To me It looks like the formatter is removing stuff. So I did the same using the standalone JAR to check if it is a bug in my spotless integration, but the result was the same.

Here's what I did:
1) checkout https://github.com/antlr/grammars-v4
1) downloaded the [released standalone formatter](https://search.maven.org/artifact/com.khubla.antlr4formatter/antlr4-formatter-standalone/1.1.0/jar)
1) move the jar  into checkout directory
1) apply formatter by calling  `java -jar antlr4-formatter-standalone-1.1.0.jar -dir .`
1) commit the changes
1) apply formatter again `java -jar antlr4-formatter-standalone-1.1.0.jar -dir .`
1) expect to have no modified files, as the sources are already formatted

But afterwards the following files where modified:
* antlr4/ANTLRv4LexerPythonTarget.g4
* antlr4/examples/twocomments.g4
* apex/apex.g4
* c/C.g4
* csharp/CSharpPreprocessorParser.g4
* csharp/CSharpSharwell/CSharpPreprocessorParser.g4
* java8-js/Java8.JavaScriptTarget.g4
* java8-ts/Java8.TypeScriptTarget.g4
* java8/Java8.g4
* java9/Java9.g4
* javascript/JavaScriptParser.g4
* php/Python/PhpLexer.g4
* plsql/PlSqlParser.g4
* python2-js/Python2.g4
* python2/Python2.g4
* python3-py/Python3.g4
* rexx/RexxLexer.g4
* swift3/Swift3.g4
* system_verilog/SysVerilogHDL.g4
* tsql/TSqlParser.g4
* ucb-logo/UCBLogo.g4
* z/ZLexer.g4

Taking a look at `java8/Java8.g4` the first formatting removes some comments (which I think it shouldn't)

**Shortened Diff:**
```patch
  // §3.10.6 Escape Sequences for Character and String Literals
 
-fragment
-EscapeSequence
-	:	'\\' [btnfr"'\\]
-	|	OctalEscape
-    |   UnicodeEscape // This is not in the spec but prevents having to preprocess the input
-	;
-
-fragment
-OctalEscape
-	:	'\\' OctalDigit
-	|	'\\' OctalDigit OctalDigit
-	|	'\\' ZeroToThree OctalDigit OctalDigit
-	;
-
-fragment
-ZeroToThree
-	:	[0-3]
-	;
+fragment EscapeSequence
+   : '\\' [btnfr"'\\] | OctalEscape | UnicodeEscape
+   ;
+
+
+fragment OctalEscape
+   : '\\' OctalDigit | '\\' OctalDigit OctalDigit | '\\' ZeroToThree OctalDigit OctalDigit
+   ;
+
+
+fragment ZeroToThree
+   : [0-3]
+   ;
```
but the second formatting removes parts of the grammar:
**Complete Diff:**
```patch
diff --git a/java8/Java8.g4 b/java8/Java8.g4
index 0596456..e16ba95 100644
--- a/java8/Java8.g4
+++ b/java8/Java8.g4
@@ -2047,20 +2047,13 @@ fragment JavaLetter
    // covers all characters above 0x7F which are not a surrogate ~ [\u0000-\u007F\uD800-\uDBFF]{Character.isJavaIdentifierStart(_input.LA(-1))}?|
    // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF[\uD800-\uDBFF][\uDC00-\uDFFF]{Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?;

-
    fragment JavaLetterOrDigit
       : [a-zA-Z0-9$_] |
       // covers all characters above 0x7F which are not a surrogate ~ [\u0000-\u007F\uD800-\uDBFF]{Character.isJavaIdentifierPart(_input.LA(-1))}?|
       // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF[\uD800-\uDBFF][\uDC00-\uDFFF]{Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?;
-
       //
       // Additional symbols not defined in the lexical specification
-      //
-
-      AT
-         : '@'
-         ;
-
+      // AT

       ELLIPSIS
          : '...'
```

I think we should extract the non working grammar parts and write unit tests for each. This way we are testing `Antlr4ParseTreeListenerImpl.java` which is anyway very complex and could use some refactoring afterwards.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Formatter removes parts of the grammar and is not idempotent #25

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Formatter removes parts of the grammar and is not idempotent #25

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions