Description
This concerns lexer rules that mix actions and semantic predicates within one rule. It is loosely related to #3606: I found this problem while debugging that issue.
Suppose we have the following grammars:
lexer:
lexer grammar TestLexer;

@lexer::members {
    void initA()
    {
        int i = 1;
        while (true)
        {
            var text = this.InputStream.LA(-i);
            i++;
            if (text == 97) count++;
            else break;
        }
    }

    int count = 0;
}

Stuff : ( 'a'+ { initA(); } | 'b' ) 'c' 'd' { count == 3 }? ;
parser:
parser grammar TestParser;
options { tokenVocab=TestLexer; }
start : Stuff+ EOF ;
input:
aaacd
The expectation here is that the lexer counts the 'a's in the input and accepts the token only when there are exactly three. Yes, it is contrived, but it exposes an evaluation-order assumption that I was not aware of, even after using Antlr for many years.
Unfortunately, the parse fails. The lexer's ExecATN() evaluates the semantic predicate before the action runs. This happens because semantic predicates are evaluated "on the fly", while actions are queued up and executed at the end of the function. I don't understand why actions are queued up at all, especially since people often describe semantic predicates as a special kind of action.
This is not "referentially transparent": the actions are not interleaved with the semantic predicates, even though the action appears before the semantic predicate on the rule's RHS. Most users would expect actions and semantic predicates to be evaluated in the order in which they occur on the RHS of the rule.
In this example, the rule never matches, and it is impossible to parse anything.
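To make the ordering concrete, here is a small standalone Java model of the Stuff rule. It is only a sketch of the behaviour described above, not ANTLR runtime code: the class name DeferredActionModel, the match method, and the deferActions flag are all made up for illustration. With deferActions set to true, the action is queued and the predicate is checked immediately, which is the order described in this issue. With it set to false, the action runs where it appears in the rule.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of: Stuff : ( 'a'+ {initA();} | 'b' ) 'c' 'd' { count == 3 }? ;
// Hypothetical names throughout; this does not use or reproduce the ANTLR runtime.
final class DeferredActionModel {
    private int count = 0;

    // Mirrors initA() from the grammar: counts the consecutive 'a's that end
    // just before position `end`.
    private void initA(String input, int end) {
        for (int i = end - 1; i >= 0 && input.charAt(i) == 'a'; i--) {
            count++;
        }
    }

    // Returns true if the whole rule, including the predicate, accepts a
    // prefix of `input`.
    static boolean match(String input, boolean deferActions) {
        DeferredActionModel m = new DeferredActionModel();
        List<Runnable> queued = new ArrayList<>();
        int pos = 0;

        if (pos < input.length() && input.charAt(pos) == 'a') {
            while (pos < input.length() && input.charAt(pos) == 'a') {
                pos++;
            }
            final int end = pos;
            Runnable action = () -> m.initA(input, end);
            if (deferActions) {
                queued.add(action);   // run later, after the rule is accepted
            } else {
                action.run();         // run in place, in RHS order
            }
        } else if (pos < input.length() && input.charAt(pos) == 'b') {
            pos++;
        } else {
            return false;
        }

        if (!input.startsWith("cd", pos)) {
            return false;
        }

        // The predicate is always evaluated immediately.
        if (m.count != 3) {
            return false;
        }

        // Queued actions only run once the rule has been accepted.
        for (Runnable r : queued) {
            r.run();
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("queued: " + match("aaacd", true));       // prints "queued: false"
        System.out.println("interleaved: " + match("aaacd", false)); // prints "interleaved: true"
    }
}
```

In the queued variant the predicate sees count == 0 whatever the input is, so the model rejects "aaacd" just as the real lexer does. The interleaved variant accepts it, which is the behaviour the grammar was written to expect.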
I did a quick search in the grammars-v4 repository to see if there are rules like this. It's not an extensive search, but there is one here.