Open
Labels: bug (Something isn't working)
Description
I'm working on a lexer for a language where I'd like to have `else` and `else if` lexed as separate tokens, but I'm running into surprising behaviour. In the following example you can see that `else` has been lexed as `Other`:

    mod else_if {
        use logos::Logos;

        #[derive(Logos, Debug, PartialEq)]
        enum Token {
            #[regex(r"[ ]+", logos::skip)]
            #[error]
            Error,

            #[token("else")]
            Else,

            #[token("else if")]
            ElseIf,

            #[regex(r"[a-z]*")]
            Other,
        }

        #[test]
        fn else_x_else_if_y() {
            let mut lexer = Token::lexer("else x else if y");

            // Expected: assert_eq!(lexer.next().unwrap(), Token::Else);
            assert_eq!(lexer.next().unwrap(), Token::Other);
            assert_eq!(lexer.next().unwrap(), Token::Other);
            assert_eq!(lexer.next().unwrap(), Token::ElseIf);
            assert_eq!(lexer.next().unwrap(), Token::Other);
        }
    }

Removing the space from `else if` allows `else` to be parsed as `Else`:

    mod else_if_2 {
        use logos::Logos;

        #[derive(Logos, Debug, PartialEq)]
        enum Token {
            #[regex(r"[ ]+", logos::skip)]
            #[error]
            Error,

            #[token("else")]
            Else,

            #[token("elseif")]
            ElseIf,

            #[regex(r"[a-z]*")]
            Other,
        }

        #[test]
        fn else_x_else_if_y() {
            let mut lexer = Token::lexer("else x elseif y");

            assert_eq!(lexer.next().unwrap(), Token::Else);
            assert_eq!(lexer.next().unwrap(), Token::Other);
            assert_eq!(lexer.next().unwrap(), Token::ElseIf);
            assert_eq!(lexer.next().unwrap(), Token::Other);
        }
    }

Keeping the space in `else if`, but shortening the pattern for `Else` from `else` to `e`, causes `Else` to be unexpectedly matched for the input `else`:

    mod else_if_3 {
        use logos::Logos;

        #[derive(Logos, Debug, PartialEq)]
        enum Token {
            #[regex(r"[ ]+", logos::skip)]
            #[error]
            Error,

            #[token("e")]
            Else,

            #[token("else if")]
            ElseIf,

            #[regex(r"[a-z]*")]
            Other,
        }

        #[test]
        fn else_x_else_if_y() {
            let mut lexer = Token::lexer("else x else if y");

            // Expected: assert_eq!(lexer.next().unwrap(), Token::Other);
            assert_eq!(lexer.next().unwrap(), Token::Else);
            assert_eq!(lexer.next().unwrap(), Token::Other);
            assert_eq!(lexer.next().unwrap(), Token::ElseIf);
            assert_eq!(lexer.next().unwrap(), Token::Other);
        }
    }

My understanding of the token disambiguation documentation is that the first example should work as I'd expect, with `Else` and `ElseIf` being matched independently, with higher priority than `Other`. Do I have that wrong? And is the last example exposing a bug?
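
To make the behaviour I'm expecting concrete, here is a small hand-written reference lexer that uses only the standard library. It is not how Logos is implemented. It is just a sketch of the rule I assumed from the docs: at each position take the longest match, and on a tie prefer the more specific literal over the generic `[a-z]` pattern. The names `literal`, `word` and `lex` are made up for this illustration, and `word` requires at least one character (unlike the `[a-z]*` regex above) so the loop always makes progress.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Token {
    Else,
    ElseIf,
    Other,
}

/// Length of `lit` if `input` starts with it (hypothetical helper).
fn literal(input: &str, lit: &str) -> Option<usize> {
    if input.starts_with(lit) {
        Some(lit.len())
    } else {
        None
    }
}

/// Length of the leading run of ASCII lowercase letters, if non-empty
/// (hypothetical helper standing in for the `[a-z]` pattern).
fn word(input: &str) -> Option<usize> {
    let n = input.bytes().take_while(|b| b.is_ascii_lowercase()).count();
    if n > 0 {
        Some(n)
    } else {
        None
    }
}

/// Reference lexer: longest match wins; ties go to the earlier candidate.
fn lex(mut input: &str) -> Vec<Token> {
    let mut out = Vec::new();
    loop {
        // Skip spaces between tokens, like `logos::skip` on `[ ]+`.
        input = input.trim_start_matches(' ');
        if input.is_empty() {
            break;
        }
        // Candidates listed from most to least specific (assumed priority).
        let candidates = [
            (Token::ElseIf, literal(input, "else if")),
            (Token::Else, literal(input, "else")),
            (Token::Other, word(input)),
        ];
        let mut best: Option<(Token, usize)> = None;
        for (tok, len) in candidates {
            if let Some(len) = len {
                // Only a strictly longer match replaces the current best,
                // so on equal length the more specific candidate is kept.
                if best.map_or(true, |(_, b)| len > b) {
                    best = Some((tok, len));
                }
            }
        }
        match best {
            Some((tok, len)) => {
                out.push(tok);
                input = &input[len..];
            }
            // Unrecognised input: this sketch simply stops.
            None => break,
        }
    }
    out
}
```

Under these assumptions, `lex("else x else if y")` yields `[Else, Other, ElseIf, Other]`, which is what I expected from the first example, and `lex("elsewhere")` yields `[Other]`, because the longer `[a-z]` match beats the `else` literal.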
Thanks for your time and the great library!