-
-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Issue #13043: Make references optional for link and linkplain tags #16458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #13043: Make references optional for link and linkplain tags #16458
Conversation
Github, generate report |
753d984
to
00579f4
Compare
It all depends on the context, when the grammar is as poorly implemented as this one, any change that makes matching rules more ambiguous will have a compounding negative impact on performance. |
00579f4
to
4a75878
Compare
| LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)+ reference | ||
(WS | NEWLINE)* ((WS | NEWLINE) description)? | ||
| LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)* ((WS | NEWLINE) description)? | ||
| LINKPLAIN_LITERAL (WS | NEWLINE | LEADING_ASTERISK)+ reference | ||
(WS | NEWLINE)* ((WS | NEWLINE) description)? | ||
| LINKPLAIN_LITERAL (WS | NEWLINE | LEADING_ASTERISK)* ((WS | NEWLINE) description)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My old code that uses reference?
and (WS | NEWLINE | LEADING_ASTERISK)*
creates an explosion of possibilities.
I tried to guide the parser and separate the rule for reference and non-reference to explore one path at a time to eliminate some ambiguity and avoid a lot of backtracking. I don't know if this is good, as this becomes very verbose
Screenshot shows that this rule separation improves performance (compare it to #16458 (comment) and #16458 (comment))
@nrmancuso @rnveach @romani please share your opinion does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense. It would be good to see how we look on memory usage while parsing some larger, nested html Javadocs. If we are hurting more for time than memory, I would suggest that we start extracting subrules to be reused where appropriate. This will help to make this grammar less buggy and ease maintenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should optimize time, not a memory. With reasonable limits. Nowadays memory is cheeper.
4a75878
to
a7db524
Compare
@nrmancuso @romani @rnveach Are we good here? |
((WS | NEWLINE) description)? | ||
| LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)+ reference | ||
(WS | NEWLINE)* ((WS | NEWLINE) description)? | ||
| LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)* ((WS | NEWLINE) description)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these 2 are so similar, is there any benefit if we combine them so they aren't fully separate entries? I assume this won't really improve performance, so I am ok either way.
| LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)*
( (WS | NEWLINE | LEADING_ASTERISK)+ reference (WS | NEWLINE)* )?
( (WS | NEWLINE) description)?
At the least, maybe we should consider in the future making WS | NEWLINE | LEADING_ASTERISK
and WS | NEWLINE
its own group to somehow make this easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these 2 are so similar, is there any benefit if we combine them so they aren't fully separate entries?
I don't think so. I tried several options the one with 2 separate entries always leads to less parse time and IMO it is better in terms of readability.
At the least, maybe we should consider in the future making WS | NEWLINE | LEADING_ASTERISK and WS | NEWLINE its own group to somehow make this easier to read.
Yes, There should be a clean way to rewrite all this rules to improve the readability of this parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
closes #13043
Diff Regression config: https://gist.githubusercontent.com/mohitsatr/f5a524a06294d907c84b9fc5b68661dc/raw/604872961df92db35e66c26218dc74c61bb10097/config.xml