Skip to content

Issue #13043: Make references optional for link and linkplain tags #16458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 15, 2025

Conversation

mahfouz72
Copy link
Member

@mahfouz72 mahfouz72 commented Mar 3, 2025

@mahfouz72
Copy link
Member Author

Github, generate report

Copy link
Contributor

github-actions bot commented Mar 3, 2025

@mahfouz72
Copy link
Member Author

It looks like Kleene Star is a time killer. we have a lott of them :)

Parse time increased by approximately 5.15 times.
lookahead burden increased by approximately 2.1 times
the prediction time becomes 90%


Before

1

After

2

@nrmancuso
Copy link
Member

It looks like Kleene Star is a time killer. we have a lott of them :)

It all depends on the context, when the grammar is as poorly implemented as this one, any change that makes matching rules more ambiguous will have a compounding negative impact on performance.

@mahfouz72
Copy link
Member Author

1

Comment on lines +1182 to +1190
| LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)+ reference
(WS | NEWLINE)* ((WS | NEWLINE) description)?
| LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)* ((WS | NEWLINE) description)?
| LINKPLAIN_LITERAL (WS | NEWLINE | LEADING_ASTERISK)+ reference
(WS | NEWLINE)* ((WS | NEWLINE) description)?
| LINKPLAIN_LITERAL (WS | NEWLINE | LEADING_ASTERISK)* ((WS | NEWLINE) description)?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My old code that uses reference? and (WS | NEWLINE | LEADING_ASTERISK)* creates an explosion of possibilities.

I tried to guide the parser and separate the rule for reference and non-reference to explore one path at a time to eliminate some ambiguity and avoid a lot of backtracking. I don't know if this is good, as this becomes very verbose

Screenshot shows that this rule separation improves performance (compare it to #16458 (comment) and #16458 (comment))
2

@nrmancuso @rnveach @romani please share your opinion does this make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense. It would be good to see how we look on memory usage while parsing some larger, nested html Javadocs. If we are hurting more for time than memory, I would suggest that we start extracting subrules to be reused where appropriate. This will help to make this grammar less buggy and ease maintenance.

Copy link
Member

@romani romani Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should optimize time, not a memory. With reasonable limits. Nowadays memory is cheeper.

@mahfouz72
Copy link
Member Author

@nrmancuso @romani @rnveach Are we good here?

((WS | NEWLINE) description)?
| LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)+ reference
(WS | NEWLINE)* ((WS | NEWLINE) description)?
| LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)* ((WS | NEWLINE) description)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these 2 are so similar, is there any benefit if we combine them so they aren't fully separate entries? I assume this won't really improve performance, so I am ok either way.

            | LINK_LITERAL (WS | NEWLINE | LEADING_ASTERISK)*
                ( (WS | NEWLINE | LEADING_ASTERISK)+ reference (WS | NEWLINE)* )?
                ( (WS | NEWLINE) description)?

At the least, maybe we should consider in the future making WS | NEWLINE | LEADING_ASTERISK and WS | NEWLINE its own group to somehow make this easier to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these 2 are so similar, is there any benefit if we combine them so they aren't fully separate entries?

I don't think so. I tried several options the one with 2 separate entries always leads to less parse time and IMO it is better in terms of readability.

At the least, maybe we should consider in the future making WS | NEWLINE | LEADING_ASTERISK and WS | NEWLINE its own group to somehow make this easier to read.

Yes, There should be a clean way to rewrite all this rules to improve the readability of this parse.

Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok to merge.

Copy link
Member

@nrmancuso nrmancuso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@nrmancuso nrmancuso merged commit 2a50408 into checkstyle:master Mar 15, 2025
113 checks passed
@mahfouz72 mahfouz72 deleted the allow-empty-ref branch May 9, 2025 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make references optional for link and linkplain tags
4 participants