ScalametaTokenizer: group sequences of whitespace #3786

kitbellew · 2024-07-02T20:13:16Z

Add and use uber whitespace-token containers; these will contain contiguous sequences of horizontal whitespace, or sequences of newlines, ignoring trailing or embedded horizontal space.
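To make the grouping rule concrete, here is a minimal, self-contained sketch. It assumes a simplified token model: `Tok`, `Word`, `HSpace`, `Newline`, `MultiHSpace` and `MultiNewline` are hypothetical names for illustration, not the scalameta `Token` hierarchy. It also assumes that any whitespace run with two or more newlines collapses into a single newline container, absorbing all horizontal space in that run. The real tokenizer may draw these boundaries differently.

```scala
object WhitespaceGrouping {
  // Hypothetical, simplified token model (not scalameta's Token types).
  sealed trait Tok
  final case class Word(text: String) extends Tok   // any non-whitespace run
  final case class HSpace(ch: Char) extends Tok     // one ' ' or '\t'
  case object Newline extends Tok                   // one line break
  // Container for two or more contiguous horizontal-space tokens.
  final case class MultiHSpace(parts: List[HSpace]) extends Tok
  // Container for a run holding two or more newlines, plus any
  // horizontal space embedded in or trailing that run.
  final case class MultiNewline(parts: List[Tok]) extends Tok

  private def isWs(t: Tok): Boolean = t match {
    case _: HSpace => true
    case Newline   => true
    case _         => false
  }

  // Collapses one maximal whitespace run into at most one container.
  private def collapse(run: List[Tok]): List[Tok] = {
    val newlines = run.count(_ == Newline)
    if (run.size >= 2 && newlines == 0)
      List(MultiHSpace(run.collect { case h: HSpace => h }))
    else if (newlines >= 2) List(MultiNewline(run))
    else run // lone tokens, or a single newline with spaces, stay granular
  }

  // Walks the stream, alternating between whitespace and non-whitespace runs.
  def group(tokens: List[Tok]): List[Tok] =
    if (tokens.isEmpty) Nil
    else {
      val ws = isWs(tokens.head)
      val (run, rest) = tokens.span(t => isWs(t) == ws)
      (if (ws) collapse(run) else run) ::: group(rest)
    }
}
```

In this sketch the containers keep their constituent tokens, so a consumer that needs the granular view can still unpack them.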

scalameta/parsers/shared/src/main/scala/scala/meta/internal/parsers/ScannerTokens.scala

These will contain contiguous sequences of horizontal whitespace, or sequences of newlines, ignoring trailing or embedded horizontal space.

bjaglin · 2024-08-04T19:52:09Z

This looks like a small breaking change for consumers of the tokenizer, since consecutive whitespaces are now collapsed into composite tokens.

Scalafix rules are impacted by that change since tokens are exposed at the document level and via each element of the tree without any indirection, so instead of trying to abstract that change away, I am considering signalling it via a breaking change bump.

Which got me wondering:

In the interest of minimizing impact on Scalafix rule authors, are there at this stage any other upcoming breaking changes on the horizon?
If not, do you think there would be a way to preserve the old behavior for a bit longer on the 4.9.x line? Probably not, as the changelog mentions "Support CRLF, multiple-whitespace sequences" so I assume it's a feature, not a side effect 😃
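To illustrate why this is breaking for token consumers, the following sketch uses a hypothetical token model (`Tok`, `Ident`, `Space` and `MultiSpace` are illustrative names, not scalameta or Scalafix types). A helper written against the granular stream silently undercounts once runs of spaces arrive as a composite token.

```scala
object ConsumerImpact {
  // Hypothetical token model, for illustration only.
  sealed trait Tok
  final case class Ident(name: String) extends Tok
  case object Space extends Tok                        // exactly one space
  final case class MultiSpace(count: Int) extends Tok  // composite for 2+ spaces

  // Written against the granular stream: one Space token per character.
  def countSpacesOld(tokens: List[Tok]): Int =
    tokens.count(_ == Space)

  // Updated helper: also accounts for composite tokens.
  def countSpaces(tokens: List[Tok]): Int =
    tokens.map {
      case Space         => 1
      case MultiSpace(n) => n
      case _             => 0
    }.sum
}
```

The old helper compiles and runs unchanged against the grouped stream, which is what makes this kind of change easy to miss without a version signal.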

kitbellew · 2024-08-05T05:13:06Z

> This looks like a small breaking change for consumers of the tokenizer, since consecutive whitespaces are now collapsed into composite tokens.

Oh well, I didn't realize scalafix was looking at tokens as well.

> In the interest of minimizing impact on Scalafix rule authors, are there at this stage any other upcoming breaking changes on the horizon?

We usually try to avoid breaking changes... that is, if we know they are breaking. But no, no plans.

> If not, do you think there would be a way to preserve the old behavior for a bit longer on the 4.9.x line? Probably not, as the changelog mentions "Support CRLF, multiple-whitespace sequences" so I assume it's a feature, not a side effect 😃

@tgodzik Some time ago, we were debating whether to add some parameters to ScalametaParser; perhaps this might be one such case. I think we ended up deciding that Dialect is enough, but I don't remember why.

tgodzik · 2024-08-05T11:55:35Z

I guess in this case a setting would make sense, though not sure how to sensibly add it. We can have a separate tokenized method maybe?

bjaglin · 2024-08-15T07:40:51Z

> We can have a separate tokenized method maybe?

Any chance you could give me some pointers so that I look at that to unblock the bump on Scalafix side? I went through the tokenisation code paths, but I am not sure where to start.

tgodzik · 2024-08-15T10:43:18Z

I guess we could just add another method to ScalametaTokenizer which will take options. We can then just use that explicitly from scalafix? If needed, we can expose it later via the extension methods, but I think it should not be necessary right now.
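A rough sketch of that shape, under stated assumptions: `Options`, `groupWhitespace`, `Text` and `Space` are hypothetical names chosen for illustration, and the toy tokenizer below only splits on whitespace. It is not the ScalametaTokenizer API, only the overload-with-options pattern being discussed.

```scala
object TokenizerOptionsSketch {
  // Hypothetical options type; the field name is illustrative only.
  final case class Options(groupWhitespace: Boolean)
  object Options {
    // Which value to use as the default is a compatibility decision.
    val default: Options = Options(groupWhitespace = true)
  }

  sealed trait Tok
  final case class Text(value: String) extends Tok
  // One character when granular, a whole run when grouped.
  final case class Space(value: String) extends Tok

  // Existing entry point keeps its signature and delegates to the default.
  def tokenize(input: String): List[Tok] = tokenize(input, Options.default)

  // New overload: callers that need specific behavior pass options explicitly.
  def tokenize(input: String, options: Options): List[Tok] = {
    val out = List.newBuilder[Tok]
    var i = 0
    while (i < input.length) {
      val ws = input.charAt(i).isWhitespace
      var j = i + 1
      // Extend the chunk while characters are of the same kind.
      while (j < input.length && input.charAt(j).isWhitespace == ws) j += 1
      val chunk = input.substring(i, j)
      if (!ws) { out += Text(chunk) }
      else if (options.groupWhitespace) { out += Space(chunk) }
      else { chunk.foreach(c => out += Space(c.toString)) }
      i = j
    }
    out.result()
  }
}
```

A caller that depends on the granular stream would then write `tokenize(src, Options(groupWhitespace = false))` and be unaffected by whichever default is chosen.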

kitbellew · 2024-08-15T10:44:48Z

@bjaglin let me do it in the next day or two, and release, if this is blocking you. I'm generally in the middle of fixing some bugs that I found by attempting to use scalafmt on the dotty codebase, but can complete the fixes in the next patch.

kitbellew · 2024-08-15T10:55:07Z

does scalafix use the tokenizer directly? or through the parser?

bjaglin · 2024-08-15T16:30:26Z

> does scalafix use the tokenizer directly? or through the parser?

Only through the parser.


kitbellew force-pushed the 3786 branch from 3c1113d to e01fa7d July 2, 2024 20:27

kitbellew requested a review from tgodzik July 2, 2024 20:56

tgodzik requested changes Jul 3, 2024

scalameta/parsers/shared/src/main/scala/scala/meta/internal/parsers/ScannerTokens.scala

kitbellew commented Jul 3, 2024

scalameta/parsers/shared/src/main/scala/scala/meta/internal/parsers/ScannerTokens.scala

tgodzik approved these changes Jul 3, 2024

kitbellew added 2 commits July 3, 2024 17:56

Token: add uber whitespace-token containers

cd8adf9

These will contain contiguous sequences of horizontal whitespace, or sequences of newlines, ignoring trailing or embedded horizontal space.

ScalametaTokenizer: group sequences of whitespace

37f2fbe

kitbellew force-pushed the 3786 branch from e01fa7d to 37f2fbe July 3, 2024 15:57

kitbellew merged commit 86edfac into scalameta:main Jul 3, 2024

kitbellew deleted the 3786 branch July 3, 2024 16:20

bjaglin mentioned this pull request Aug 4, 2024

Update scalameta to 4.9.9 scalacenter/scalafix#2024

Closed

This was referenced Aug 20, 2024

Test: duplicate for different TokenizerOptions #3904

Merged

Input: add WithTokenizerOptions wrapper #3906

Merged

WhitespaceTokenizer: add Granular definition #3907

Merged

bjaglin mentioned this pull request Aug 24, 2024

bump scalameta to pre-4.9.10 SNAPSHOT (was 4.9.3) scalacenter/scalafix#2047

Merged

kitbellew mentioned this pull request Aug 24, 2024

ScalametaTokenizer: check for INVALID after space #3910

Merged

kitbellew added a commit to kitbellew/scalameta that referenced this pull request Aug 24, 2024

WhitespaceTokenizer: add Granular definition

3fa92b7

Also, let's once again make it default, as before scalameta#3786.

This was referenced Aug 24, 2024

Origin: internally expose possible input #3911

Merged

Input: add overload with actual TokenizerOptions #3912

Merged

kitbellew added a commit that referenced this pull request Aug 24, 2024

WhitespaceTokenizer: add Granular definition (#3907)

fb89693

Also, let's once again make it default, as before #3786.
