Skip to content

Conversation

KvanTTT
Copy link
Member

@KvanTTT KvanTTT commented Jan 29, 2022

Using plain chars for serialized ATN data instead of escaped where it's possible. Fixed the bug with the max size of ATN segment introduced in #3438.

I've compared the size of source files for largeLexer test:

Language Before (KB) After (KB) Comment
Java 1331 902
C# 2713 1997
Dart 2582 1468 + Fixed max ATN segment
JavaScript 2148 1702
Go - - No change because of plain ushort16
PHP 2287 971 + Fixed max ATN segment
Python 1711 1102
Cpp 3158 1212 + Fixed max ATN segment
Swift 4900 1400 JSON -> Binary serialization (6900 -> 3400 for compilied files)

Also, I've fixed binary serialization in Swift and replaced JSON serialization.

Please do not squash since all commits are atomic.

@KvanTTT
Copy link
Member Author

KvanTTT commented Jan 29, 2022

I don't exactly why CI reports not found deserializeFromJson for Swift, I have green tests on my machine. Could it be related to some caching? Moreover, I can't find any substring deserializeFromJson in the entire project.

@KvanTTT
Copy link
Member Author

KvanTTT commented Jan 29, 2022

@parrt please take a look at this optimization, it does not break anything.

@parrt
Copy link
Member

parrt commented Jan 29, 2022

looks like there's some very good stuff in here. My brain is stuck on the other thing at the moment so let me finish up there.

@parrt
Copy link
Member

parrt commented Jan 29, 2022

Moreover, I can't find any substring deserializeFromJson in the entire project.

I see:

Screen Shot 2022-01-29 at 3 16 41 PM

@parrt
Copy link
Member

parrt commented Jan 29, 2022

Let's resolve #3515 quickly before we resolve this one: #3515

@KvanTTT
Copy link
Member Author

KvanTTT commented Jan 29, 2022

I see:

Yes, but it's in the master branch, tests are being failed for the branch of the pull request.

Let's resolve #3515 quickly before we resolve this one: #3515

It looks like you mixed up something because you pointed out the same issues.

@parrt
Copy link
Member

parrt commented Jan 29, 2022

Sorry. Let's resolve #3515 quickly before we resolve this PR.

@parrt
Copy link
Member

parrt commented Jan 30, 2022

Copying comment over here... "Actually, concerning this PR, why not use the binary string in the parser file like others? Now Swift build is much more complicated, isn't it?"

@KvanTTT KvanTTT force-pushed the optimize-chars-serialization-data branch 4 times, most recently from b0228ac to 1c464ae Compare February 6, 2022 21:23
Comment on lines +398 to +408
case "Java":
case "JavaScript":
case "Python2":
case "Python3":
return String.format("\\u%04x", v);
case "CSharp":
return String.format("\\x%X", v);
case "Dart":
case "PHP":
case "Swift":
return String.format("\\u{%X}", v);
Copy link
Member Author

@KvanTTT KvanTTT Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding StandardTarget eventually (we were discussing the question some time ago) to get rid of the risk of misprinting in string literal:

class StandardTarget {
    public final static String Java = "Java";
    ...
}

or add constants to each target:

class JavaTarget {
    public final static String Key = "Java";
}

Those constants are used over the repository in many places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Constants are used a lot, I can see creating such a mechanism. For now, let's focus on just the changes directly in the serialization.

@KvanTTT KvanTTT force-pushed the optimize-chars-serialization-data branch from 1c464ae to bcb4232 Compare February 7, 2022 20:13
@KvanTTT KvanTTT changed the base branch from master to dev February 8, 2022 19:23
@KvanTTT KvanTTT force-pushed the optimize-chars-serialization-data branch from bcb4232 to 488f594 Compare February 12, 2022 22:03
@KvanTTT KvanTTT mentioned this pull request Feb 17, 2022
@parrt
Copy link
Member

parrt commented Feb 19, 2022

Wow. I love it! You removed a whole bunch of code, and made things much easier to understand. Well done!!! Not only that, the generated code is significantly smaller🕺. Merging... all tests are passing

@parrt parrt added this to the 4.10 milestone Feb 19, 2022
@parrt parrt merged commit 8120227 into antlr:dev Feb 19, 2022
@KvanTTT KvanTTT deleted the optimize-chars-serialization-data branch April 8, 2022 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants