Skip to content

[Performance] Constants for Python runtime should be folded #3698

@KvanTTT

Description

@KvanTTT

The problem

Consider the following grammar:

grammar Test;
e: T1 | T2 | T3;
T1: 't1';
T2: 't2';
T3: 't3';

For Python runtime the following code is generated for rule e:

def e(self):
    localctx = TestParser.EContext(self, self._ctx, self.state)
    self.enterRule(localctx, 0, self.RULE_e)
    self._la = 0 # Token type
    try:
        self.enterOuterAlt(localctx, 1)
        self.state = 0
        _la = self._input.LA(1)
        if not((((_la) & ~0x3f) == 0 and ((1 << _la) & ((1 << TestParser.T1) | (1 << TestParser.T2) | (1 << TestParser.T3))) != 0)):
            self._errHandler.recoverInline(self)
        else:
            self._errHandler.reportMatch(self)
            self.consume()
    except RecognitionException as re:
        localctx.exception = re
        self._errHandler.reportError(self, re)
        self._errHandler.recover(self, re)
    finally:
        self.exitRule()
    return localctx

The following constant expression is not folded during interpretation:

(1 << TestParser.T1) | (1 << TestParser.T2) | (1 << TestParser.T3)

In Java bytecode it's just 14. In this case these constant should be folded during generation.

How to check it?

I've created a benchmark: https://gist.github.com/KvanTTT/e3b355f7e321fe7f52e11ea1aa0ecbce#file-python_constant_folding_benchmark-py

It returns the following timings (python 3.8.5):

no_folding_const_refs_test: 1769 ns
no_folding_const_literals_test: 81 ns
folding: 84 ns

for the following methods:

def no_folding_const_refs_test():
    return (1 << C1) | (1 << C2) | (1 << C3) | (1 << C4) | (1 << C5) | (1 << C6) | \
        (1 << C7) | (1 << C8) | (1 << C9) | (1 << C10) | (1 << C11) | (1 << C12) | \
        (1 << C13) | (1 << C14) | (1 << C15) | (1 << C16)


def no_folding_const_literals_test():
    return (1 << 1) | (1 << 2) | (1 << 3) | (1 << 4) | (1 << 5) | (1 << 6) | \
        (1 << 7) | (1 << 8) | (1 << 9) | (1 << 10) | (1 << 11) | (1 << 12) | \
        (1 << 13) | (1 << 14) | (1 << 15) | (1 << 16)


def folding_test():
    return 0xFFFF

no_folding_const_refs_test works ~20 times slower than optimized versions. I vote for folding_test because it's the shortest code and it provides more shorter generated code.

Probably some other runtimes also don't perform constant folding (I know about Java and C#, they perform constant folding).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions