
Conversation

lrq3000
Member

@lrq3000 lrq3000 commented Aug 26, 2016

Add tqdm module to predict time using machine learning polynomial regression.

Implementation of what was asked in #206.

For the record, here is the model used in this implementation to predict time using machine learning polynomial regression:

[image: polynomial regression model formula]

Canonical example:

import time
from tqdm import tsrange

plot = True
tstep = 0.1

tt = tsrange(100, leave=True, miniters=1, mininterval=0, algo='numpy', order=3, plot=plot)
#tt = tsrange(100, leave=True, miniters=1, mininterval=0, algo='descent', repeat=30, plot=plot)
#tt = tsrange(100, leave=True, miniters=1, mininterval=0, algo='sklearn', order=3, repeat=30, plot=plot)

for i in tt:
    time.sleep(tstep)
    #tstep = tstep * 1.1
    tstep = tstep + 0.01

    # DEBUG: compare the predicted end time with the model's current estimate
    predicted_endtime = float(tt.model.predict([tt.total]))
    current = float(tt.model.predict([tt.n]))
    tt.write('it: %i, Predicted endtime: %g vs current: %g' % (tt.n, predicted_endtime, current))

Uncomment the other tt = ... lines if you want to try the different algorithms provided.

There are three different algorithms provided:

  • Numpy: the analytical least-squares regression, so it always finds the best fit, but it can be slower when there are lots of samples to regress, and it requires numpy. This is the default, as it currently gives the best results; the number of samples used for fitting is capped, so dimensionality shouldn't be an issue.
  • Descent: a pure-Python implementation of minibatch gradient descent for polynomial regression. It was initially set to be the default, but the analytical approach works much better in our case.
  • Sklearn: uses scikit-learn to compute a solution iteratively (not analytically). It's similar to "descent", but it requires scikit-learn and should converge a bit faster and be more resilient to noisy samples (it uses the PassiveAggressive regressor by default, but can be switched to stochastic gradient descent).
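As a rough illustration of the "descent" variant, a pure-Python minibatch gradient descent polynomial fit might look like the sketch below. All names and hyperparameters here are illustrative assumptions, not the PR's actual implementation:

```python
import random

def fit_descent(samples, order=2, repeat=100, lr=0.05, batch=8):
    """Fit polynomial coefficients to (n, elapsed) samples by minibatch SGD."""
    coefs = [0.0] * (order + 1)
    for _ in range(repeat):
        minibatch = random.sample(samples, min(batch, len(samples)))
        for n, elapsed in minibatch:
            feats = [n ** k for k in range(order + 1)]
            err = sum(c * f for c, f in zip(coefs, feats)) - elapsed
            for k in range(order + 1):
                coefs[k] -= lr * err * feats[k]  # per-sample gradient step
    return coefs

def predict(coefs, n):
    """Evaluate the fitted polynomial at iteration n."""
    return sum(c * n ** k for k, c in enumerate(coefs))
```

With raw powers of n as features, the learning rate has to be small (or n normalized) for large iterables, which hints at why the analytical numpy path ended up working better.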

Ok so basically how it works:

  • At each iteration, a new sample (n=current iteration, elapsed) is generated.
  • This sample gets "fuzzily" added to a batch of memorized samples in the model. Why fuzzily? It isn't the best term, but new points are not necessarily memorized: there is a probability to store each one or to skip it. The goal is to remember old points as well as new ones, not only the newest, which should produce a better model. Why memorize a batch at all? With stochastic gradient descent on a single point, the fit goes completely wrong; several different samples are needed to fit correctly. Hence we memorize some points, but not all of them, otherwise we would run out of memory when the iterable's total exceeds 10^7.
  • The model learns in batch on all memorized samples at each iteration. So it adapts itself fully at each iteration to the new shape of the curve.
  • Ending time is predicted using the model.
  • We compute the expected average rate given the predicted ending time.
  • We feed everything to format_meter.
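The per-iteration loop above could be sketched like this. The function names, the bounded reservoir-style sampling, and the linear model are all illustrative assumptions, not the PR's actual code:

```python
import random

MAX_SAMPLES = 50  # bound memory even when total > 1e7 iterations

def remember(memory, sample):
    """Fuzzily keep a bounded, randomly thinned batch of (n, elapsed) samples."""
    if len(memory) < MAX_SAMPLES:
        memory.append(sample)
    elif random.random() < MAX_SAMPLES / (sample[0] + 1.0):
        # reservoir-style: overwrite a random old point with some probability
        memory[random.randrange(MAX_SAMPLES)] = sample

def fit_linear(memory):
    """Analytical least-squares line through the memorized samples."""
    m = len(memory)
    sx = sum(n for n, _ in memory)
    sy = sum(t for _, t in memory)
    sxx = sum(n * n for n, _ in memory)
    sxy = sum(n * t for n, t in memory)
    denom = m * sxx - sx * sx
    slope = (m * sxy - sx * sy) / denom if denom else 0.0
    return sy / m - slope * sx / m, slope  # (intercept, slope)

def predict_endtime(memory, total):
    """Extrapolate the fitted model out to the final iteration."""
    intercept, slope = fit_linear(memory)
    return intercept + slope * total
```

The predicted end time then yields the expected average rate (total / endtime), which is what gets fed to format_meter.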

Note on the implementation: all this is done in format_meter() for two reasons:

  1. Easier and cleaner to implement; otherwise we would need to copy the whole code of __iter__() and update().
  2. Lighter on CPU, because we add samples and fit only when printing, which is regulated by miniters and such. The CPU impact of the regression is thus greatly lowered, at the expense of missing a few points that could help converge faster.
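Point 2 amounts to a simple gate: the expensive refit only runs when a display refresh would fire anyway. A minimal sketch (illustrative names, not the real format_meter hook):

```python
class ThrottledFitter:
    """Run the regression only every `miniters` iterations."""

    def __init__(self, miniters=10):
        self.miniters = miniters
        self.last_fit = 0
        self.fits = 0

    def maybe_fit(self, n):
        """Return True (and refit) only when a refresh is due at iteration n."""
        if n - self.last_fit >= self.miniters:
            self.last_fit = n
            self.fits += 1  # the expensive regression would run here
            return True
        return False
```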

Please try it out and tell me if it fits your needs and works well (don't hesitate to play with the parameters, they are explained in the docstring of tqdm_regress). I will finish the TODO after receiving feedback.

TODO:

  • Flake8
  • Unit tests (use virtual timer and canonical cases: addition, multiplication, subtraction, etc.)
  • Readme

tqdm module to predict time using machine learning polynomial regression

Signed-off-by: Stephen L. <lrq3000@gmail.com>
@codecov-io

codecov-io commented Aug 26, 2016

Codecov Report

❗ No coverage uploaded for pull request base (master@1104d07).

@@            Coverage Diff            @@
##             master     #248   +/-   ##
=========================================
  Coverage          ?   64.29%           
=========================================
  Files             ?        8           
  Lines             ?      759           
  Branches          ?      148           
=========================================
  Hits              ?      488           
  Misses            ?      270           
  Partials          ?        1
Impacted Files          Coverage Δ
tqdm/__init__.py        100% <100%> (ø)
tqdm/_tqdm_regress.py   19.28% <19.28%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@coveralls

coveralls commented Aug 26, 2016

Coverage Status

Coverage decreased (-26.09%) to 64.683% when pulling a710e5d on regress into 4b3d916 on master.

Signed-off-by: Stephen L. <lrq3000@gmail.com>
@coveralls

Coverage Status

Coverage decreased (-26.09%) to 64.683% when pulling 95d001b on regress into 4b3d916 on master.

Signed-off-by: Stephen L. <lrq3000@gmail.com>
@coveralls

Coverage Status

Coverage decreased (-26.3%) to 64.427% when pulling 808787b on regress into 4b3d916 on master.

@lrq3000
Member Author

lrq3000 commented Aug 26, 2016

PS: if you are wondering whether we could implement the analytical regression without numpy, the answer is: not by ourselves, it's far too complicated because we would have to reimplement a lot of linear algebra on matrices. There is however a pure-Python library called PyLA that did all that and even features a least-squares linear regression function (a solve function, but it's basically the same), so we could use that. We would still rely on a third-party library (although pure Python), and it would be a lot slower than numpy.
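To give a sense of the linear algebra involved, here is a minimal pure-Python analytical polynomial least squares: build the normal equations X^T X c = X^T y and solve them by Gaussian elimination. This is only an illustration of the work a numpy-free implementation must carry, not PyLA's API:

```python
def polyfit_pure(xs, ys, order):
    """Analytical least-squares polynomial fit via the normal equations."""
    m = order + 1
    # A = X^T X and b = X^T y for the monomial design matrix
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system
    coefs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][c] * coefs[c] for c in range(r + 1, m))
        coefs[r] = (b[r] - s) / A[r][r]
    return coefs  # coefs[k] multiplies x**k
```

Even this toy version needs pivoting and back substitution; a robust general solver (conditioning, degenerate systems) is considerably more code, hence the appeal of numpy or PyLA.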

@lrq3000 lrq3000 changed the title Add tqdm_regress and tsrange Polynomial regression to predict time (tqdm_regress, tsrange) Aug 27, 2016
@lrq3000
Member Author

lrq3000 commented Aug 27, 2016

Ah maybe we could implement an additional algorithm (Moving average regression model) as pointed here: #48 (comment)

@CrazyPython
Contributor

@lrq3000 By default the "regression mode" is linear. This one implements polynomial - perhaps exponential should also be implemented?

@lrq3000 lrq3000 added the need-feedback 📢 We need your response (question) label Aug 28, 2016
@lrq3000
Member Author

lrq3000 commented Aug 28, 2016

@CrazyPython Exponential is implemented: just use order=1.5 and it will compute linear, exp, sqrt and log features (yes, because log can always be useful in machine learning ;) ). You can also set linear mode with order=1, but from my tests it's not very useful for predicting time (sometimes a little better than our current implementation, but still way off in exponentially increasing cases).
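The idea of order=1.5 can be sketched as a feature expansion: instead of pure powers of n, each iteration count is mapped to linear, sqrt and log terms, which a linear fit then combines. The exact basis used by the PR (and how exp enters it) is an assumption here:

```python
import math

def expand(n, order=1.5):
    """Map an iteration count to its feature vector."""
    if order == 1.5:
        # hypothetical mixed basis: constant, linear, sqrt, log
        return [1.0, n, math.sqrt(n), math.log(n + 1.0)]
    return [n ** k for k in range(int(order) + 1)]  # plain polynomial

def model_predict(coefs, n, order=1.5):
    """Dot product of fitted coefficients with the feature vector."""
    return sum(c * f for c, f in zip(coefs, expand(n, order)))
```

Any least-squares or gradient-descent fitter then works unchanged on these features, since the model stays linear in its coefficients.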

@CrazyPython
Contributor

@lrq3000 oh linear = regular tqdm. put it in README?

@lrq3000
Member Author

lrq3000 commented Aug 29, 2016

@CrazyPython Not exactly: normal tqdm uses an exponential moving average, so it cannot predict at all when iterations take longer and longer, because it assumes a constant rate. Linear regression is tqdm_regress(order=1); it can predict a bit more in this scenario, but generally the result is not much better than core tqdm. The prediction becomes more interesting with order=2, and I think best with order=3. I tried exp, sqrt and log but they usually do not help much and tend to overfit.
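A toy comparison makes the difference concrete: on a workload whose iterations keep slowing down, an EMA of the per-iteration duration systematically underestimates the remaining time, because it assumes the current rate holds. This is an illustrative sketch, not tqdm's actual EMA formula:

```python
def ema_rate(deltas, alpha=0.3):
    """Exponential moving average of per-iteration durations."""
    est = deltas[0]
    for d in deltas[1:]:
        est = alpha * d + (1 - alpha) * est
    return est

# iterations slow down linearly: step i takes 0.1 + 0.01*i seconds
deltas = [0.1 + 0.01 * i for i in range(50)]       # first 50 of 100 steps
remaining_true = sum(0.1 + 0.01 * i for i in range(50, 100))

# EMA extrapolation assumes the latest rate stays constant for 50 more steps
remaining_ema = ema_rate(deltas) * 50
```

Here remaining_ema falls well short of remaining_true; a regression of order >= 2 on (n, elapsed) can capture the accelerating trend instead of assuming it away.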

@CrazyPython
Contributor

Tag help-wanted for unit tests.

@lrq3000
Member Author

lrq3000 commented Sep 2, 2016

@CrazyPython No need, I can make them. I can recycle my testing code into unit tests, plus the unit tests already written for machine-learning libraries, so it won't be too hard. But it's time-consuming, so I'll only do it if this module has an audience.

@lrq3000 lrq3000 added the submodule ⊂ Periphery/subclasses label Sep 2, 2016
@lrq3000 lrq3000 mentioned this pull request Sep 6, 2016
@CrazyPython
Contributor

@lrq3000 aye, is linting really so hard that you have to put it on the TODO? My editor (PyCharm) has it built in with a simple keyboard shortcut.

@lrq3000
Member Author

lrq3000 commented Oct 4, 2016

I don't use an IDE but a simple text editor, Notepad++, and my default font is not monospace (I didn't have time to fix that). My main problem is usually with spaces.

@CrazyPython
Contributor

@lrq3000 Oh god you have no idea the joys of PyCharm. I used to use IDLE (notepad + indentation + shortcuts for running and embedded interactive shell), and looking back, it was pretty horrible.

PyCharm is a really "smart" IDE. If you use a __future__ statement, it'll ask whether you want to enable code-compatibility inspection. There are two-click operations for converting to list comprehensions and switching between argument types, and they are easy to access. PEP 8 violations are highlighted inline, and so are type violations. Ctrl+Shift+L lints your code, applying the necessary changes.

PyCharm Community Edition is free and open source. Inspections can be disabled/enabled at will. Easy to start using.

@casperdcl casperdcl force-pushed the master branch 4 times, most recently from 8cade97 to a65e347 Compare October 31, 2016 02:34
@lrq3000 lrq3000 added this to the >5 milestone Nov 14, 2016
@casperdcl casperdcl force-pushed the master branch 6 times, most recently from 6ec00f1 to 4b6476a Compare July 22, 2017 14:15
@casperdcl casperdcl added the p4-enhancement-future 🧨 On the back burner label Jan 24, 2019
@casperdcl
Member

fixes #206, should go into a contrib submodule as per #198 (comment)
