
Conversation

lrq3000
Member

@lrq3000 lrq3000 commented Nov 29, 2015

Should fix #59, #100 and #103 by using relative (CPU) timing (time.process_time() for Py3.3+ or time.clock() for Py2) instead of absolute timing (time.time()). This should make it possible to run the time-dependent unit tests reliably, even if the Travis server suddenly becomes overloaded in the middle of a test.

Signed-off-by: Stephen L. lrq3000@gmail.com

@codecov-io

Current coverage is 100.00%

Merging #60 into master will not affect coverage as of e72a08f

@@            master     #60   diff @@
======================================
  Files            5       5       
  Stmts          242     244     +2
  Branches        49      49       
  Methods          0       0       
======================================
+ Hit            242     244     +2
  Partial          0       0       
  Missed           0       0       

Review entire Coverage Diff as of e72a08f

Powered by Codecov. Updated on successful CI builds.

@lrq3000
Member Author

lrq3000 commented Dec 16, 2015

This patch is complete for me. It's not perfect, but it's certainly more reliable than what we have now, so I think we can merge this until we get a better idea.

@casperdcl casperdcl self-assigned this Dec 16, 2015
@casperdcl
Member

Can you check the failing tests @lrq3000? I've removed cputime from the API as it looks bad and unclean.

@casperdcl casperdcl assigned lrq3000 and unassigned casperdcl Dec 17, 2015
@lrq3000
Member Author

lrq3000 commented Dec 18, 2015

I will take a look, but why did you remove cputime? It was supposed to ensure that tqdm uses relative timing when the unit tests provide it (I know, that's not very elegant, but I have yet to find a better way...).

@casperdcl
Member

I removed it from the api and manually set it when unit testing :)

@lrq3000 lrq3000 force-pushed the relative-timing branch 2 times, most recently from e0f9ba7 to 096b678 Compare December 25, 2015 17:26
@lrq3000
Member Author

lrq3000 commented Dec 25, 2015

I have been investigating this issue for several hours now, and I cannot find the culprit. I cannot reproduce it locally; it seems to happen only on Travis, and debug printing suggests it is caused by a low time resolution. As for why, I really don't know. The interesting thing is that this resolution issue does not seem to happen with the old way (with cputime), so somehow the time resolution can be degraded by the monkey patching.

So I suggest we stay with cputime. And even if I see it more as an internal parameter, it can also be useful for some users if they don't want their ETA to be dependent on other processes load.

@lrq3000
Member Author

lrq3000 commented Dec 25, 2015

Proof that it's a time resolution issue, but only when using clock():

from time import process_time
import time

def cpu_timify(t):
    process_time()  # call once to initialize the clock
    t._time = process_time
    t.start_t = t.last_print_t = t._time()
    print("%s | %s | %s | %s" % (repr(t.start_t), repr(process_time()),
                                 repr(time.clock()), repr(time.time())))

Result:

0.19 | 0.19 | 0.19 | 1451069514.610997

The low time resolution completely throws off all the tests based on rate variations.
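A clock's effective resolution can be probed directly with a small helper like this (the helper name and approach are mine, not from the PR; a coarse result here would explain the identical 0.19 values above):

```python
import time

def effective_resolution(clock, samples=3):
    """Estimate a clock's effective resolution as the smallest positive
    increment observed between successive calls (busy-waits for a tick)."""
    smallest = float("inf")
    for _ in range(samples):
        t1 = clock()
        t2 = clock()
        while t2 <= t1:  # spin until the clock actually ticks
            t2 = clock()
        smallest = min(smallest, t2 - t1)
    return smallest
```

A clock that only ticks every 10 ms or more cannot distinguish fast iterations, which would defeat any rate-based assertion in the tests.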

@lrq3000
Member Author

lrq3000 commented Dec 25, 2015

I'm trying another possible way to fix this issue. Please wait until then.

@lrq3000
Member Author

lrq3000 commented Dec 26, 2015

I hereby declare: NO MORE OF THESE FREAKING TIMING ISSUES! (sorry about the caps, but these bugs were getting on my nerves ;) ).

So, thanks to your great tqdm._time monkey patching idea, I developed this idea further and made a virtual discrete time manager, so we no longer have to care about timing issues in tests_tqdm.py, as long as we use DiscreteTimer.sleep() and cpu_timify(tqdm).

However, this idea is not implementable in tests_perf.py unless we find a way to monkey patch the Python interpreter to enumerate the instructions of every Python statement in the tests and count them (which amounts to saying that each instruction costs 1 CPU cycle: optimistic, but at least it would be stable). This might be possible by scanning the AST, but it's a bit too much for my current py-fu :)

Maybe todo:

  • We could simplify some tests: where we used to set tricky values for sleep and mininterval, we can now use simple, theoretically defined values, and we no longer need to care about the time resolution (nor the CPU cost!) since it is managed virtually by DiscreteTimer.
  • cputime should maybe be reinstated in tqdm, not for unit testing but for users: some very demanding users may find it useful to run tqdm without any interference from external software. I don't have a particular scenario in mind, so we can of course wait for someone to ask about it first :) (but right now we have the code ready, so why not use it!).
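A minimal sketch of what such a discrete virtual timer can look like (a hypothetical reconstruction; the actual helpers live in tests_tqdm.py and may differ):

```python
class DiscreteTimer(object):
    """Virtual discrete clock: time only advances when sleep() is called,
    so time-dependent tests become deterministic and instantaneous."""

    def __init__(self):
        self.t = 0.0

    def sleep(self, dt):
        """Advance virtual time by dt seconds without really sleeping."""
        self.t += dt

    def time(self):
        """Return the current virtual time."""
        return self.t


def cpu_timify(t, timer):
    """Monkey patch a tqdm instance so it reads time from `timer`."""
    t._time = timer.time
    t.start_t = t.last_print_t = t._time()
    return timer
```

Because the clock only moves when the test says so, rate and mininterval assertions see exactly the intervals the test specifies, regardless of machine load.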

@lrq3000
Member Author

lrq3000 commented Dec 26, 2015

@casperdcl What do you think about cputime? About simplifying some tests, we can maybe do that later; the tests run fine right now, so this is not a necessity, just cosmetic :)

Ah and I forgot to tell you that I have a weird issue since you added tqdm/*.py instead of tqdm/ in the tox.ini file:

Running command: flake8 --max-line-length=80 --count --statistics --exit-zero tqdm/*.py
tqdm/*.py:1:1: E902 IOError: [Errno 22] invalid mode ('rU') or filename: 'tqdm/*.py'
1       E902 IOError: [Errno 22] invalid mode ('rU') or filename: 'tqdm/*.py'
1
Running command: flake8 --max-line-length=80 --count --statistics --exit-zero examples/
Running command: flake8 --max-line-length=80 --count --statistics --exit-zero *.py
*.py:1:1: E902 IOError: [Errno 22] invalid mode ('rU') or filename: '*.py'
1       E902 IOError: [Errno 22] invalid mode ('rU') or filename: '*.py'
1
Running command: flake8 --max-line-length=80 --count --statistics --exit-zero --ignore=E731 tqdm/tests/

Should we revert back?

@lrq3000
Member Author

lrq3000 commented Dec 26, 2015

Ok, I'm crazy today: I'm gonna try to disassemble code on the fly to count the number of instructions and make a virtual timing environment for the performance tests!

@lrq3000
Member Author

lrq3000 commented Dec 26, 2015

The opcode comparison is not ready yet, do not merge! The call is not recursive; I'm gonna try to fix that.

@casperdcl
Member

whoa nice work Stephen. I didn't quite understand why this doesn't work for perf tests...

@casperdcl
Member

Re: flake8 on windows, maybe open a new issue? I had changed to wildcards to stop recursive addition of directories. The new method requires less specification of individual file names (the old version would also fail by e.g. including non-timing tests in timed ones or vice-versa I think).

@lrq3000
Member Author

lrq3000 commented Dec 26, 2015

@casperdcl Ok about opening an issue for flake8 on Windows, I'll do that when this PR is merged.

About DiscreteTimer not working for tests_perf.py: this timer only creates a virtual timing environment. That works for tests_tqdm.py, where we can trick tqdm with fake time information, since we only care about what tqdm does with that information to compute the rate, adjust internal parameters, etc. But tests_perf.py is the other way around: we want to measure the time consumed by each step inside tqdm, so we don't just want to feed in fake time information, we also need to receive real timing info. So this is quite a different issue.

For the moment, I have three main leads, all involving analysis of the bytecode opcodes:

  • Using a recursive dis (see here, here and here, and my own enhanced version here). This works, but it analyses the static bytecode, so the whole tqdm source code gets counted, including every branch of every if statement. This wouldn't really help: the result would be an unrealistically high opcode count, since our many conditional branches are of course not all executed.
  • Using a bytecode tracer. This one is tough: the idea is basically to do statement coverage, running a function and then keeping only the bytecode that was actually executed. I have some leads here and here, and the best lead here.
  • Another possibility if we want to go really crazy: use a virtual Python bytecode runner. We would then have full control over what happens inside.
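For reference, the "bytecode tracer" lead can nowadays be sketched with sys.settrace and per-frame opcode tracing. Note this uses frame.f_trace_opcodes, a Python 3.7+ feature that did not exist at the time of this PR, and the helper name is mine:

```python
import sys

def count_executed_opcodes(func, *args, **kwargs):
    """Count opcodes actually executed while func runs: dynamic rather
    than static bytecode analysis, so untaken branches are not counted."""
    counter = {"n": 0}

    def tracer(frame, event, arg):
        frame.f_trace_opcodes = True  # request per-opcode trace events
        if event == "opcode":
            counter["n"] += 1
        return tracer  # keep tracing inside this frame

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)
    return counter["n"], result
```

This is exactly the "statement coverage" idea above, applied at opcode granularity: only the code paths that actually run contribute to the count.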

@lrq3000
Member Author

lrq3000 commented Dec 26, 2015

For later: try to use a with-block style to inline the code under test (avoiding having to put the code to trace inside functions), see here.
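A hedged sketch of what such a with-block could look like (my own reconstruction, not necessarily what ended up in tests_perf.py):

```python
import time
from contextlib import contextmanager

@contextmanager
def relative_timer():
    """Inline timing block: the code under test stays at the call site
    instead of being wrapped in a function just to be timed."""
    start = time.process_time()
    elapser = lambda: time.process_time() - start
    yield lambda: elapser()  # caller gets a live elapsed-time reader
    spent = time.process_time() - start
    elapser = lambda: spent  # freeze the result once the block exits
```

Usage would look like:

```python
with relative_timer() as elapsed:
    sum(range(100000))  # code under test, inline
elapsed()  # CPU time spent inside the block, frozen after exit
```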

@lrq3000
Member Author

lrq3000 commented Dec 27, 2015

[Redacted]

@lrq3000 lrq3000 added the help wanted 🙏 We need you (discussion or implementation) label Dec 27, 2015
@casperdcl casperdcl force-pushed the master branch 2 times, most recently from 7c9fe37 to 5af1135 Compare December 27, 2015 17:11
@lrq3000
Member Author

lrq3000 commented Dec 29, 2015

I think it's time to merge this PR into the main branch: we now have a much better framework for time testing and perf testing.

I removed the bytecode opcode dynamic analysis and moved it into a new branch and PR #90, we'll see what we can do with that in the future.

To summarize what has been done so far with this PR:

In tests_tqdm.py:

  • Virtual discrete timer for all tests requiring time. Two benefits: it always returns the same result regardless of the system's load, and it's very fast since we don't really sleep.

In tests_perf.py:

  • Relative timing (cputime), so these tests should not be influenced by the system's load.
  • Automatic detection of cputime availability: checks whether process_time() or clock() really returns a process-wide (CPU) time and not a system-wide time (the check uses a sleep(): if cputime is available, the time slept should not increase the cputime).
  • Harder tests: we test tqdm not only with default arguments but also with maximum resource consumption (i.e., mininterval=0, miniters=1, etc., so that tqdm has to refresh the progress bar at every iteration).
  • A MockFileNoWrite class to avoid biasing the perf tests with I/O.
  • Tests against a simple progress bar, which lets us objectively measure the overhead against the most optimized bar (which of course has fewer features than tqdm).
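The cputime availability check in the second bullet can be sketched like this (the function name is hypothetical; only the sleep-based idea comes from the PR):

```python
import time

def check_cpu_time(sleeptime=0.2):
    """Return True if time.process_time() behaves like a per-process CPU
    clock: sleeping should not (noticeably) advance it."""
    start = time.process_time()
    time.sleep(sleeptime)
    cpu_elapsed = time.process_time() - start
    # A wall clock would have advanced by roughly sleeptime here.
    return cpu_elapsed < sleeptime / 2.0
```

If the check fails, the platform's clock is effectively wall time, and the perf tests should be skipped rather than produce load-dependent results.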

@lrq3000
Member Author

lrq3000 commented Dec 29, 2015

@casperdcl And sorry about the messy history; please rebase as you want :) (i.e., you can remove all commits related to bytecode analysis and squash a few others together).

@casperdcl
Member

@lrq3000 np messy histories are fine if they're just in branches :) will look at this later

@lrq3000
Member Author

lrq3000 commented Dec 29, 2015

@casperdcl yes lol I noticed how great it is to use branches alongside a protected master. I think I'll do that with all my projects from now on!

@lrq3000 lrq3000 removed the help wanted 🙏 We need you (discussion or implementation) label Dec 29, 2015
Signed-off-by: Stephen L. <lrq3000@gmail.com>
…mock StringIO object + add hard perf tests

Signed-off-by: Stephen L. <lrq3000@gmail.com>
@lrq3000 lrq3000 force-pushed the relative-timing branch 2 times, most recently from a424b1c to 20775cd Compare January 5, 2016 20:22
…tly works, else skip + Fix unit tests

Signed-off-by: Stephen L. <lrq3000@gmail.com>
@lrq3000
Member Author

lrq3000 commented Jan 5, 2016

I rebased it myself, since I know the code changes of this PR best. The PR should now be clean and ready to merge (I checked the changes commit by commit; useless code such as the bytecode tracer was removed).

@lrq3000 lrq3000 changed the title relative timing for unit tests relative timing for unit tests (rebased) Jan 5, 2016
@lrq3000 lrq3000 mentioned this pull request Jan 6, 2016
@lrq3000
Member Author

lrq3000 commented Jan 9, 2016

@casperdcl Should I proceed with merging this PR and the IPython PR? I think they are both OK.

@lrq3000
Member Author

lrq3000 commented Jan 9, 2016

#101

@casperdcl
Member

@lrq3000 thanks, I'll look at them now you've finished.

@casperdcl casperdcl assigned casperdcl and unassigned lrq3000 Jan 9, 2016
@lrq3000
Member Author

lrq3000 commented Jan 9, 2016

Ok, thanks :)

@casperdcl casperdcl added this to the 3.7.0 milestone Jan 10, 2016
casperdcl added a commit that referenced this pull request Jan 12, 2016
relative timing for unit tests (rebased)
@casperdcl casperdcl merged commit 2efcd5d into master Jan 12, 2016
@casperdcl casperdcl deleted the relative-timing branch January 12, 2016 23:29
Successfully merging this pull request may close these issues.

Consistent performance tests