Skip to content

Conversation

casperdcl
Copy link
Member

@casperdcl casperdcl commented Oct 29, 2016

Fix total computation for pandas apply

migrated from #244, and rebased

@casperdcl casperdcl added p0-bug-critical ☢ Exception rasing to-review 🔍 Awaiting final confirmation labels Oct 29, 2016
@casperdcl
Copy link
Member Author

@aplavin @CrazyPython not sure what you wanted to achieve with this. seems like once all the commits from #244 were squashed there's just the one change (drop support of *args). did i miss something?

@codecov-io
Copy link

codecov-io commented Oct 29, 2016

Codecov Report

❗ No coverage uploaded for pull request base (master@a379e33). Click here to learn what that means.
The diff coverage is 0%.

@@            Coverage Diff            @@
##             master     #299   +/-   ##
=========================================
  Coverage          ?   91.02%           
=========================================
  Files             ?        7           
  Lines             ?      546           
  Branches          ?      100           
=========================================
  Hits              ?      497           
  Misses            ?       48           
  Partials          ?        1

axis = kwargs.get('axis', 0)
total = df.shape[axis]
if isinstance(df, DataFrame):
total += 1 # pandas calls update once too many
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperdcl Look at this part of the code, this is to fix applymap and other commands that provide a wrong len(df), in these cases it's necessary to use shape[axis] but the axis is difficult to get: it can either be a positional argument or a named argument. The problem with the positional argument is that depending on the pandas method, it's not the same position (normally it should be args[0] but for example apply()'s first argument is a boolean, not the axis, so it would fail.

My suggestion was to drop args altogether but output a tqdm warning if it detects a call with positional arguments, and to keep Aplavin's length detection code because it might work better (but i didn't try on everything, I just ran the unit test. Maybe we should add a new unit test for this, to test length detection for every kind of methods and scenario?).

Also you can read the hidden discussions in #244 (because they are on old commits).

Copy link
Member

@lrq3000 lrq3000 Oct 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to be concise: TODO:

  • Add tqdm warning message if len(args) > 0
  • Maybe add unit tests for series and dataframe length detection (according to axis and method used).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to get someone who develops similar things in pandas to take a look. Don't really have the time to investigate seeing as I don't even use this feature.

@lrq3000
Copy link
Member

lrq3000 commented Oct 29, 2016

Ok I agree, then let's keep this PR open until we get someone competent to
review and contribute to this.

2016-10-29 23:36 GMT+02:00 Casper da Costa-Luis notifications@github.com:

@casperdcl commented on this pull request.

In tqdm/_tqdm.py #299:

             # Precompute total iterations
             total = getattr(df, 'ngroups', None)
             if total is None:  # not grouped
  •                total = len(df) if isinstance(df, Series) \
    
  •                    else df.size // len(df)
    
  •                if df_function == 'applymap':
    
  •                    total = df.size
    
  •                else:
    
  •                    axis = kwargs.get('axis', 0)
    
  •                    total = df.shape[axis]
    
  •                    if isinstance(df, DataFrame):
    
  •                        total += 1  # pandas calls update once too many
    

I think we need to get someone who develops similar things in pandas to
take a look. Don't really have the time to investigate seeing as I don't
even use this feature.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#299, or mute the thread
https://github.com/notifications/unsubscribe-auth/ABES3pDRAhMknflKJztHAt2xg0YwxD-Gks5q47xfgaJpZM4KkJ18
.

@CrazyPython
Copy link
Contributor

@casperdcl huh?

@lrq3000
Copy link
Member

lrq3000 commented Oct 29, 2016

Casper does not frequently use pandas, he already said so. He just
implemented the tqdm_pandas submodule using only his great insights into
the pandas API. But we are reaching a state where we need someone that is
more expert than us in pandas. I used pandas for a whole project in the
past, so you can say that I am a casual user, and this is getting too
complex for me too.

2016-10-30 0:19 GMT+02:00 CrazyPython notifications@github.com:

@casperdcl https://github.com/casperdcl huh?


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#299 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABES3h9dQ9LditGk54sqxePJ52a2s7oIks5q48ZsgaJpZM4KkJ18
.

@CrazyPython
Copy link
Contributor

add help-wanted.

@aplavin review this

@lrq3000 lrq3000 added the help wanted 🙏 We need you (discussion or implementation) label Oct 30, 2016
@aplavin
Copy link
Contributor

aplavin commented Oct 30, 2016

@CrazyPython well, this is my patch - how can I review it? :) To best of my knowledge it's correct, but I'm not a pandas developer either.
I myself use tqdm with this modification and it works ok.

@lrq3000
Copy link
Member

lrq3000 commented Oct 30, 2016

Thank's @aplavin. Well, then we either should just merge it in and see if it fixes issues (or if someone reports incorrect behavior), or add additional unit tests (to check correct length detection on various pandas datatypes and axis argument, if that's possible).

@casperdcl casperdcl force-pushed the master branch 4 times, most recently from 8cade97 to a65e347 Compare October 31, 2016 02:34
@casperdcl
Copy link
Member Author

this fails unit tests on my machine FAIL: Test pandas.DataFrame[.series].progress_apply in tests_pandas.py line 60:

AssertionError: 
Expected:
100% at least twice

@lrq3000
Copy link
Member

lrq3000 commented Nov 12, 2016

Ok I'll have a look, the tests passed last time I tried.

2016-11-12 17:21 GMT+01:00 Casper da Costa-Luis notifications@github.com:

this fails unit tests on my machine FAIL: Test pandas.DataFrame[.series].
progress_apply in tests_pandas.py line 60:

AssertionError:
Expected:
100% at least twice


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#299 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABES3s5w1KFok3ssO8118s9k_R3WxDtkks5q9eeAgaJpZM4KkJ18
.

@casperdcl casperdcl added this to the v5.0.0 milestone Nov 13, 2016
@lrq3000 lrq3000 added to-fix ⌛ In progress and removed to-review 🔍 Awaiting final confirmation labels Nov 14, 2016
@lrq3000
Copy link
Member

lrq3000 commented Nov 14, 2016

@casperdcl Ah sorry I confused this PR with #271. I have no idea why this fails, but probably because of the different axis handling. We should put this PR aside until someone with more experience can provide a better fix...

@lrq3000 lrq3000 removed this from the v5.0.0 milestone Nov 14, 2016
@casperdcl casperdcl added p2-bug-warning ⚠ Visual output bad and removed p0-bug-critical ☢ Exception rasing labels Dec 6, 2016
@casperdcl
Copy link
Member Author

from #322:

# Precompute total iterations
if kwargs.get('axis', 0)==1:
    total = len(df)
        else:
            total = getattr(df, 'ngroups', None)
                if total is None:  # not grouped
                    total = len(df) if isinstance(df, Series) \
                        else df.size // len(df)
                else:
                    total += 1  # pandas calls update once too many

Copy link
Member

@lrq3000 lrq3000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should merge this (see #322). We will tweak this later when someone with more knowledge come by (at least we will get relevant issues!).

aplavin and others added 2 commits May 1, 2017 18:03
Fix total computation for pandas apply
@casperdcl
Copy link
Member Author

Just rebased onto master. But CI now failing.

@casperdcl casperdcl assigned casperdcl and unassigned casperdcl May 5, 2017
@cancan101 cancan101 mentioned this pull request Jul 12, 2017
3 tasks
@cancan101
Copy link
Contributor

any updates on this?

@casperdcl
Copy link
Member Author

well if anyone (@cancan101?) would like to fix the conflicts and unit tests, we'd be happy to merge this in...

@casperdcl casperdcl force-pushed the master branch 6 times, most recently from 6ec00f1 to 4b6476a Compare July 22, 2017 14:15
@Raj-JainHC
Copy link

@"People with write access", can anyone please review and merge the changes. 🙏

@chengs
Copy link
Contributor

chengs commented Mar 9, 2018

I am using pandas frequently. Maybe I could help.

@casperdcl
Copy link
Member Author

thanks @chengs

@chengs
Copy link
Contributor

chengs commented Mar 14, 2018

even if we exclude args, still there is a problem to get a correct "total".

pandas apply calls func twice on the first column/row to decide whether it can take a fast or slow code path.

so, sometimes apply_func will run one more time, but maybe not if the func is slow.

still trying to find a way to determine if total = total + 1 or not.

@chengs
Copy link
Contributor

chengs commented Mar 14, 2018

please check #524 @casperdcl @lrq3000
I cannot commit to this #299

@casperdcl casperdcl closed this Apr 3, 2018
@casperdcl casperdcl deleted the aplavin-patch-2 branch April 3, 2018 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted 🙏 We need you (discussion or implementation) p2-bug-warning ⚠ Visual output bad to-fix ⌛ In progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants