-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Remove support for *args in pandas wrappers #299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@aplavin @CrazyPython not sure what you wanted to achieve with this. seems like once all the commits from #244 were squashed there's just the one change (drop support of |
Codecov Report
@@ Coverage Diff @@
## master #299 +/- ##
=========================================
Coverage ? 91.02%
=========================================
Files ? 7
Lines ? 546
Branches ? 100
=========================================
Hits ? 497
Misses ? 48
Partials ? 1 |
axis = kwargs.get('axis', 0) | ||
total = df.shape[axis] | ||
if isinstance(df, DataFrame): | ||
total += 1 # pandas calls update once too many |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@casperdcl Look at this part of the code, this is to fix applymap and other commands that provide a wrong len(df)
, in these cases it's necessary to use shape[axis]
but the axis is difficult to get: it can either be a positional argument or a named argument. The problem with the positional argument is that depending on the pandas method, it's not the same position (normally it should be args[0]
but for example apply()
's first argument is a boolean, not the axis, so it would fail.
My suggestion was to drop args altogether but output a tqdm warning if it detects a call with positional arguments, and to keep Aplavin's length detection code because it might work better (but i didn't try on everything, I just ran the unit test. Maybe we should add a new unit test for this, to test length detection for every kind of methods and scenario?).
Also you can read the hidden discussions in #244 (because they are on old commits).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to be concise: TODO:
- Add tqdm warning message
if len(args) > 0
- Maybe add unit tests for series and dataframe length detection (according to axis and method used).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to get someone who develops similar things in pandas to take a look. Don't really have the time to investigate seeing as I don't even use this feature.
Ok I agree, then let's keep this PR open until we get someone competent to 2016-10-29 23:36 GMT+02:00 Casper da Costa-Luis notifications@github.com:
|
@casperdcl huh? |
Casper does not frequently use pandas, he already said so. He just 2016-10-30 0:19 GMT+02:00 CrazyPython notifications@github.com:
|
add help-wanted. @aplavin review this |
@CrazyPython well, this is my patch - how can I review it? :) To best of my knowledge it's correct, but I'm not a pandas developer either. |
Thank's @aplavin. Well, then we either should just merge it in and see if it fixes issues (or if someone reports incorrect behavior), or add additional unit tests (to check correct length detection on various pandas datatypes and axis argument, if that's possible). |
8cade97
to
a65e347
Compare
this fails unit tests on my machine
|
Ok I'll have a look, the tests passed last time I tried. 2016-11-12 17:21 GMT+01:00 Casper da Costa-Luis notifications@github.com:
|
fe4a59a
to
b45e5b4
Compare
@casperdcl Ah sorry I confused this PR with #271. I have no idea why this fails, but probably because of the different axis handling. We should put this PR aside until someone with more experience can provide a better fix... |
from #322: # Precompute total iterations
if kwargs.get('axis', 0)==1:
total = len(df)
else:
total = getattr(df, 'ngroups', None)
if total is None: # not grouped
total = len(df) if isinstance(df, Series) \
else df.size // len(df)
else:
total += 1 # pandas calls update once too many |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should merge this (see #322). We will tweak this later when someone with more knowledge come by (at least we will get relevant issues!).
Fix total computation for pandas apply
Just rebased onto master. But CI now failing. |
any updates on this? |
well if anyone (@cancan101?) would like to fix the conflicts and unit tests, we'd be happy to merge this in... |
6ec00f1
to
4b6476a
Compare
@"People with write access", can anyone please review and merge the changes. 🙏 |
I am using pandas frequently. Maybe I could help. |
thanks @chengs |
even if we exclude args, still there is a problem to get a correct "total". pandas apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. so, sometimes apply_func will run one more time, but maybe not if the func is slow. still trying to find a way to determine if total = total + 1 or not. |
please check #524 @casperdcl @lrq3000 |
Fix total computation for pandas apply
migrated from #244, and rebased