Skip to content

Conversation

kleskjr
Copy link
Contributor

@kleskjr kleskjr commented Jan 20, 2016

The circular statistic function circvar is updated to estimate circular
variance simply as one minus the mean resultant vector (1-R), instead
of using approximation to the linear variance. Thus, the returned value
is in the range [0, 1]. This definition is compatible with the
literature (e.g., Fisher, 1993; Jammalamadaka and SenGupta, 2001) and
other implementations in statistical packages (e.g., R and Matlab).

[0, 1], 0 standing for no variance, and 1 for a large variance.

References
-----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Title underline too short" on this line, and one more below.

@larsmans
Copy link
Contributor

This is not backwards compatible, right? Wouldn't adding an option be a better idea?

@kleskjr
Copy link
Contributor Author

kleskjr commented Jan 21, 2016

No, it's not compatible. The circvar method used until now, takes simply circstd squared (before the normalization). However, I haven't seen yet such a definition anywhere else. The sources I've looked in are:

All of them define the circular variance as 1-R. Only the last reference defines it as n(1-R), where n is the array size. Therefore, the circvar definition used until now might not be correct.

Is a compatibility option still needed in this case?

@kleskjr kleskjr force-pushed the circvar-correction branch 2 times, most recently from a085fe6 to e6011cb Compare January 21, 2016 22:11
@codecov-io
Copy link

@@            master   #5747   diff @@
======================================
  Files          235     235       
  Stmts        43219   43225     +6
  Branches      8143    8145     +2
  Methods          0       0       
======================================
+ Hit          33601   33605     +4
- Partial       2591    2593     +2
  Missed        7027    7027       

Review entire Coverage Diff as of 2a6daa7

Powered by Codecov. Updated on successful CI builds.

@mirca
Copy link
Contributor

mirca commented Jan 23, 2016

How compatible (conceptually) circvar and circstd are going to be then?

@larsmans larsmans added scipy.stats needs-decision Items that need further discussion before they are merged or closed labels Jan 23, 2016
@josef-pkt
Copy link
Member

I don't remember any details for how transformation of circular data works.
What looks strange to me in this change is that the variance becomes independent of the units. This is usually not the case with any other random variables. In the case of 360 degrees, it looks strange having a tiny variance, as in the changed test case.

If this is just a renormalization of the variance into radians, then this should be an option not a backwards incompatible redefinition.

@kleskjr
Copy link
Contributor Author

kleskjr commented Jan 26, 2016

mirca said:

How compatible (conceptually) circvar and circstd are going to be then?

circstd and circvar are not going to be conceptually compatible. Fisher, 1993 is explicit that circular std is not defined as circular var**0.5.

josef-pkt said:

What looks strange to me in this change is that the variance becomes independent of the units. This is usually not the case with any other random variables.

This is how circular variance is defined in the literature, with values between 0 and 1, and independent on the units.

If this is just a renormalization of the variance into radians, then this should be an option not a backwards incompatible redefinition.

It's not only a renormalization. After normalisation, the PR's circvar is approximately twice smaller than the linear variance of data with very small deviations around the mean. For big variances, however, I would expect the old and new definition to deviate even after normalisation.

I would strongly suggest to keep the new definition of circvar. However, an optional variable could set whether we expect 1-R as output or the de-normalised value ((high-low)/(2*pi))**2*(1-R). The same optional variable should also be implemented for the (de)normalisation of cirstd.

@josef-pkt
Copy link
Member

3 years ago I was working on a circular "toolbox" based initially on a matlab translation.

I used var = 1 - R, and a wrapper that converts between the circle units (degrees, hours or whatever) into radians and back.
I don't remember much of the details but was reading mainly Mardia's book for this.

AFAIR (!), I never managed to figure out these scipy stats function and left them to Travis. (I thought differences might be due to scaling. There are also weird things in R packages, where I wasn't quite sure where their scaling or transformation comes from.)

The circular statistic function circvar is updated to estimate circular
variance simply as one minus the mean resultant vector (1-R), instead
of using approximation to the linear variance. This definition is
compatible with the literature (e.g., Fisher, 1993; Jammalamadaka and
SenGupta, 2001) and other implementations in statistical packages (e.g.,
R and Matlab).

Optional boolean parameter ``normalize`` added to both circvar and
cirstd determines whether the returned values are independent on the
variable units, or whether the values are scaled by the range of
variable values (currently default not to break compatability).

Docstring of circvar, circstd and circmean have more accurate
description for the parameters ``high`` and ''low``.
@kleskjr kleskjr force-pushed the circvar-correction branch from e6011cb to 64f36b0 Compare January 31, 2016 22:55
@kleskjr
Copy link
Contributor Author

kleskjr commented Jan 31, 2016

A new push:

  • added an optional parameter for result normalization to circstd and circvar. Because of compatibility, the returned values are scaled by default depending on the variable range as it was until now.
  • docstrins of circmean, circstd and circvar have a more accurate description of the high and low parameters.

@ev-br
Copy link
Member

ev-br commented Feb 7, 2016

I gather there might be some use of / experience with these sorts of functions in astropy land.

@mirca
Copy link
Contributor

mirca commented Feb 7, 2016

@ev-br You are right. See #4472.

@rgommers rgommers mentioned this pull request Oct 2, 2016
@zym1010
Copy link

zym1010 commented Dec 17, 2016

+1 for this feature. Also, it might be also good to allow a weights argument.

@@ -2671,18 +2671,19 @@ def _circfuncs_common(samples, high, low):

def circmean(samples, high=2*pi, low=0, axis=None):
"""
Compute the circular mean for samples in a range.
Compute the circular mean for samples assumed to be in the
range [low to high].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolving merge conflicts here means adding all the changes to _morestats.py manually. I'm not making this chance because the changes to these headings because they need to be a one-line description.

@mdhaber
Copy link
Contributor

mdhaber commented Feb 25, 2022

@kleskjr I'd like to help you finish this up. Can you select "Allow edits from maintainers"?

@mdhaber
Copy link
Contributor

mdhaber commented Mar 5, 2022

@kleskjr Thought I'd check in again. If you could allow select "Allow edits from maintainers", I'd appreciate it!

@kleskjr
Copy link
Contributor Author

kleskjr commented Mar 6, 2022

@mdhaber, edits are allowed. Sorry for the delay!

@mdhaber
Copy link
Contributor

mdhaber commented Mar 6, 2022

Thanks @kleskjr. How important do you think the normalize option is for circvar?

In cirstd, I think I see how having low and high could be useful with and without normalize:

  • whatever your units of angle were originally (e.g. degrees), you can use low and high to convert them to the range 0 to 2 pi.
  • Since the output can be interpreted as being in units of radians, normalize gives you the option to convert back to the original units. This is supported by the literature - at least Mardia (see DOC: stats.circstd: add reference, notes, comments #15652).

The validity of normalize for circvar is less clear to me. The output is not in radians; it is in the range of 0 to 1, and we can think of it as the distance from the outside of a unit circle, right? If that's correct, it seems that the more appropriate normalization would be in terms of the radius of the original circle, but we don't have that information. I don't see support for this sort of thing in the literature, so I think it might be easier to merge this without normalize, at least in circvar.

@mdhaber
Copy link
Contributor

mdhaber commented Mar 11, 2022

@steppi I suspect that the reason this hasn't already been fixed is that others are unsure of the math. That's why I tagged you, but NP if it's not of interest.

@steppi
Copy link
Contributor

steppi commented Mar 12, 2022

@steppi I suspect that the reason this hasn't already been fixed is that others are unsure of the math. That's why I tagged you, but NP if it's not of interest.

Definitely of interest but my time is scarce now. I think I can take a look in 2-3 weeks.

@rkern
Copy link
Member

rkern commented Mar 21, 2022

I don't see support for this sort of thing in the literature, so I think it might be easier to merge this without normalize, at least in circvar.

@mdhaber I concur.

Comment on lines +3666 to +3669
normalize : boolean, optional
If True, the returned value is equal to ``sqrt(-2*log(R))`` and does
not depend on the variable units. If False (default), the returned
value is scaled by ``((high-low)/(2*pi))``.
Copy link
Contributor

@mdhaber mdhaber Mar 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the variable units" is unclear.

Suggested change
normalize : boolean, optional
If True, the returned value is equal to ``sqrt(-2*log(R))`` and does
not depend on the variable units. If False (default), the returned
value is scaled by ``((high-low)/(2*pi))``.
normalize : boolean, optional
If True, the returned value is equal to ``sqrt(-2*log(R))`` and does
not depend on the units of `samples`. If False (default), the returned
value is scaled by ``((high-low)/(2*pi))``. See Notes for details.

Let's commit this when it's ready to merge. I've checked the line lengths, so I don't think CI is needed ([skip ci]).

@mdhaber
Copy link
Contributor

mdhaber commented Mar 22, 2022

Thanks @rkern. Other than the adjustment above, this is ready from my perspective. If you and @steppi approve the PR and there are no concerns from the mailing list, I'll go ahead and merge it in a week.

Copy link
Contributor

@steppi steppi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. It's been 5 years since I've done anything with directional statistics, but I've double-checked @kleskjr's references and agree that this is the most reasonable definition of the circular variance. I also agree with not including a normalize option for cirvar.

@mdhaber
Copy link
Contributor

mdhaber commented Mar 31, 2022

There were no replies to the mailing list post, there is +1 explicit approval from @steppi (and me), and there seems to be other support above, so I'm merging. I added a note about the backward-incompatible change in the release notes. Thanks for the review @steppi!

@mdhaber mdhaber merged commit 6f3c7be into scipy:main Mar 31, 2022
patnr added a commit to patnr/scipy that referenced this pull request Apr 5, 2022
* master: (632 commits)
  Update _dual_annealing.py (scipy#15939)
  TST: stats: make `check_sample_var` two-sided (scipy#15723)
  DOC: sparse.linalg: add citations for COLAMD
  DOC: fix missing comma in conf
  ENH/MAINT: Version switcher from the sphinx theme (scipy#15380)
  MAINT: stats: remove support for `_rvs` without `size` parameter (scipy#15917)
  BUG: Handle base case for scipy.integrate.simpson when span along the axis is 0  (scipy#15824)
  MAINT: stats: adjust tolerance for failing test only
  MAINT: stats: adjust tolerance of failing TestTruncnorm
  MAINT: special: Clean up C style in ndtr.c
  CI: remove pin on Jinja2 (scipy#15895)
  STY: Remove white spaces
  MAINT: stats: exempt gilbrat from refguide_check
  MAINT: stats: rename continuous_gilbrat->continuous_gibrat
  MAINT: stats: correct name Gilbrat -> Gibrat
  [ENH] circvar calculated simply as 1-R (scipy#5747)
  DEP: deal with deprecation of ndim >1 in bspline
  MAINT: stats: more specific error type from `rv_continuous.fit` (scipy#15778)
  DOC: fix import in example in _morestats (scipy#15900)
  [BUG] make p-values consistent with the literature (scipy#15894)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-decision Items that need further discussion before they are merged or closed scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants