Skip to content

Conversation

lucascolley
Copy link
Member

Reference issue

Towards gh-20544

What does this implement/fix?

Additional information

Seems quite simple!

@github-actions github-actions bot added scipy.stats enhancement A new feature or improvement labels Jul 2, 2024
@lucascolley lucascolley requested a review from mdhaber July 2, 2024 11:31
@lucascolley lucascolley added this to the 1.15.0 milestone Jul 2, 2024
else:
# Transform without the constant offset 1/lmb. The offset does
# not affect the variance, and the subtraction of the offset can
# lead to loss of precision.
# Division by lmb can be factored out to enhance numerical stability.
logx = lmb * logdata
# convert to `np` for `special.logsumexp`
logx = np.asarray(logx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we can merge this but I'll add to the list that we need to come back after special.logsumexp is done.


# Compute the variance of the transformed data.
if lmb == 0:
logvar = np.log(np.var(logdata, axis=0))
logvar = xp.log(xp.var(logdata, axis=0))
Copy link
Contributor

@mdhaber mdhaber Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, we'll want to add axis support. Would you be willing to do that as a follow-up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try!

class TestBoxcox_llf:

def test_basic(self):
def test_basic(self, xp):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are no tests of native dtype or float32. What do you think of parametrizing this one over those possibilities?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we need some tol changes

Copy link
Contributor

@mdhaber mdhaber Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just that somewhere in the float32 calculation the result is converted to an inaccurate float64. The reference value is still float64, so the test didn't fail due to dtype. So if the dtype is not promoted accidentally and the reference value is the desired dtype, I think it will be ok.

Update: the bug is in _log_var and _log_mean introduced in gh-19604. (I don't blame us; it is a pre-NEP 50 thing.) At the time, it appeared that boxcox_llf would always return float64 regardless of input dtype, so two things were done under that assumption: conversion of the input of _log_var to complex128 regardless of input dtype, and use of np.log(len(logx)) (in both _log_var and _log_mean) which will always cause the result to be float64. However, with NEP 50 rules / NumPy 2.0 (or other backends that follow array type promotion rules), the current lmb=0 code path results in float32 output, and the code before gh-19604 would have preserved the input dtype. So assuming you don't want to work on _log_var and _log_mean in this PR, I think the way to fix this is to coerce the dtype of the output of the function to the input dtype. The conversion can be removed when NumPy 2.0 is the minimum supported version and _log_var/_log_mean functions are fixed. Or, after fixing _log_var/_log_mean, we could let pre-NEP 50 NumPy follow its own rules and make a dtype exception in the tests.

@lucascolley lucascolley added the array types Items related to array API support and input array validation (see gh-18286) label Jul 4, 2024
[skip cirrus] [skip circle]
@lucascolley lucascolley requested a review from mdhaber July 6, 2024 13:15
@mdhaber mdhaber merged commit 9e688a9 into scipy:main Jul 7, 2024
@mdhaber
Copy link
Contributor

mdhaber commented Jul 7, 2024

Thanks @lucascolley. I went ahead and merged. There is some follow-up work to be done, so I'll leave the box unchecked. I think we need to:

  1. resurrect logsumexp
  2. add array API support to _log_var and _log_mean
  3. add axis support

Interested in some of those?

@lucascolley lucascolley deleted the xp_boxcox_llf branch July 7, 2024 08:22
ev-br pushed a commit to ev-br/scipy that referenced this pull request Jul 14, 2024
* ENH: `stats.boxcox_llf`: add array API support
@lucascolley
Copy link
Member Author

@mdhaber could you give me some pointers for what axis support should look like? I'll send a PR.

@mdhaber
Copy link
Contributor

mdhaber commented Jul 22, 2024

Yeah! For the most part, it just means adding axis as a kwarg to the function and consistently passing it to the reducing functions called within (var, sum, and logsumexp). Let's carry that through _log_mean and _log_var, too. One nice thing is that the example that loops through all the values of $\lambda$ can be replaced with a single vectorized call. When the PR is up, I can suggest how to add the _axis_nan_policy decorator and add the standard tests. We won't worry about array API initially. How does that sound?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
array types Items related to array API support and input array validation (see gh-18286) enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants