
Conversation

sadigulcelik

Closes gh-18295. Updates the scipy.special.logsumexp calculation with the more precise logsumexp calculation outlined in gh-18295 where possible.

Reference issue

Closes scipy#18295.

What does this implement/fix?

Updates the logsumexp calculation with the more precise calculation outlined in scipy#18295 where possible.

Additional information
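For context, the more precise calculation discussed in gh-18295 can be sketched roughly as follows. This is a minimal 1-D, default-arguments sketch with illustrative names, not the PR's actual code: terms attaining the maximum are counted exactly (each contributes exp(0) == 1), and the remaining terms enter through log1p, which avoids the cancellation that a plain log(sum(exp(a - a_max))) can suffer.

```python
import numpy as np

def logsumexp_sketch(a):
    """Hedged 1-D sketch of the precision trick from gh-18295."""
    a = np.asarray(a, dtype=float)
    a_max = np.max(a)
    mask = (a == a_max)
    m = np.count_nonzero(mask)            # number of max-attaining terms (>= 1)
    R = np.sum(np.exp(a[~mask] - a_max))  # remaining terms, each strictly < 1
    # logsumexp = a_max + log(m + R) = a_max + log(m) + log1p(R / m)
    return a_max + np.log(m) + np.log1p(R / m)
```

For example, `logsumexp_sketch([1000.0, 1000.0])` returns `1000.0 + log(2)` without overflow, since the shift by `a_max` is applied before exponentiation.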
@sadigulcelik sadigulcelik requested a review from person142 as a code owner May 5, 2023 02:46
Fix test to use assert_allclose to measure error at an appropriate tolerance

Co-authored-by: Matteo Raso <33975162+MatteoRaso@users.noreply.github.com>
Member

@ev-br ev-br left a comment


Looks like a good start!

I think the implementation could use some explanatory comments and readability tweaks.
It would also be good to add a couple of tests for similar accuracy problems with higher-dimensional arrays, exercising a non-default axis and/or keepdims.
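The requested tests might look something like the following sketch (assuming scipy is installed; the input values are illustrative, not taken from the PR): a 2-D array with repeated maxima, reduced along a non-default axis with keepdims.

```python
import numpy as np
from scipy.special import logsumexp

# Repeated-maximum inputs of the kind discussed in gh-18295, checked on a
# 2-D array with axis=1 and keepdims=True.
a = np.array([[0.0, 0.0, -1000.0],
              [1.0, 1.0, 1.0]])
res = logsumexp(a, axis=1, keepdims=True)

# Row 0: log(1 + 1 + exp(-1000)) == log(2) in double precision.
# Row 1: 1 + log(3).
expected = np.array([[np.log(2.0)], [1.0 + np.log(3.0)]])
assert res.shape == (2, 1)
np.testing.assert_allclose(res, expected, rtol=1e-13)
```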

@ev-br ev-br added the defect (a clear bug or issue that prevents SciPy from being installed or used as expected), scipy.special, and needs-work (items pending response from the author) labels May 5, 2023
@sadigulcelik sadigulcelik requested a review from ev-br May 5, 2023 20:15
Contributor

@lorentzenchr lorentzenchr left a comment


Overall, it looks good. Some improvements might be possible.

# = a_max + log(m + R)
# = a_max + log(m) + log(1 + (1/m) * R)

tmp0 = (a == a_max)

Suggested change
tmp0 = (a == a_max)
mask = (a == a_max)

A more meaningful name would make the code easier to read.

tmp = b * np.exp(a - a_max)

# sumexp for a != a_max
tmp = b * np.exp(a - a_max) * (~tmp0)

Suggested change
tmp = b * np.exp(a - a_max) * (~tmp0)
R = b * np.exp(a - a_max) * (~tmp0)

Why not call it R, as in the comment above? I know my suggestion is not consistent; I only wanted to show the idea.

tmp = b * np.exp(a - a_max) * (~tmp0)

# sumexp for where a = a_max
tmp0 = b*tmp0.astype(float)

Suggested change
tmp0 = b*tmp0.astype(float)
m = b * tmp0.astype(float)

To align the naming with the comment.
Also: what happens if an np.float32 array is passed as a? Does this cast to float (i.e. float64) change the behavior of the current implementation, i.e. the dtype of the returned value?
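To illustrate the dtype concern, here is a minimal demonstration (not the PR's code): in NumPy, .astype(float) means .astype(np.float64), so a float32 input is upcast at this step and the product inherits float64 as well.

```python
import numpy as np

a = np.array([0.0, 1.0], dtype=np.float32)
mask = (a == a.max())

# .astype(float) is .astype(np.float64): the boolean mask becomes float64.
m = mask.astype(float)
print(m.dtype)        # float64

# float32 array * float64 array promotes to float64.
print((a * m).dtype)  # float64, even though a is float32
```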


# suppress warnings about log of zero
with np.errstate(divide='ignore'):
s = np.sum(tmp, axis=axis, keepdims=keepdims)
s0 = np.sum(tmp0, axis=axis, keepdims=keepdims)
sf = s + s0

Suggested change
sf = s + s0
s = s0 + s1

It would be nice if the result, i.e. the sum of both terms, were called s as before.

sf *= sgn


precise = ((s0>0) & (s>0))

From here on, it seems to me that the code could be simpler: s0 and s are either 0 or positive, and s == 0 does not hurt, so special-casing only s0 == 0 should be sufficient.
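The suggested simplification could be sketched like this (a hedged sketch, not the PR's code; the names follow the review thread, with s0 the contribution from max-attaining terms and s the rest, both assumed non-negative). Only s0 == 0 needs a special case, because s == 0 is harmless: log1p(0) == 0.

```python
import numpy as np

def combine_logs(s0, s):
    """Sketch: combine log(s0 + s) precisely, special-casing only s0 == 0."""
    if s0 > 0:
        # Precise branch: log(s0 + s) = log(s0) + log1p(s / s0);
        # s == 0 simply contributes log1p(0) == 0 here.
        return np.log(s0) + np.log1p(s / s0)
    # s0 == 0: fall back to log(s); log(0) gives -inf, warning suppressed.
    with np.errstate(divide='ignore'):
        return np.log(s)
```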

@dschmitz89
Contributor

Not trying to block this PR, but wouldn't it make sense in the long run to implement logsumexp in Cython or C so that it can be exposed via cython_special too? scikit-learn carries a Cython implementation, another case of duplication across the ecosystem.

@lorentzenchr
Contributor

Not trying to block this PR, but wouldn't it make sense in the long run to implement logsumexp in Cython or C so that it can be exposed via cython_special too? scikit-learn carries a Cython implementation, another case of duplication across the ecosystem.

Note that even if it were available in the SciPy Cython API, scikit-learn would probably still keep its own Cython implementation to make inlining possible. The same is already true for expit; the inlining makes a performance difference that I measured once.

@lucascolley
Member

Hey @sadigulcelik, would you like to return to this? There are some comments above to address, but it sounds like this was close.

@lucascolley lucascolley added this to the 1.14.0 milestone Mar 16, 2024
@lucascolley lucascolley changed the title BUG: fixes precision issue with logsumexp BUG: special.logsumexp: fix precision issue May 18, 2024
@ev-br
Member

ev-br commented May 20, 2024

Needs a rebase now. Would be great to decide on this one either way.

@tylerjereddy
Contributor

This PR hasn't seen commit activity for more than a year, so I think bumping the milestone is the right call. It may need a new champion if the original author is busy.

@lucascolley
Member

It looks like this PR will be superseded by gh-21597. Thanks for this anyway @sadigulcelik!

@lucascolley lucascolley removed this from the 1.15.0 milestone Sep 21, 2024
Successfully merging this pull request may close these issues.

BUG: special: Loss of precision in logsumexp
7 participants