
Conversation

DerWeh
Copy link
Contributor

@DerWeh DerWeh commented May 1, 2022

What does this implement/fix?

Vectorize the function scipy.linalg.companion so that it can treat stacks of polynomial coefficients. We probably need to add a note to the docstring saying in which version this feature was introduced.

Additional information

I tested the vectorization against numpy.vectorize using hypothesis.

import numpy as np
import hypothesis.extra.numpy as hnp

from hypothesis import given, assume, strategies as st
from scipy.linalg import companion


@given(a=hnp.arrays(np.float_,
                    shape=hnp.array_shapes(min_side=2),
                    elements=st.floats(-1e+6, 1e+6))
       )
def test_companion_vectorization(a):
    """Test that `companion` is properly vectorized."""
    np.seterr(all='raise')
    assume(np.all(abs(a[..., 0]) > 1e-6))
    c = companion(a)
    vec_c = np.vectorize(companion, signature='(n)->(m,m)')(a)
    np.testing.assert_array_equal(c, vec_c)

This test is, however, not included as a unit test, as I don't know whether it is OK to use hypothesis.

@DerWeh DerWeh requested review from larsoner and ilayn as code owners May 1, 2022 21:01
@ilayn
Copy link
Member

ilayn commented May 1, 2022

I don't see the motivation for this, since it is practically a for loop:

>>> import numpy as np
>>> import scipy.linalg as la
>>> A = np.arange(1, 7).reshape(2, 3)  # every row is a polynomial
>>> C = np.array([la.companion(x) for x in A])
>>> C
array([[[-2.  , -3.  ],
        [ 1.  ,  0.  ]],

       [[-1.25, -1.5 ],
        [ 1.  ,  0.  ]]])

So I don't think anybody will get a significant performance boost from it. Also, nested arrays (arrays that have array objects as elements) are very bad to work with in general. Here you can see that nD arrays are much easier to work with, as you can slice them however you wish.

@DerWeh
Copy link
Contributor Author

DerWeh commented May 1, 2022

Aren't all vectorizations practically for loops? It is just more readable to have it vectorized, and sometimes faster, in case the Python loop is the bottleneck.

I basically implemented this version to be able to drop np.vectorize from my code, as it produces messy error messages and is hard to debug when you mess up.

The example I have in mind is linear prediction (the companion matrix is required to eliminate exponentially growing terms; see, e.g., DOI: 10.1103/physrevb.92.155132). As all matrix operations are vectorized in numpy, this function was the only one I had to vectorize manually. I can give you an example in a few days.

But I agree, it's only a nice-to-have, not something particularly important.

Also nested arrays are very bad to work with in general.

I partially disagree. numpy handles nested arrays transparently; that's why we have gufuncs. I also find it much more readable than writing many loops. Of course, there is a bit more burden on the implementation side, but that is done once in numpy/scipy.

@ilayn
Copy link
Member

ilayn commented May 1, 2022

Vectorize is for applying the same function over array elements. As you can see, here you don't need a function applied over array elements but a collection of all the arrays. gufuncs are for operating on every array element so that the implementation is done at the low level as quickly as possible. Here you are generating 2D arrays.

I find it also much more readable than writing many loops.

I basically wrote a list comprehension with no loops. Try adding, summing, multiplying, or doing anything else with nested arrays and you will find out that this is precisely why nD arrays exist.

@DerWeh
Copy link
Contributor Author

DerWeh commented May 1, 2022

Sorry, but I do not understand your point. Of course, a single comprehension is enough to treat that specific (2-dimensional) input, as there is only a single additional axis.

In general, you need to write

Aflat = A.reshape(-1, A.shape[-1])
Cflat = np.array([companion(x) for x in Aflat])
C = Cflat.reshape(A.shape[:-1] + (A.shape[-1]-1, A.shape[-1]-1))

which I find harder to read than

C = companion(A)

Again, it is not necessary, but it makes the code clearer.
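
As a quick illustration of the batched shape semantics (a minimal sketch; the coefficient values are arbitrary):

import numpy as np
from scipy.linalg import companion

A = np.ones((2, 3, 4))  # a 2x3 batch of cubic (4-coefficient) polynomials
companion(A).shape      # -> (2, 3, 3, 3): one 3x3 companion matrix per polynomial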


Furthermore, I don't really see the difference from other functions like numpy.linalg.inv, numpy.matmul, and numpy.trapz, which are vectorized.

If you think it's a bad idea, feel free to reject this pull request. I will just ship my own version; I just thought this might also help others.

@DerWeh
Copy link
Contributor Author

DerWeh commented May 1, 2022

The now-allowed cases still have to be removed from the test:

    def test_bad_shapes(self):
        assert_raises(ValueError, companion, [[1, 1], [2, 2]])
        assert_raises(ValueError, companion, [0, 4, 5])
        assert_raises(ValueError, companion, [1])
        assert_raises(ValueError, companion, [])

However, I don't know the assert_raises function and can't find its documentation. Thus, I don't really know what these lines do.

@ev-br
Copy link
Member

ev-br commented May 1, 2022

However, I don't know the assert_raises function and can't find its documentation. Thus, I don't really know what these lines do.

assert_raises is a historic alias for pytest.raises. These two forms are equivalent: assert_raises(ValueError, companion, []) and

with assert_raises(ValueError):
    companion([])

@ilayn
Copy link
Member

ilayn commented May 1, 2022

What I am trying to emphasize is that you either do the input shaping and then a list comprehension, or you take 2D input and reshape afterwards.

I don't get what the difficulty is with the format I gave for nD arrays, or how it would be easier afterwards. Because now you have to massage your input such that the polynomials are arranged so that every [n, m, ..., :] is a polynomial. So there is no shortcut here, unless you already have a mechanism that produces this format. But for general users this has to be created, and I don't see how or where that will be done.

@brandondavid brandondavid added enhancement and scipy.linalg labels May 1, 2022
@lucascolley lucascolley added the needs-work label Jan 4, 2024
Copy link
Contributor

@mdhaber mdhaber left a comment


Hmmm... in the code, I don't see the list comprehensions, for loops, or nested arrays mentioned in the comments. It looks like a straightforward extension of the function to batch over stacks of coefficient arrays (of arbitrary shape) with standard array operations, which would be much faster than loops or comprehensions doing the same thing. There have been requests for other special matrix functions to allow batch operations like this (e.g. gh-17344), and it's analogous to the way NumPy's linalg functions are batched.

Is the hesitation that we don't want to allow batched use of the special matrix functions?

In the meantime, I'll go ahead and push the suggested changes.

Copy link
Contributor

@mdhaber mdhaber left a comment


A few more documentation updates would be needed.

Maybe we should add an example of the batched behavior.

@ilayn
Copy link
Member

ilayn commented Aug 22, 2024

I don't want to kill your momentum, but some issues still have discussions running and are not yet concluded. I think this is getting a bit too quick-fire with all the open issues. I want to close everything too, but some of them still have open ends and are not necessarily concluded.

For special matrices in particular, I don't see the point of accepting a 2D array just to run a simple for loop for the sake of batch input. Plus, all of them must have the same order. You can still fill a 3D array with a one-liner with no change in the notation.

@mdhaber
Copy link
Contributor

mdhaber commented Aug 22, 2024

I don't want to kill your momentum

Hah, I was wondering when you were going to ask me to stop. I know that it looks like I'm tearing through recklessly, but I'm careful. If it makes you uncomfortable anyway, I can slow down, but fortunately, I'm almost done.

some of them still have open ends and are not necessarily concluded.

Right, I wasn't going to merge this one because I can see that. That's why I asked:

Is the hesitation that we don't want to allow batched use of the special matrix functions?

It sounds like your answer to that is yes:

I don't see the point of accepting a 2D array just to run a simple for loop for the sake of batch input. Plus, all of them must have the same order. You can still fill a 3D array with a one-liner with no change in the notation.

Separate users requested the ability to batch-create special matrices at least twice in 2022. I don't know the use cases they have in mind, but the requests sound easy to satisfy and very consistent with the rest of SciPy. (Supporting batch input and output is very common.)

As you wrote about real/complex support, we wouldn't want to support batching in some functions but not others - fortunately, it looks like batching can be added for all special matrix functions. I thought it was already decided that linalg functions that accept matrices should support batch input (e.g. like det, eigm, like we discussed for solve, etc.). Wouldn't it be inconsistent to add the ability to batch all the other functions that accept matrices but not batch the creation of special matrices?

Imagine if NumPy had n-D ufuncs but the output of zeros were restricted to 1D and one had to loop and stack (e.g. if reshape were not available). It works, and maybe it's a one-liner, but I'm sure glad zeros accepts a shape argument!

@DerWeh
Copy link
Contributor Author

DerWeh commented Aug 23, 2024

I added this PR because I had a specific scenario where I needed the companion matrices and the loops were too slow. As I had to implement a vectorized version for performance, I thought of contributing it back.
As there was no desire to include it at that point, I abandoned the PR. But if there is anything you want me to do, I can pick it up again (though I am currently completely clueless about what I did two years ago).

@mdhaber
Copy link
Contributor

mdhaber commented Aug 23, 2024

Thanks @DerWeh, I think the question now is just whether we want to allow batches in the special matrix creation functions. If not, then we would close this PR and gh-21419 (which attempts to close the request in gh-17344); if so, then I'd be happy to help you get this merged. Thanks for your patience!

@mdhaber
Copy link
Contributor

mdhaber commented Aug 23, 2024

Actually @DerWeh, since you have an application where the performance made a difference, would you be willing to share more information about that? How big a batch are we looking at, and what was the time difference? Was this a bottleneck for you, or just an inconvenience? I think having a real-world example where a loop/list comprehension was insufficient might help.

@DerWeh
Copy link
Contributor Author

DerWeh commented Aug 24, 2024

@mdhaber I'll try digging up the code (or maybe I can reproduce something similar). In the meantime I have switched fields, jobs, and flats, and I am not sure how much I backed up, as in the end the approach turned out to be a dead end (it was research, after all).

To provide some context: I needed stable linear prediction of time series. Some of the code I used is public: _predict_stable was the method I used. It sets up a batch of companion matrices, does an eigenvalue decomposition, and drops eigenvalues with a magnitude larger than 1 (exponentially growing terms). Seeing that I did an eigendecomposition of each companion matrix, I am quite surprised that the loop mattered. I vaguely remember that it was important to get MKL, which significantly sped up the decomposition, before the loops started to matter.
It might be that it only mattered on the hardware I had at hand, as all the MKL stuff profits from multithreading.

My educated guess would be that this PR mostly offers comfort, as you can stay in your mental framework of batched matrix operations. Performance probably matters only in very specific scenarios. But without profiling, it's just guesswork. I'll see if the code is still around somewhere.
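
For reference, a minimal sketch of the batched workflow described above, assuming the batched companion from this PR (the variable names and the simple zeroing of unstable modes are illustrative, not the original _predict_stable code):

import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
coeffs = rng.standard_normal((50, 8))  # batch of 50 polynomials of order 7
coeffs[:, 0] = 1.0                     # ensure nonzero leading coefficients

C = linalg.companion(coeffs)           # batched: shape (50, 7, 7)
vals, vecs = np.linalg.eig(C)          # batched eigendecomposition
vals = np.where(np.abs(vals) <= 1, vals, 0)  # drop exponentially growing modes
# restore the (now stable) matrices as V @ diag(vals) @ V^-1, batched;
# the result is complex but real up to rounding, since conjugate pairs
# are kept or dropped together
C_stable = vecs @ (vals[..., None] * np.linalg.inv(vecs))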

@ilayn
Copy link
Member

ilayn commented Aug 24, 2024

For such use cases there is numpy.roots, since it already uses companion matrices to compute the roots. Otherwise you are passing through the eigenvalue API, which slows things down.

Polynomial root finding is a tricky business and an inherently unstable operation; that's why not much has happened except for forming the Fiedler companion matrix (which we added later). It has slightly better properties in terms of eigenvalue computations due to its pentadiagonal form and responds slightly better to matrix balancing.
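
(As a minimal illustration of this point, not taken from the thread: both the plain and the Fiedler companion forms yield the polynomial roots as eigenvalues.)

import numpy as np
from scipy import linalg

p = [1.0, -6.0, 11.0, -6.0]      # (x - 1)(x - 2)(x - 3)
print(np.roots(p))               # forms a companion matrix internally: [3. 2. 1.]
F = linalg.fiedler_companion(p)  # pentadiagonal Fiedler companion form
print(np.sort(np.linalg.eigvals(F).real))  # same roots: [1. 2. 3.]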


But other than that, the discussion is whether a few special matrix functions should generate nD arrays. If that is to happen, we have to do it for every member. We spent quite some time unifying the linalg things so that this zebra-API problem goes away and similar things have similar behavior.

The feature requests in this regard, so far, propose to hide the for loop inside SciPy, and it is comparable labor and performance whether it is done by the user or by SciPy. So I don't see an immediate benefit. That is the argument. As long as we don't generate new zebra patterns, it's OK either way, with little benefit to the users or to SciPy functionality.

@ev-br
Copy link
Member

ev-br commented Aug 24, 2024

FWIW, I think it would be great to have uniform batching in scipy.linalg. This is a large project, of course :-).

Long term, one benefit is fixing the disparity with numpy, as in many cases we have a weird story along the lines of "scipy.linalg has X on top of the numpy namesake, but numpy has batching and scipy does not".

@mdhaber
Copy link
Contributor

mdhaber commented Aug 24, 2024

The feature requests in this regard, so far, propose to hide the for loop inside SciPy...with little benefit to the users or to SciPy functionality.

I've seen it said a few times that this is just a for loop or list comprehension, but I mentioned above that I don't understand that part because I don't see the loop or list comprehension - just NumPy. Could you elaborate on that? Does this mean that a compiled for loop in NumPy is no better than a pure Python for loop in user code?

Besides the convenience, I see a 4~5x speedup on my machine.

import numpy as np
from scipy import linalg
rng = np.random.default_rng(2549824598234528)

# in results below, m is 100, 1000, then 10000
m, n = 100, 100
A = rng.random((m, n))
%timeit np.array([linalg.companion(x) for x in A])
# 2.97 ms ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 37.8 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 465 ms ± 53.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit linalg.companion(A)
# 668 µs ± 8.75 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 7.54 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 108 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In gh-18225 you added support for stacked arrays to det - thank you! I don't see a speedup there on my machine, but I'm still very glad the capability is there.

import numpy as np
from scipy import linalg
rng = np.random.default_rng(2549824598234528)

# in results below, m is 100, 1000, then 10000
m, n = 10000, 100
A = rng.random((m, n))
C = linalg.companion(A)

%timeit np.array([linalg.det(x) for x in C])
# 8.56 ms ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 92.9 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 879 ms ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit linalg.det(C)
# 8.8 ms ± 71.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 87.3 ms ± 272 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 1.31 s ± 62.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

There have been a few releases since then, but still norm and most other linalg functions are 2-D only - and as much as I've wanted batched versions of some of these functions, it's totally understandable: a mega-PR to vectorize all linalg functions at once would be very challenging. And you have certainly been doing much more important work.

Since there are only a few matrix creation functions, though, we can do all of them in one release if that's important:

  • block_diag - there a few different ways people could want this to work in N-D; needs more discussion
  • circulant - ENH: linalg.circulant: allow batching #21419
  • companion - this PR
  • convolution_matrix
  • dft - Creating stacks of n x n DFT matrices is just broadcasting, and we can't have multiple n in one array
  • fiedler
  • fiedler_companion
  • ~~hadamard~~ - same as dft
  • hilbert - same as dft
  • leslie
  • pascal - same as dft
  • invpascal - same as dft
  • toeplitz

So besides this PR and gh-21419, there are just five other special matrix functions that currently accept 1-D input and produce 2-D output, which we'd generalize to accept N-D input and produce (N+1)-D output. I'd be happy to write or review a single PR that adds batch support for these with a decorator (Update: or np.apply_along_axis), e.g.

Update: no need for the code below - we can just use np.apply_along_axis and make sure the documentation is preserved.
def _batch_special_matrix(func):
    def wrapper(X):
        X = np.asarray(X)
        if X.ndim <= 1:
            return func(X)  # unbatched case: defer to the original function

        # flatten the batch dimensions, apply func row by row, then restore them
        n = X.shape[-1]
        Y = np.asarray([func(x) for x in X.reshape(-1, n)])
        return Y.reshape(X.shape[:-1] + (Y.shape[-1],)*2)
    return wrapper

or actually

def _batch_special_matrix(func):
    def wrapper(X):
        X = np.asarray(X)
        if X.ndim <= 1:
            return func(X)
        # note the signature: np.apply_along_axis(func1d, axis, arr)
        return np.apply_along_axis(func, -1, X)
    return wrapper

and a common property-based test. Then we can improve the performance of these at our leisure. How does that sound?
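
As a sketch of what such a common test could look like (assuming batch support has been added to each function; the test name and the function list are illustrative):

import numpy as np
import pytest
from scipy import linalg

@pytest.mark.parametrize("func", [linalg.companion, linalg.fiedler,
                                  linalg.fiedler_companion])
def test_batch_matches_loop(func):
    # the batched result must equal stacking the unbatched results
    rng = np.random.default_rng(1234)
    a = rng.uniform(1, 2, size=(4, 3, 6))  # leading coefficients are nonzero
    expected = np.array([[func(row) for row in block] for block in a])
    np.testing.assert_array_equal(func(a), expected)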

@ilayn
Copy link
Member

ilayn commented Aug 24, 2024

There have been a few releases since then, but still norm and most other linalg functions are 2-D only

I am doing that within my capacity. I also did lu and expm. Then logm and sqrtm are on my computer somewhere. It's not that I don't want to, but I was busy with other stuff. If you want to join that effort, the more the merrier.

I don't see a speedup there on my machine, but I'm still very glad the capability is there.

For det, you can't do better than the speed of the LU decomposition, hence it is lower-bounded. It was sped up when LU was done.

Besides the convenience, I see a 4~5x speedup on my machine.

That's because you are testing 100th-degree polynomials 100 times. That's obviously going to be faster. That's not the use case for these arrays. Better performance never hurts, but for the general use cases, which are small arrays, this does not make too much of a difference.

But as I mentioned, if we are doing all of them, no reservations. I'm +1 too.

@mdhaber
Copy link
Contributor

mdhaber commented Aug 24, 2024

Of course! I don't mean that the others need to be done faster, and I recognized above that you have been doing much more important work. Thank you again! I just mean that w.r.t. API consistency, we should hold this effort only to the same standard.

OK, if it sounds good, I'll adjust the documentation here and merge. Then I'll prepare a PR for the five other special matrix functions mentioned above, and maybe I'll create a tracking issue for the rest of linalg.

@mdhaber
Copy link
Contributor

mdhaber commented Aug 24, 2024

OK @DerWeh does this still look good to you? I think it's ready.

@DerWeh
Copy link
Contributor Author

DerWeh commented Aug 25, 2024

@mdhaber Thanks for pushing it. Looks good to me. The only thing which might be up for debate is the documentation of the error:

Raises
------
ValueError
    If any of the following are true: a) ``a.shape[-1] < 2``; b) ``a[..., 0] == 0``.

Technically, the check is any(a[..., 0] == 0), as we (might) have an array now (in the code it is np.any, but for the doc, any is probably more readable). I think the current version is also understandable; I would choose whatever is more in line with the rest of the docs.

Currently I am rather busy, but I would also be willing to help batch SciPy. Personally, I think it really improves readability, as we hide the loops and focus on the math that is happening.

@DerWeh
Copy link
Contributor Author

DerWeh commented Aug 25, 2024

For such use cases numpy.roots since it uses companion matrices already to compute the roots. Otherwise you are passing through the eigenvalue API which slows things down.

Polynomial root finding is tricky business and an inherently unstable operation, that's why not much has happened except forming the Fiedler's companion matrix (which we added later). It has slightly better properties in terms of eigenvalue computations due to its pentadiagonal form and responds to matrix balancing slightly better.

@ilayn Thanks for the valuable insight. The Fiedler companion matrix wasn't there at the time, so I am glad to learn about it. The reason I didn't use numpy.roots is that I need not only the roots but the full matrix: I manipulate the roots and then restore the full matrix.

@mdhaber
Copy link
Contributor

mdhaber commented Aug 26, 2024

Thanks @DerWeh. I was thinking about that too, but decided that the English "any" was fine instead of a code any. OK, I'll go ahead and merge this so I can add it to the tests in gh-21446. Thanks all!

@mdhaber mdhaber merged commit e2ce1a5 into scipy:main Aug 26, 2024
36 checks passed
@DerWeh DerWeh deleted the vectorize-companion branch August 26, 2024 17:35
@lucascolley lucascolley added the needs-release-note label and removed the needs-work label Aug 26, 2024
anushkasuyal pushed a commit to anushkasuyal/scipy that referenced this pull request Sep 13, 2024
* ENH: linalg.companion: add N-D batch support 

---------

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
@lucascolley lucascolley removed the needs-release-note label Jun 9, 2025