ENH: vectorize companion matrix function #16090
Conversation
I don't see the motivation for this since this is practically a for loop:

```python
>>> A = np.arange(1, 7).reshape(2, 3)  # every row is a polynomial
>>> C = np.array([la.companion(x) for x in A])
>>> C
array([[[-2.  , -3.  ],
        [ 1.  ,  0.  ]],

       [[-1.25, -1.5 ],
        [ 1.  ,  0.  ]]])
```

So I don't think anybody will get a significant performance boost from it. Also, nested arrays (arrays that have array objects as elements) are very bad to work with in general. Here you can see that nD-arrays are much easier to work with, as you can slice them however you wish.
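For context, a minimal sketch of what a batched construction with plain array operations could look like (an illustration only, not the PR's actual implementation; `companion_batched` is a hypothetical name):

```python
import numpy as np

def companion_batched(a):
    # Hypothetical batched companion: `a` has shape (..., n) with n >= 2
    # and nonzero leading coefficients a[..., 0].
    a = np.asarray(a, dtype=float)
    n = a.shape[-1]
    c = np.zeros(a.shape[:-1] + (n - 1, n - 1))
    c[..., 0, :] = -a[..., 1:] / a[..., :1]  # first row from the coefficients
    idx = np.arange(n - 2)
    c[..., idx + 1, idx] = 1.0               # ones on the first subdiagonal
    return c

A = np.arange(1, 7).reshape(2, 3)
print(companion_batched(A))  # matches the list-comprehension result above
```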
Aren't all vectorizations practically for loops? It is just more readable to have it vectorized, and sometimes faster, in case the Python loop is the bottleneck. I basically implemented this version to be able to drop […]. The example I have in mind is linear prediction (it is required to calculate the companion matrix to eliminate exponentially growing terms, see e.g. DOI: 10.1103/physrevb.92.155132). As all matrix operations are vectorized in […]. But I agree, it's only a nice-to-have, not something particularly important.
Vectorize is for applying the same function over array elements. As you can see, here you don't need a function over arrays but to collect all the arrays. gufuncs are for operating on every array element so that the implementation is done at the low level, as quickly as possible. Here you are generating 2D arrays.
I basically wrote a list comprehension with no loops. Try adding, summing, multiplying, or doing anything with nested arrays and you will find out that the reason for nD arrays to exist is precisely that.
Sorry, but I do not understand your point. Of course, a single comprehension is enough to treat that specific (2-dimensional) input, as there is only a single additional axis. In general, you need to write

```python
Aflat = A.reshape(-1, A.shape[-1])
Cflat = np.array([companion(x) for x in Aflat])
C = Cflat.reshape(A.shape[:-1] + (A.shape[-1] - 1, A.shape[-1] - 1))
```

which I find harder to read than

```python
C = companion(A)
```

Again, it is not necessary, but it makes the code clearer. Furthermore, I don't really see the difference from other functions like […]. If you think it's a bad idea, feel free to reject this pull request. I will just ship my own version; I just thought this might also help others.
The now-allowed cases still have to be removed from the test:

```python
def test_bad_shapes(self):
    assert_raises(ValueError, companion, [[1, 1], [2, 2]])
    assert_raises(ValueError, companion, [0, 4, 5])
    assert_raises(ValueError, companion, [1])
    assert_raises(ValueError, companion, [])
```

However, I don't know any […]
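If property-based testing were acceptable here, a test along these lines could compare the batched result against a per-row loop (a sketch only; it assumes `companion` now accepts N-D input, and the strategy bounds are illustrative):

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays, array_shapes
from scipy.linalg import companion

@given(arrays(np.float64,
              array_shapes(min_dims=2, max_dims=3, min_side=2),
              elements=st.floats(min_value=0.5, max_value=10.0)))
def test_companion_matches_loop(a):
    # flatten the batch axes, apply the 1-D function row by row, reshape back
    flat = a.reshape(-1, a.shape[-1])
    looped = np.array([companion(x) for x in flat])
    expected = looped.reshape(a.shape[:-1] + looped.shape[-2:])
    np.testing.assert_allclose(companion(a), expected)
```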
What I am trying to emphasize is that either you do the input shaping and a list comprehension, or you take a 2D input and reshape afterwards. I don't get what the difficulty is with the format I gave for nD arrays, and how it would be easier with […] afterwards. Because now you have to massage your input such that polynomials are arranged such that every […]
Hmmm... in the code, I don't see the list comprehensions, for loops, or nested arrays mentioned in the comments. It looks like a straightforward extension of the function to batch over stacks of matrices (of arbitrary shapes) with standard array operations, which would be much faster than loops or comprehensions to do the same thing. There have been requests for other special matrix functions to allow batch operations like this (e.g. gh-17344), and it's analogous to the way NumPy's linalg functions are batched.
Is the hesitation that we don't want to allow batched use of the special matrix functions?
In the meantime, I'll go ahead and push the suggested changes.
A few more documentation updates would be needed.
Maybe we should add an example of the batched behavior.
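Something like the following could serve as that example (a sketch; the polynomial values are arbitrary, and it assumes the batched behavior of this PR):

```python
>>> import numpy as np
>>> from scipy import linalg
>>> A = np.array([[1.0, -10.0, 31.0, -30.0],
...               [1.0, -6.0, 11.0, -6.0]])  # two cubic polynomials
>>> linalg.companion(A).shape                # one 3x3 companion matrix per row
(2, 3, 3)
```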
I don't want to kill your momentum, but some issues still have discussion running and we don't need to conclude them yet. I think this is getting a bit too quickfire with all the open issues. I want to close everything too, but some of them still have open ends and are not necessarily concluded. For special matrices in particular, I don't see the point of accepting a 2D array just to run a simple for loop for the sake of batch input. Plus, all of them must have the same order. You can still fill in a 3D array with a one-liner with no change in the notation.
Hah, I was wondering when you were going to ask me to stop. I know that it looks like I'm tearing through recklessly, but I'm careful. If it makes you uncomfortable anyway, I can slow down, but fortunately, I'm almost done.
Right, I wasn't going to merge this one because I can see that. That's why I asked:
It sounds like your answer to that is yes:
Separate users requested the ability to batch-create special matrices at least twice in 2022. I don't know the use cases they have in mind, but the requests sound easy to satisfy and very consistent with the rest of SciPy. (Supporting batch input and output is very common.) As you wrote about real/complex support, we wouldn't want to support batching in some functions but not others; fortunately, it looks like batching can be added for all special matrix functions. I thought it was already decided that […]
I added this PR because I had a specific scenario where I needed the companion matrices and the loops were too slow. As I had to implement a vectorized version for performance, I thought of contributing it back.
Actually @DerWeh, since you have an application where the performance made a difference, would you be willing to share more information about that? How big of a batch are we looking at, and what was the time difference? Was this a bottleneck for you, or just an inconvenience? I think having a real-world example of a loop/list comprehension being insufficient might help.
@mdhaber I'll try digging up the code (or maybe I can reproduce something similar). In the meantime I switched fields, jobs, and flats, and I am not sure how much I backed up, as in the end the approach turned out to be a dead end (it was research, after all). To provide some context: I needed stable linear prediction of time series. Some code I used is public: `_predict_stable` was the method I used. It sets up a batch of companion matrices, does an eigenvalue decomposition, and drops eigenvalues with a magnitude larger than 1 (exponentially growing terms). Seeing that I did an eigendecomposition of each companion matrix, I am quite surprised that the loop mattered. I vaguely remember that it was important to get MKL, which significantly sped up the decomposition, before the loops started to matter. My educated guess would be that this PR mostly offers comfort, as you can stay in your mental framework of batched matrix operations. Performance probably matters only in very specific scenarios. But without profiling, it's just guesswork. I'll see if the code is still around somewhere.
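A rough sketch of the stabilization step described here, assuming the batched `companion` of this PR (the variable names are illustrative, not taken from the linked code):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
coeffs = rng.random((4, 8))                 # a batch of prediction coefficients
C = linalg.companion(coeffs)                # shape (4, 7, 7), one matrix per row
vals, vecs = np.linalg.eig(C)               # batched eigendecomposition
vals = np.where(np.abs(vals) > 1, 0, vals)  # zero out exponentially growing modes
# reassemble the stabilized matrices (complex-valued in general)
C_stable = vecs @ (vals[..., None] * np.linalg.inv(vecs))
```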
For such use cases […]. Polynomial root finding is tricky business and an inherently unstable operation; that's why not much has happened except forming the Fiedler companion matrix (which we added later). It has slightly better properties in terms of eigenvalue computations due to its pentadiagonal form and responds to matrix balancing slightly better. But other than that, the discussion is whether a few special matrix functions generate nD arrays. If that is to happen, we have to do it for every member. We spent quite some time unifying the linalg things so that this zebra-API problem goes away, such that similar things have similar behavior. The feature requests in this regard, so far, propose to hide the for loop inside SciPy, and it is comparable labor and performance whether it is done by the user or by SciPy. So I don't see an immediate benefit. That is the argument. As long as we don't generate new zebra patterns, it's OK either way, with little benefit to the users or to SciPy functionality.
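For reference, the Fiedler form mentioned here is available as `scipy.linalg.fiedler_companion` (a small illustration with an arbitrary polynomial; it takes 1-D input at the time of this discussion):

```python
import numpy as np
from scipy import linalg

p = np.array([1.0, -4.0, 6.0, -4.0, 1.0])  # (x - 1)**4
F = linalg.fiedler_companion(p)            # pentadiagonal companion form
print(np.linalg.eigvals(F))                # eigenvalues cluster near the root 1
```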
FWIW, I think it would be great to have uniform batching in scipy.linalg. This is a large project of course :-). Long term, one benefit is to fix the disparity with NumPy, as in many cases we have a weird story along the lines of "scipy.linalg has X on top of the numpy namesake, but numpy has batching and scipy does not".
I've seen a few times that this is just a […]. Besides the convenience, I see a 4-5x speedup on my machine.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(2549824598234528)
# in results below, m is 100, 1000, then 10000
m, n = 100, 100
A = rng.random((m, n))

C = np.array([linalg.companion(x) for x in A])
# 2.97 ms ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 37.8 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 465 ms ± 53.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

C = linalg.companion(A)
# 668 µs ± 8.75 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 7.54 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 108 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

In gh-18225 you added support for stacked arrays to `linalg.det`:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(2549824598234528)
# in results below, m is 100, 1000, then 10000
m, n = 10000, 100
A = rng.random((m, n))
C = linalg.companion(A)

%timeit np.array([linalg.det(x) for x in C])
# 8.56 ms ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 92.9 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 879 ms ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit linalg.det(C)
# 8.8 ms ± 71.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 87.3 ms ± 272 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 1.31 s ± 62.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

There have been a few releases since then, but still […]. Since there are only a few matrix creation functions, though, we can do all of them in one release if that's important:
So besides this PR and gh-21419, there are just five other special matrix functions that currently accept 1-D input and produce 2-D output, which we'd generalize to accept N-D input and produce (N+1)-D output. I'd be happy to write or review a single PR that adds batch support for these with a decorator. (Update: no need for the code below; we can just use `np.apply_along_axis` and make sure the documentation is preserved.)
```python
def _batch_special_matrix(func):
    def wrapper(X):
        X = np.asarray(X)
        if X.ndim <= 1:
            return func(X)  # unbatched input: behavior unchanged
        n = X.shape[-1]
        # apply the 1-D -> 2-D function row by row, then restore the batch shape
        Y = np.asarray([func(x) for x in X.reshape(-1, n)])
        return Y.reshape(X.shape[:-1] + (Y.shape[-1],) * 2)
    return wrapper
```

or actually

```python
def _batch_special_matrix(func):
    def wrapper(X):
        X = np.asarray(X)
        if X.ndim <= 1:
            return func(X)
        # note: np.apply_along_axis takes (func, axis, arr)
        return np.apply_along_axis(func, -1, X)
    return wrapper
```

and a common property-based test. Then we can improve the performance of these at our leisure. How does that sound?
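Hypothetical usage of the decorator sketched above (works with either version; the wrapped function is just an example):

```python
import numpy as np
from scipy import linalg

batched_companion = _batch_special_matrix(linalg.companion)
A = np.arange(1, 7).reshape(2, 3)  # two quadratic polynomials
print(batched_companion(A).shape)  # (2, 2, 2)
```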
I am doing that within my capacity. I also did […]

For […]:

That's because you are testing polynomials of 100th degree 100 times. That's obviously going to be faster. That's not the use case for these arrays. Better performance never hurts, but for the general use cases, which are small arrays, this does not make too much of a difference. But as I mentioned, if we are doing all of them, no reservations. I'm +1 too.
Of course! I don't mean that the others need to be done faster, and I recognized above that you have been doing much more important work. Thank you again! I just mean that w.r.t. API consistency, we should hold this effort only to the same standard. OK, if it sounds good, I'll adjust the documentation here and merge. Then I'll prepare a PR for the five other special matrix functions mentioned above, and maybe I'll create a tracking issue for the rest of `scipy.linalg`.
[docs only]
OK @DerWeh does this still look good to you? I think it's ready.
@mdhaber Thanks for pushing it. Looks good to me. The only thing which might be up for debate is the documentation of the error:
Technically, the check is […]. Currently, I am rather busy, but I would also be willing to help batch SciPy. Personally, I think it really improves readability, as we hide loops and focus on the math that is happening.
@ilayn Thanks for the valuable insight. The Fiedler companion matrix wasn't there at the time, so I am glad to learn about it. The reason it didn't use […]
* ENH: linalg.companion: add N-D batch support

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
What does this implement/fix?

Vectorize the function `scipy.linalg.companion` to be able to treat stacks of polynomial coefficients. Probably, we need to add information to the docstring about when this feature was introduced.

Additional information

I tested the vectorization against `numpy.vectorize` using `hypothesis`. This test is, however, not included as a unit test, as I don't know whether it is OK to use `hypothesis`.
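A sketch of what such a comparison could look like (assuming the batched `companion` of this PR; the `signature` string is what makes `np.vectorize` act as the looping, gufunc-style reference):

```python
import numpy as np
from scipy import linalg

# reference implementation: loop over the last axis via np.vectorize
ref = np.vectorize(linalg.companion, signature="(n)->(m,m)")

rng = np.random.default_rng(0)
A = rng.random((4, 5)) + 1.0  # keep leading coefficients nonzero
np.testing.assert_allclose(linalg.companion(A), ref(A))
```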