perf(matrices): Use DomainMatrix for Matrix.rref over ZZ/QQ #25443
Conversation
✅ Hi, I am the SymPy bot (v170). I'm here to help you write a release notes entry. Please read the guide on how to write release notes. Your release notes are in good order. Here is what the release notes will look like:
This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.13. Click here to see the pull request description that was parsed.
Update: The release notes on the wiki have been updated.
🟠Hi, I am the SymPy bot (v170). I've noticed that some of your commits add or delete files. Since this is sometimes done unintentionally, I wanted to alert you about it. This is an experimental feature of SymPy Bot. If you have any feedback on it, please comment at sympy/sympy-bot#75. The following commits add new files:
If these files were added/deleted on purpose, you can ignore this message.
Note that the diff here would be a lot smaller if gh-25395 were merged. I wanted this to be two separate PRs, where one adds the algorithm for sparse fraction-free elimination to DomainMatrix and then this PR adds the code for selecting the fastest algorithm and for making Matrix.rref use DomainMatrix.rref.
There is some complex logic in choosing between the algorithms. For matrices over ZZ the fastest algorithm/implementation is always either the sparse implementation of fraction-free elimination over ZZ (`rref_den_ZZ`) or Gauss-Jordan with division over QQ (`rref_QQ`). These plots show timings for Matrix with the PR and compare timings for 4 different things:
This first plot shows that fraction-free elimination is fastest for dense matrices. Here we have dense matrices of 100 bit integers with varying size:
The next plot shows the other extreme of very sparse matrices having approximately one nonzero entry per row and with small bit sizes:
So we see that there are extremes of dense vs sparse where we need to choose between the two DomainMatrix RREF algorithms. In between there is a crossover where one becomes faster than the other, and I have tried to measure that crossover empirically. Note that the most important thing is to get the extremes right because getting the crossover right is impossible in general. The critical parameter is the "density" of the matrix, defined as the average number of nonzero entries per row (counting only rows that are not completely zero). The best density threshold depends on the size, shape and the bit count of the integers. The formula used in the code is:

```python
wideness = max(1, 2/3*ncols/nrows_nz)
max_density = (5 + 10000/(nrows_nz*bits**2)) * wideness
if density < max_density:
    return 'rref_QQ'
else:
    return 'rref_den_ZZ'
```

Here 10000 is just a number I found empirically, as is the rest of the formula, but I have tested this with different combinations of bit counts, shape, size etc. This next plot shows what happens as we vary the density for sparse 100x100 matrices of 10-bit integers:
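The selection rule above can be written out as a small self-contained sketch (function and parameter names here are my own for illustration, not SymPy's internal API):

```python
def choose_rref_method(nrows_nz, ncols, density, bits):
    """Pick an RREF algorithm for a ZZ matrix using the density heuristic.

    nrows_nz -- number of rows that are not entirely zero
    density  -- average number of nonzero entries per nonzero row
    bits     -- typical bit size of the integer entries
    """
    wideness = max(1, 2/3 * ncols / nrows_nz)
    max_density = (5 + 10000 / (nrows_nz * bits**2)) * wideness
    if density < max_density:
        # sparse regime: Gauss-Jordan with division over QQ
        return 'rref_QQ'
    else:
        # dense regime: fraction-free elimination over ZZ
        return 'rref_den_ZZ'
```

For example a very sparse 100x100 matrix of 10-bit integers with about one nonzero per row lands in the `rref_QQ` regime, while a fully dense 100x100 matrix of 100-bit integers lands in the `rref_den_ZZ` regime.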
A very odd test case has failed: sympy/sympy/integrals/tests/test_integrals.py Lines 1939 to 1953 in 1d9d3aa
I'm not sure that the test is completely valid, but at least the intention was that this PR should not change anything, because the RREF over ZZ or QQ is well defined and unique etc, so the difference in algorithm should not lead to any change in behaviour. I will need to investigate what has caused that. The result is computed by heurisch so somehow this change has changed the output of heurisch. I think that I have seen this case before and that it is the reason that heurisch still uses the dense implementation of DomainMatrix.

Oh, I see the problem. This change is causing the sparse implementation to be used instead of the dense implementation for the EX domain. A quick fix is that heurisch could keep forcing the dense implementation there. A better fix would be to prevent heurisch from creating the problematic expression in the first place: sympy/sympy/integrals/heurisch.py line 168 (at 1d9d3aa).
(Calling solve on an arbitrary expression without specifying what to solve for is not really very well defined and should be avoided in core algorithms.)
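To illustrate the point about solve (a generic SymPy example, not code taken from heurisch):

```python
from sympy import solve, symbols

x, y = symbols('x y')

# Without naming an unknown, solve has to guess which symbol to solve
# for, so the result depends on an implicit choice:
print(solve(x + y - 1))

# Naming the unknown makes the problem well defined:
print(solve(x + y - 1, y))   # [1 - x]
```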
Looking into it, the integration problem is actually caused by this matrix:

```python
from sympy import EX, Matrix, sqrt, symbols

a, b = symbols('a, b')
M = Matrix([
[0, 0, 0, 0, 0, 0, 0, 0, b**3, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, -4*a*b**2, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 6*b**3, 0, 0, 0, 4*a**2*b, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, -24*a*b**2, 0, 0, 0, -4*a*b**2, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 24*a**2*b - 6*b**3, 0, 0, 0, 8*a**2*b - 2*b**3, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 4*a*b**2, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 24*a**2*b - 6*b**3, 0, 0, 0, 4*a**2*b, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 24*a*b**2, 0, 0, 0, 4*a*b**2, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 6*b**3, 0, 0, 0, b**3, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 2*b**3, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, b**3, 0, -8*a*b**2, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, -4*a*b**2, 4*b**3, 8*a**2*b, 0, 0, 0, 0, 0],
[0, 0, 0, 4*b**3, 0, 0, 0, 4*a**2*b, -16*a*b**2, -8*a*b**2, 0, 0, 0, 0, 0],
[0, 0, 0, -16*a*b**2, 0, 0, 0, -4*a*b**2, 16*a**2*b - 4*b**3, 16*a**2*b - 4*b**3, 0, 0, 0, 0, 0],
[0, 0, 0, 16*a**2*b - 4*b**3, 0, 0, 0, 8*a**2*b - 2*b**3, 0, 8*a*b**2, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 4*a*b**2, 16*a**2*b - 4*b**3, 8*a**2*b, 0, 0, 0, 0, 0],
[0, 0, 0, 16*a**2*b - 4*b**3, 0, 0, 0, 4*a**2*b, 16*a*b**2, 8*a*b**2, 0, 0, 0, 0, 0],
[0, 0, 0, 16*a*b**2, 0, 0, 0, 4*a*b**2, 4*b**3, 2*b**3, 0, 0, 0, 0, 0],
[0, 0, 0, 4*b**3, 0, 0, 0, b**3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 3*b**3, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 2*b**3, -12*a*b**2, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, b**3, 0, 0, -8*a*b**2, 12*a**2*b, 0, 0, 2*b**3, 0, 0, 0, 0, 0],
[0, 0, -4*a*b**2, 0, 0, 8*a**2*b, -12*a*b**2, 2*b**3, 0, -8*a*b**2, b**3, b**3, 2*b**3, 0, 0],
[0, 2*b**3, 4*a**2*b, 0, 0, -8*a*b**2, 24*a**2*b - 6*b**3, -8*a*b**2, 0, 8*a**2*b - 2*b**3, -3*a*b**2 + b**2*sqrt(a**2 + b**2), -3*a*b**2 - b**2*sqrt(a**2 + b**2), -8*a*b**2, b**3, -2*b**2],
[0, -8*a*b**2, -4*a*b**2, 0, 0, 16*a**2*b - 4*b**3, 12*a*b**2, 8*a**2*b - 2*b**3, 0, 0, 2*a**2*b - 2*a*b*sqrt(a**2 + b**2) + b**3, 2*a**2*b + 2*a*b*sqrt(a**2 + b**2) + b**3, 8*a**2*b - 2*b**3, -4*a*b**2, 4*a*b],
[0, 8*a**2*b - 2*b**3, 8*a**2*b - 2*b**3, 0, 0, 8*a*b**2, 12*a**2*b, 0, 0, 8*a**2*b - 2*b**3, -5*a*b**2 + b**2*sqrt(a**2 + b**2), -5*a*b**2 - b**2*sqrt(a**2 + b**2), 0, 4*a**2*b - b**3, -2*b**2],
[0, 0, 4*a*b**2, 0, 0, 8*a**2*b, 12*a*b**2, 8*a**2*b - 2*b**3, 0, 8*a*b**2, 4*a**2*b - 4*a*b*sqrt(a**2 + b**2) - b**3, 4*a**2*b + 4*a*b*sqrt(a**2 + b**2) - b**3, 8*a**2*b - 2*b**3, 0, 8*a*b],
[0, 8*a**2*b - 2*b**3, 4*a**2*b, 0, 0, 8*a*b**2, 3*b**3, 8*a*b**2, 0, 2*b**3, -a*b**2 - b**2*sqrt(a**2 + b**2), -a*b**2 + b**2*sqrt(a**2 + b**2), 8*a*b**2, 4*a**2*b - b**3, 2*b**2],
[0, 8*a*b**2, 4*a*b**2, 0, 0, 2*b**3, 0, 2*b**3, 0, 0, 2*a**2*b - 2*a*b*sqrt(a**2 + b**2) - b**3, 2*a**2*b + 2*a*b*sqrt(a**2 + b**2) - b**3, 2*b**3, 4*a*b**2, 4*a*b],
[0, 2*b**3, b**3, 0, 0, 0, 0, 0, 0, 0, a*b**2 - b**2*sqrt(a**2 + b**2), a*b**2 + b**2*sqrt(a**2 + b**2), 0, b**3, 2*b**2]])
M1 = M.to_DM(EX).rref_gj_div()[0].to_Matrix()
M2 = M.to_DM(EX).to_dense().rref_gj_div()[0].to_Matrix()
print(M1 == M2)
```

The two algorithms process arithmetic in a different order and end up with different end results because the domain is EX and canonical results are not guaranteed. Actually the two results are equivalent but the result from the sparse implementation is much more complicated.
The issue is the two nonzero expressions in the final column. From the dense implementation we have:

```
In [21]: M2[9,-1]
Out[21]: -1/sqrt(a**2 + b**2)

In [22]: M2[10,-1]
Out[22]: 1/sqrt(a**2 + b**2)
```

However the sparse implementation ends up with:

```
In [23]: print(M1[9,-1])
16*a**5/(16*a**6 - 16*a**5*sqrt(a**2 + b**2) + 28*a**4*b**2 - 20*a**3*b**2*sqrt(a**2 + b**2) + 13*a**2*b**4 - 5*a*b**4*sqrt(a**2 + b**2) + b**6) - 16*a**4*sqrt(a**2 + b**2)/(16*a**6 - 16*a**5*sqrt(a**2 + b**2) + 28*a**4*b**2 - 20*a**3*b**2*sqrt(a**2 + b**2) + 13*a**2*b**4 - 5*a*b**4*sqrt(a**2 + b**2) + b**6) + 20*a**3*b**2/(16*a**6 - 16*a**5*sqrt(a**2 + b**2) + 28*a**4*b**2 - 20*a**3*b**2*sqrt(a**2 + b**2) + 13*a**2*b**4 - 5*a*b**4*sqrt(a**2 + b**2) + b**6) - 12*a**2*b**2*sqrt(a**2 + b**2)/(16*a**6 - 16*a**5*sqrt(a**2 + b**2) + 28*a**4*b**2 - 20*a**3*b**2*sqrt(a**2 + b**2) + 13*a**2*b**4 - 5*a*b**4*sqrt(a**2 + b**2) + b**6) + 5*a*b**4/(16*a**6 - 16*a**5*sqrt(a**2 + b**2) + 28*a**4*b**2 - 20*a**3*b**2*sqrt(a**2 + b**2) + 13*a**2*b**4 - 5*a*b**4*sqrt(a**2 + b**2) + b**6) - b**4*sqrt(a**2 + b**2)/(16*a**6 - 16*a**5*sqrt(a**2 + b**2) + 28*a**4*b**2 - 20*a**3*b**2*sqrt(a**2 + b**2) + 13*a**2*b**4 - 5*a*b**4*sqrt(a**2 + b**2) + b**6)

In [24]: print(M1[10,-1])
-4*a**3/(4*a**4 - 4*a**3*sqrt(a**2 + b**2) + 5*a**2*b**2 - 3*a*b**2*sqrt(a**2 + b**2) + b**4) + 4*a**2*sqrt(a**2 + b**2)/(4*a**4 - 4*a**3*sqrt(a**2 + b**2) + 5*a**2*b**2 - 3*a*b**2*sqrt(a**2 + b**2) + b**4) - 3*a*b**2/(4*a**4 - 4*a**3*sqrt(a**2 + b**2) + 5*a**2*b**2 - 3*a*b**2*sqrt(a**2 + b**2) + b**4) + b**2*sqrt(a**2 + b**2)/(4*a**4 - 4*a**3*sqrt(a**2 + b**2) + 5*a**2*b**2 - 3*a*b**2*sqrt(a**2 + b**2) + b**4)
```

These two expressions simplify a little under cancel, but radsimp can simplify them all the way:
The complicated form that they come out in results from the different order in which the sparse routine performs the arithmetic. The ideal solution here would be to have a domain that can represent these expressions. It can be done with `QuadRing`:

```
In [75]: K = QuadRing(QQ.frac_field(a, b), [c], [c**2 - (a**2 + b**2)])

In [76]: dM = M.subs(sqrt(a**2+b**2), c).to_DM(K)

In [77]: print(repr(dM.rref()[0].to_Matrix()))
Matrix([
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1/2, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, -c/(a**2 + b**2)],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, c/(a**2 + b**2)],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
```

That uses a `QuadRing` domain to represent the square root symbolically. In any case this example is a distraction from the main purpose of this PR which is for ZZ and QQ. I will restore the previous behaviour so that heurisch at least uses the dense implementation for EX.
Benchmark results from GitHub Actions. Lower numbers are good, higher numbers are bad: a ratio less than 1 means the PR is faster.

Significantly changed benchmark results (PR vs master):

```
       before        after    ratio
   [221d7730]   [53007970]
- 162±1μs 90.2±0.2μs 0.56 solve.TimeMatrixOperations.time_rref(3, 0)
- 311±1μs 109±0.2μs 0.35 solve.TimeMatrixOperations.time_rref(4, 0)
- 31.4±0.06ms 13.5±0.02ms 0.43 solve.TimeSolveLinSys189x49.time_solve_lin_sys
```
Significantly changed benchmark results (master vs previous release):

```
       before        after    ratio
   [8059df73]   [221d7730]
   <sympy-1.12^0>
- 89.6±0.9ms 56.7±0.6ms 0.63 integrate.TimeIntegrationRisch02.time_doit(10)
- 85.3±2ms 55.4±0.3ms 0.65 integrate.TimeIntegrationRisch02.time_doit_risch(10)
- 2.09±0ms 655±1μs 0.31 polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'sparse')
- 10.3±0.02ms 1.95±0.01ms 0.19 polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'sparse')
- 367±0.6μs 82.5±0.3μs 0.22 polys.TimePREM_QuadraticNonMonicGCD.time_op(1, 'sparse')
- 4.82±0ms 363±0.4μs 0.08 polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'sparse')
- 10.6±0.02ms 1.09±0ms 0.10 polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'sparse')
- 6.23±0.01ms 3.96±0.01ms 0.64 polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(2, 'sparse')
- 27.1±0.03ms 12.0±0.02ms 0.44 polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'sparse')
- 6.93±0.02ms 1.18±0ms 0.17 polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(1, 'sparse')
- 16.1±0.02ms 9.26±0.01ms 0.57 polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(2, 'sparse')
- 211±0.04ms 70.8±0.07ms 0.34 polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'sparse')
- 6.37±0.01ms 534±1μs 0.08 polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'sparse')
- 27.7±0.05ms 851±3μs 0.03 polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'sparse')
- 607±1μs 196±0.6μs 0.32 polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(1, 'sparse')
- 6.49±0.01ms 200±1μs 0.03 polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'sparse')
- 17.0±0.04ms 205±0.7μs 0.01 polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'sparse')
```

Full benchmark results can be found as artifacts in GitHub Actions.
I have restored the previous behaviour: a dense implementation is now used by heurisch for the EX domain again.

Ultimately a domain that can handle this is the best solution. There will of course always be some expressions that need to use the EX domain though, and it is unavoidable that outputs cannot be fully canonical in that case. The reason I don't want to just change the test is that the output comes out much less nice and I am just not sure if that is an inherent feature of the sparse algorithm (when used with EX) or if this is just a random case where one method happens to simplify nicely and the other does not.

For now I have restored previous behaviour, although this PR does change heurisch to use the sparse implementation when the domain is not EX. There are certainly many cases where this will be a big speed improvement: I previously measured big improvements but ended up reverting the change because of this issue.
sympy/polys/matrices/rref.py (outdated)

```python
    return M.from_rep(M_rref_sdm), den, pivots


def _dm_rref_den_ZZ(Mz, keep_domain=True):
```
Could this, and each of the other functions that starts by selecting a method, accept a `method` kwarg, for users who want to choose the method themselves, instead of accepting the heuristic choice?
I guess they could. I might need to think of better names for the methods though, rather than `rref_QQ`. In future the same kinds of methods would be considered for different domains like `QQ(X)` etc.
Perhaps for `rref_den` it should be:

- `GJ`: Gauss-Jordan with division. Always returns a result in the associated field.
- `FFGJ`: Use fraction-free Gauss-Jordan; always returns the same domain.
- `CD`: If a field, then clear denominators and convert to ring; always returns a result in the associated ring.

Those methods make sense for `rref_den`. For `rref` we don't have the option to return a result in a ring because generally the output needs to be in a field. Perhaps then it could be the same method names but with a slightly different meaning:

- `GJ`: Convert to field and use Gauss-Jordan with division. Always returns a result in the associated field.
- `FFGJ`: Use fraction-free Gauss-Jordan, convert to field and divide at the end; always returns a result in the associated field.
- `CD`: If a field, then clear denominators and convert to ring, use FFGJ but then convert to field and divide at the end; always returns a result in the associated field.

Maybe there are better names for these algorithms/approaches...
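To make the FFGJ idea concrete, here is a minimal pure-Python sketch of fraction-free Gauss-Jordan over the integers (a toy illustration with made-up names, not SymPy's `ddm_irref_den` implementation):

```python
def ffgj(rows):
    """Fraction-free Gauss-Jordan elimination over ZZ.

    Returns (R, den, pivots) where R/den is the RREF of the input.
    All intermediate values stay integers: each cross-multiplication
    step is followed by an exact division by the previous pivot
    (the Bareiss trick), so no fractions ever appear.
    """
    A = [list(row) for row in rows]
    m = len(A)
    n = len(A[0]) if A else 0
    den = 1              # pivot from the previous elimination step
    pivots = []
    i = 0
    for j in range(n):
        if i == m:
            break
        # find a nonzero pivot in column j, at or below row i
        p = next((k for k in range(i, m) if A[k][j]), None)
        if p is None:
            continue
        A[i], A[p] = A[p], A[i]
        piv = A[i][j]
        for k in range(m):
            if k != i:
                f = A[k][j]
                # cross-multiply and divide exactly by the previous pivot
                A[k] = [(piv*A[k][l] - f*A[i][l]) // den for l in range(n)]
        den = piv
        pivots.append(j)
        i += 1
    return A, den, pivots
```

For example `ffgj([[1, 2], [3, 4]])` gives `([[-2, 0], [0, -2]], -2, [0, 1])`: dividing through by the denominator `-2` recovers the identity matrix, but no division happened during elimination.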
I've added a method parameter, e.g.:

```
In [2]: M = Matrix([[1,1,1],[1,2,3]])

In [3]: M.to_DM().rref(method='GJ')
Out[3]: (DomainMatrix({0: {0: 1, 2: -1}, 1: {1: 1, 2: 2}}, (2, 3), QQ), (0, 1))
```

Possible methods are: auto, GJ, FF, CD, GJ_dense, FF_dense, CD_dense. The parameter is for both rref and rref_den.
> accept a method kwarg, for users who wanted to choose the method themselves, instead of accepting the heuristic choice?

My opinion is against that approach. There is little demand for choosing methods like this, and little value in educating users to pick between them, unless the results are visibly very different. I'd just use the simple functions 99% of the time. It's better to publicize more lower-level handles and let users program with them than to implement many methods. I've seen that people who are looking for 'deep' features are actually looking for things like numpy C wrappers directly, rather than fiddling with 'methods'.
I've added the method parameter to `DomainMatrix.rref` but not to `Matrix.rref`. There is a tension here between whether `DomainMatrix` methods should implement specific algorithms or should choose algorithms on behalf of the user. Until this PR every DomainMatrix method simply executes a specific algorithm. In the end though we will somewhere want to have something like an RREF method that can select the best algorithm if we know that different choices are better in different situations.

I can see though that `method=` arguments are awkward later because:

- Most users don't use them.
- A high-level operation like `solve` that calls e.g. `Matrix.rref` indirectly will not provide users any way to control the method anyway.
- The method arguments make it awkward to change the internal details in future because they expose the internals too much.

Many Matrix methods have arguments like `iszerofunc` or `simplifyfunc`. This makes it awkward to e.g. change the method so that it uses `DomainMatrix`, even if there is a situation where `DomainMatrix` is definitely better than any combination of existing options.

I think that the difference between these methods is not quite the same as in the existing Matrix methods that have a `method=` argument, because here the differences are really about what fundamental arithmetic operations a domain supports. Also we already need to have the code that selects a method and then dispatches based on that, regardless of whether we want to expose the method argument to users.
I think the docs would greatly benefit from a tutorial page that teaches about the different RREF methods, and the properties of matrices that may make one method better than another, and finally teaches the user how to try the different methods manually, for special cases where the heuristics may not make the best choice. This would go along with allowing the user to pass a method name as a keyword arg. You've done a lot of research, and it would be great to codify what you've learned, in condensed and simplified form. While the heuristics do a great job of stitching together a probably-optimal default method, the code you've written has a lot of power that's not yet exposed to users, if they can't try out the various methods on demand.
I do think that this would be good but I would like to get this PR in now and add that later. The tutorial page that I am imagining would discuss a lot more than just RREF for ZZ and QQ, which is what this PR is about. I already want to move on from this PR because I have other changes planned... In the meantime I can answer any questions though if there is some part that does not immediately make sense.
The code quality tests have failed with:

I guess this is some update to flake8 or something but I don't think it's related to any changes here.

That sounds fine.

I tried pyodide test 3 multiple times, but the tests are timing out.
If you mean that you want it to pass before merging this then that is not necessary. That job is not "required" for merging. If you are worried that anything is failing there then you can see that the tests in the job have passed:
Somehow it times out at shutdown. There were problems with this when migrating to use pytest (I remember @brocksam wrestling with it for a while). At the time the problem seemed to get fixed but now something flaky in pyodide has brought it back. I'm certain that it is a pyodide bug but I'm not sure that we have any meaningful information that we could share with the pyodide folks to help them fix it (perhaps it is at least worth opening an issue).
Nothing constructive to add here other than I corroborate everything @oscarbenjamin is saying. I recently spent time looking at this again but wasn't able to make any progress. It is a peculiar issue for sure. #25398 is open for it.
I am a bit unclear about what the status of this PR is now. It seems like a very clear improvement that should just be merged. It makes some things massively faster and does not cause any problems like giving undesirable output etc. Various suggestions have been made here and in gh-25452 about changes to the code. I'll try to address those here.

It was suggested that it is not good to move the code from methods on DomainMatrix to functions in the rref.py module. I disagree with this very directly. The rref module added here has 500 lines of code for choosing what algorithm to use and will grow larger as more domains are considered. It is far too big to just be all in methods on DomainMatrix, and it is better to have a separate module where someone can see the details of particular algorithms in one place. Ideally the actual algorithms themselves would live there as well.

Adding the method= argument did complicate the code. I have simplified that a bit but it is not possible to make it much simpler: fundamentally there just is a certain amount of conditional logic that has to be there to make all of these pieces work, and you can either put that together in large functions or break it up into small functions, but the complexity won't disappear either way. I have tried to reduce repetition in the module because that was implicitly suggested. That does simplify things a little bit but not by much.

It was also suggested that the denominator and non-denominator codepaths could be combined by treating the denominator as 1 in the non-denominator case. That just doesn't work: I tried it and it was a mess. The problem is that the conditional logic for what you need to do in terms of choosing domains, converting, dividing etc is just very different in the two cases, and setting the denominator to 1 does not make them the same.

Other PRs that also bring substantial improvements but require more careful consideration will depend on this one, e.g. gh-25452.
With this PR I tried to make it so that it is just an obvious uncontroversial improvement that we can just merge. The test failures are upstream bugs and are unrelated to this PR: they show up on all PRs. Any PR can still be merged regardless of those failures though, so we don't need to worry about that. If anyone would like to see changes here then can they be very explicit about what those changes would be? Otherwise can we just merge this now?
The only thing I specifically meant was the public functions inside the rref module.
sympy/polys/matrices/domainmatrix.py (outdated)

```python
    def rref_gj_div(self):
        """Compute RREF using Gauss-Jordan elimination with division.

        The domain must be a field.

        Returns
        =======

        (DomainMatrix, list)
            reduced-row echelon form and list of pivots for the DomainMatrix

        See Also
        ========

        convert_to, lu
        rref
            Compute RREF choosing the most efficient algorithm (possibly uses
            this algorithm).
        rref_den
            Compute RREF with denominator using the most efficient algorithm
            (possibly uses this algorithm).
        sympy.polys.matrices.dense.ddm_irref
            The underlying algorithm in the dense case.
        sympy.polys.matrices.sdm.sdm_irref
            The underlying algorithm in the sparse case.
        """
        if not self.domain.is_Field:
            raise DMNotAField('Not a field')
        rref_ddm, pivots = self.rep.rref()
        return self.from_rep(rref_ddm), tuple(pivots)


    def rref_den_gj_ff(self):
```
Also, I would prefer not to add them, especially since this is controlled by `method`s. I'm not sure if you left it by intent, but it looks like we shouldn't add these if `rref` is not actually using them.

I'd also want to respond to some comments here. I don't think that exposing each algorithm publicly is needed. It can slightly help for some experiments, but I think that it is better for users to know the matrix types, and to let them cast appropriately in practice.
LGTM |
Oh, okay. I'll rename them as private then and perhaps remove/simplify the docstrings.
Choosing an algorithm is not the same as choosing a type though. The functions here will always return the same type (e.g. dense if the input was dense) but the best algorithm might not be the dense one even if the input was dense. There can also be reasons for choosing an algorithm even if it is inefficient. There are two basic ways of providing an API for algorithms like this: expose each algorithm as its own method, or expose one method with an argument for selecting the algorithm.

So far DomainMatrix mostly follows approach 1. This PR changes the `rref` method to select an algorithm automatically. This change means that the caller no longer controls which exact algorithm is used simply by choosing the type and calling the corresponding method.

I have implemented both of these options in this PR but your two comments above have objected to both of them. Given that we do want to have an automatic selection somewhere, at least one of these options is needed.
I think that it's better to decide on only one way than to ship everything. It confuses users if there are duplicate ways to do one thing. If there are multiple public-facing methods in there, it just means that we have to write duplicate documentation and test cases for each of them.
In the last commit I renamed the per-algorithm methods to be private. I also removed the redundant methods.
I think I've done what you intended here, but this statement is a little ambiguous. I have made the per-algorithm methods private. If you mean that something more should change then let me know.
References to other Issues or PRs

Makes use of fraction-free routines as discussed in gh-21834. This is based on the benchmark measurements in gh-25410.

This is built on top of the PR gh-25395 (which I had hoped could be merged before opening this PR). The sparse version of `rref_den` added there is always faster than the dense implementation added in gh-25346 and should be the main workhorse for most RREF operations. Previous related PRs are gh-25346, gh-25369, gh-25367 and gh-25336.

Brief description of what is fixed or changed

This PR changes `Matrix.rref` so that for a matrix of only integers or rational numbers it will use `DomainMatrix.rref` or `DomainMatrix.rref_den`, which should be entirely equivalent but a lot faster in all cases.

The `DomainMatrix.rref` method itself was also made faster for the ZZ and QQ domains by using empirically determined rules for deciding which algorithm to use. For ZZ there are two extremes in which different algorithms are preferred: large dense matrices with large integers should use fraction-free Gauss-Jordan (GJFF) elimination over ZZ, whereas very sparse matrices with small integers should use Gauss-Jordan with division (GJDIV) over QQ. For QQ the primary consideration is the nature of the denominators: small denominators should just be cleared to use GJFF over ZZ, whereas very large denominators should use GJDIV over QQ. In between there are some cases where GJFF over QQ is best.

RREF is the main component of a number of matrix operations like `Matrix.inv`, `Matrix.solve` and `Matrix.nullspace`. In turn nullspace is used by `eigenvects` and `jordan_normal_form`, but `eigenvects` was already previously made a lot faster by having it use DomainMatrix directly (it would benefit from using the new fraction-free routines but I haven't made any change to that here). Also `linsolve` uses `DomainMatrix.rref` directly and should be changed to use `rref_den` instead, but I haven't changed that here. Also rref is used as part of the `risch` and `heurisch` integration routines, although these already use `DomainMatrix` more directly: they benefit from the improvements to DomainMatrix.rref but not from the fact that Matrix.rref now uses DomainMatrix.rref.

Here are some timings to illustrate the effects of this. With master, the time to invert a matrix of integers (values from 1 to 100) is:

We can see that as we go from 16x16 to 17x17 and so on, each time `n` increases by 1 the runtime increases by a factor of about 3, meaning that the runtime scales exponentially like `3^n`. We can guess that the time to invert a 20x20 matrix would be something like 15 minutes, but I haven't waited that long to test that.

With this PR the equivalent timings are (note that I have gmpy2 installed, which affects this):

So this is 2x faster for a 2x2 matrix, but that scales up to about 10000x faster for a 19x19, and probably a much bigger ratio at 200x200, but no one would wait long enough to see timings for 200x200 using the existing `Matrix.rref` routine. These timings seem to scale somewhere in between `n^3` and `n^4`. I believe that asymptotically, for extremely large matrices of extremely large integers, the algorithm here should be `O(n^5)` if asymptotically optimal integer multiplication is used.

Note that CPython does not have asymptotically optimal multiplication whereas gmpy2 does. I think that the reasons that the existing `Matrix.rref` method is so slow as n scales up are that:

- The entries grow exponentially with `n`, even if asymptotically optimal multiplication is used.
- The arithmetic uses `Rational` rather than `MPQ`. The pure Python version of `MPQ` has better arithmetic than `Rational`, but when `gmpy2` is installed its `mpq` is much faster for smaller rationals and asymptotically much faster for large rationals.

Another approach that could be used is to change `Rational` (and `Integer`) to use `MPQ` and `MPZ` so that they can use gmpy2 when it is installed and can also take advantage of `PythonMPQ`'s faster arithmetic even when gmpy2 is not installed. This would speed up many things, but RREF would still be slow because of the division-free approach that is used. Another possibility is that `Matrix.rref` could check to see if the matrix is all integers and then use fraction-free RREF, but that is exactly what `DomainMatrix` does for us: identify an appropriate domain and then use the best algorithms for that domain.

Other comments

The approach here is to speed up `rref` for `Matrix` by checking to see if the matrix is all rational numbers or all integers. This is usually a very quick operation since the internal `_rep` attribute of a `Matrix` is already a `DomainMatrix` and is usually already over `ZZ` or `QQ` if the matrix is all rationals. Since matrices are mutable though, it is possible that the internal `_rep` is over `EXRAW` even though its entries are all rational, so we check all entries in that case. This should still be a small cost compared to the time taken to compute the actual RREF, which is a heavy operation.

This approach could be extended to other simple domains like `RR`, `CC`, `ZZ_I`, `QQ_I`, `QQ<a>`. In the case of `QQ<a>` we need to be careful because finding a primitive element can be slower than the rref itself, so there should be some limits.

Other more interesting domains are polynomial rings and function fields like `QQ[x,y]` or `QQ(x,y)`. For these kinds of domains `rref` is fundamentally not right though, and `rref_den` with clearing of denominators should be used, so the way to speed these up is to change methods like `inv` so that they do not attempt to use `rref`, rather than speeding up `rref` itself.

This also sets the scene to be able to make use of `python_flint` to speed up matrix operations: once we know we are performing an operation like `rref` over `QQ`, then the corresponding flint routines could be used, which are a lot faster for dense matrices. Here are comparable timings with `python_flint`:

Broadly this is about 5x faster than DomainMatrix, although I think that the difference widens as you go up to extremely large matrices because flint uses asymptotically optimal matrix multiplication and for large matrices delegates tasks like inverse to matrix multiplication. What these timings show though is that the pure Python implementation of `DomainMatrix` is not far from the performance of a highly optimised C implementation.

Here are comparable timings using `symengine`, which has a C++ implementation of `Matrix`. A symengine Matrix is like a sympy Matrix in that it allows arbitrary expressions in the matrix and uses its equivalent of `Expr`, rather than choosing particular domains like `python_flint` or `DomainMatrix`. This means that it is not as fast when there is a well defined domain, as in this case, but otherwise I am not sure what algorithm it uses for `inv` or `rref`:

So SymEngine is faster for very small matrices but becomes slower than DomainMatrix as we scale up to very large matrices. My guess is that these differences are explained by the fact that at small sizes pure Python overhead slows down DomainMatrix, but at larger sizes the effect of an asymptotically better algorithm (fraction-free Gauss-Jordan) means that DomainMatrix is faster.

Release Notes

If gh-25395 is not merged before this, then the release notes from there should be included here if this is merged.

- `Matrix.rref` now uses `DomainMatrix.rref` when the matrix consists only of integers or rational numbers. This makes the `rref` operation a lot faster and speeds up many other matrix operations that use `rref`, such as computing a matrix inverse, nullspace or solving a matrix equation. The exact timing difference depends on a number of factors and is larger for larger matrices. As an example, computing the inverse of a dense 19x19 matrix of integer values between 0 and 100 is about 10000x faster. The speed difference is bigger if gmpy2 is installed, since that speeds up elementary integer and rational arithmetic and will now be used by `Matrix.rref` when possible.
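As a quick illustration of the behaviour described in the release note (a generic SymPy usage example; the algorithm selection happens internally and is invisible in the API):

```python
from sympy import Matrix, Rational, eye

# A matrix of integers and rationals: rref can be computed over QQ
# internally (via DomainMatrix) instead of general Expr arithmetic.
M = Matrix([[1, 2, 3],
            [4, 5, 6],
            [7, 8, Rational(19, 2)]])

R, pivots = M.rref()
assert R == eye(3)           # the matrix is invertible
assert pivots == (0, 1, 2)
```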