-
-
Notifications
You must be signed in to change notification settings - Fork 656
Add "flint" factorization algorithm and replace qsieve implementation #35419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We support a few integer factoring algorithms, including PARI (the default), qsieve (via FlintQS), and ECM-GMP. Sage already includes the FLINT library, and FLINT has its own integer factoring routine. This commit adds algorithm="flint" as an option, e.g. sage: ZZ(100).factor(algorithm="flint") To accomplish this, fmpz_factor() was added to sage.libs.flint, and a wrapper factor_using_flint() was added to sage.rings.factorint. The new option was also added to the global factor() function.
The qsieve() function from sage.interfaces.qsieve uses an old, standalone copy of Bill Hart's quadratic sieve that was moved into its own package, called FlintQS. This implementation isn't terribly reliable by modern standards (just look at the docstring), and it requires maintenance for security issues and to appease newer compilers. However, in the intervening years since FlintQS was forked, a quadratic sieve was added to FLINT itself, as qsieve_factor(). This is a bit slower than FlintQS in general, but returns the right answer much more often. Instead of bringing FlintQS up to date, this commit reimplements sage's qsieve() with FLINT's qsieve_factor(). This entails adding qsieve_factor() to sage.libs.flint, and rewriting qsieve() to call it and process the output. The implementation is very similar to factor(n, algorithm="flint"). The output type of the function has necessarily changed, since we now provide not only the factors but their exponents as well.
Now that qsieve() is implemented with the FLINT library, the standalone QuadraticSieve program provided by the FlintQS package is not needed.
We no longer have a separate "interface" for qsieve(). It now calls a function from the FLINT library, and FLINT is already cited.
The integer factorization tutorial suggests that qsieve() is the best algorithm, that it is faster than PARI, and that sage's generic factor() should be rewritten to use qsieve() and ecm.factor(). We have toned down these claims: * PARI has gotten much faster and is competitive with qsieve() now that qsieve() is implemented with qsieve_factor() and returns fewer wildly incorrect answers. * I highly doubt that it's obvious which algorithm will work the best before you actually try them, making it very hard to improve on what PARI or FLINT are doing in a generic factor(). The (not tested) timings have also been removed so that we can test the output. Not only are the timings outdated and random, but they aren't even comparatively accurate anymore (for example, PARI doesn't take noticeably longer to factor the integer we give it than qsieve does). In any case, it's nice to check that the examples work.
This move reflects the recent reimplementation of qsieve() using the FLINT library. A deprecation warning has been added for the old name, and all uses within the sage library have been updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to clean that up!
- Instead of writing twice the same code, why don't you write a function to convert the
fmpz_factor_t
to the proper sageFactorization
object? - I do not understand why one of the factorization function ends up in
sage/rings/factorint.pyx
while the other is insage/libs/flint/qsieve.pyx
. To my mind, it would make more sense to keep these two together. One possible option is to make them methods of the sageInteger
object (as we do have_rank_linbox
,_rank_flint
on matrices). - Note that there is a third factorization from flint :
n_factor_to_list
fromsage/libs/flint/ulong_extras.pyx
src/sage/libs/flint/qsieve.pyx
Outdated
[(10000000000000000051, 1), (100000000000000000039, 1)] | ||
|
||
""" | ||
if n.is_zero(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this line you already assume that n
is of type Integer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were a few mistakes here. I've updated that function to call n = Integer(n)
first, and then afterwards, it raises an ArithmeticError
if n.is_zero()
. That's what the other user-facing factorization functions do.
from sage.libs.gmp.mpz cimport mpz_t, mpz_init, mpz_set, mpz_clear | ||
from sage.rings.integer cimport Integer | ||
|
||
def qsieve(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not def qsieve(Integer n)
?
And I suggest to use factor_qsieve
rather than qsieve
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qsieve()
is in the global namespace and is documented in a few places, so I was reluctant to break too many things. The old implementation called n = ZZ(n)
at the start, so, for example, you could pass it a rational number (with denominator 1) and it would factor it for you. I left the name alone for similar reasons.
If it were entirely up to me, I would remove qsieve
from the global namespace, leaving only n.factor(algorithm="qsieve")
and factor(n, algorithm="qsieve")
as the official interfaces to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not ideal because things like qsieve('5')
or qsieve(5.0)
would then work. It is fine to leave it but not a great interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No doubt, having qsieve()
in the global scope at all is an anachronism today. It should be an internal function, with the blessed interface to it being ZZ(x).factor(algorithm="qsieve")
. I'll open another issue for this.
src/sage/libs/flint/qsieve.pyx
Outdated
|
||
cdef fmpz_factor_t factors | ||
fmpz_factor_init(factors) | ||
qsieve_factor(factors,p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after ',' is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which linter is complaining about this and the others? I can run them myself to save you some trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/sagemath/sage/actions/runs/4588779402/jobs/8103233381?pr=35419
(though not all code style seems activated and the one above is not seen)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, they should all be fixed. I can't resolve the warnings that complain about cython constructs like <foo>n
and &var
however.
src/sage/libs/flint/qsieve.pyx
Outdated
mpz_set(f.value, mpz_factor) | ||
mpz_clear(mpz_factor) | ||
e = int(factors.exp[i]) | ||
pairs.append( (f,e) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after ,
is needed. And spaces around '(' should be removed (otherwise linter complains)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/sage/rings/factorint.pyx
Outdated
|
||
cdef fmpz_factor_t factors | ||
fmpz_factor_init(factors) | ||
fmpz_factor(factors,p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after ,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/sage/rings/factorint.pyx
Outdated
pairs = [] | ||
if factors.sign < 0: | ||
# FLINT doesn't return the plus/minus one factor. | ||
pairs.append( (Integer(-1), int(1)) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no spaces around (
and )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/sage/rings/factorint.pyx
Outdated
mpz_set(f.value, mpz_factor) | ||
mpz_clear(mpz_factor) | ||
e = int(factors.exp[i]) | ||
pairs.append( (f,e) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after ,
and no spaces around (
and )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/sage/libs/flint/qsieve.pyx
Outdated
cdef mpz_t mpz_factor; | ||
for i in range(factors.num): | ||
mpz_init(mpz_factor) | ||
fmpz_get_mpz(mpz_factor, &factors.p[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why you need mpz_factor
here at all? The value of the corresponding factors.p[i]
could directly be set to f.value
below without relying on the intermediate mpz_factor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not crash due to the underlying representations being compatible, but somehow we need to get a GMP integer from the FLINT integer, and AFAIK fmpz_get_mpz()
is the way to do it. Looking at the FLINT source, it's doing something nontrivial:
void
fmpz_get_mpz(mpz_t x, const fmpz_t f)
{
if (!COEFF_IS_MPZ(*f))
flint_mpz_set_si(x, *f); /* set x to small value */
else
mpz_set(x, COEFF_TO_PTR(*f)); /* set x to large value */
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I tried to suggest was
- cdef mpz_t mpz_factor
for i in range(factors.num):
- mpz_init(mpz_factor)
- fmpz_get_mpz(mpz_factor, &factors.p[i])
f = Integer()
- mpz_set(f.value, mpz_factor)
- mpz_clear(mpz_factor)
+ fmpz_get_mpz(f.value, &factors.p[i])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, right, obviously :)
It works, thanks for being patient.
src/sage/libs/flint/qsieve.pyx
Outdated
pairs = [] | ||
if factors.sign < 0: | ||
# FLINT doesn't return the plus/minus one factor. | ||
pairs.append( (Integer(-1), int(1)) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No space around (
(for the linter to be happy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/sage/libs/flint/qsieve.pyx
Outdated
|
||
n = Integer(n) | ||
|
||
sig_on() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not put declarations such as cdef fmpz_t p
inside sig_on()
/sig_off()
. It makes it harder to parse what is between these. More dramatically, I think that the presence of a cast operation <Integer> n
between sig_on()
/sig_off()
is a source of segmentation fault (as it could raise a TypeError
). The latter would be solved by properly typing the function signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've moved them down to surround only the long-running factorization functions. I've also added Integer n
to the signature of factor_using_flint
.
To match the behavior of ZZ.zero().factor(), we make qsieve(0) raise an ArithmeticError instead of returning 0^1.
…nt(). This is an internal function, so we might as well ensure that the argument is the type that we expect it to be.
This makes it more obvious where the long computation takes place.
Both qsieve() and factor_using_flint() need to convert a fmpz_factor_t to a list of (factor, exponent) pairs. Here we move that functionality into a new C function, fmpz_factor_to_pairlist, under sage.libs.flint.
In response to your first comment:
I've done this, but it only converts to a list of pairs. The
The default I agree that it's messy, but the whole thing could probably use a refactoring to e.g. move |
There's no need for intermediate GMP integers in this function.
Thanks for doing this work. I am keeping an eye on this PR so I can consider orphaning or retiring the |
@@ -287,8 +287,7 @@ | |||
sage: q = next_prime(randrange(2^97)) | |||
sage: n = p * q | |||
sage: qsieve(n) # long time (8s on sage.math, 2011) | |||
([6340271405786663791648052309, | |||
46102313108592180286398757159], '') | |||
[(6340271405786663791648052309, 1), (46102313108592180286398757159, 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to alert Bill about this change when the PR is merged.
FlintQS is no longer maintained and has several bugs. As of sagemath/sage#35419, all of FlintQS's functionality is contained within Sage.
I recently asked about FlintQS on sage-devel and was made a maintainer of https://github.com/sagemath/FlintQS. The original goal was to merge some pull requests that fix memory errors and C++ standards non-compliance. In the process I noticed that FlintQS also uses
/tmp
unsafely, with predictable filenames that can be hijacked.To me, much of the code in FlintQS looked like an early version of the
qsieve_factor()
function that eventually made it into FLINT itself. I asked Fredrik Johansson about this and he said we probably don't need FlintQS any more, subject to benchmarking, and also suggested that we add analgorithm="flint"
keyword for integer factorization to Sage, making use offmpz_factor()
in FLINT.Since they are closely related, this PR does those two things:
algorithm="flint"
for integer factorization, andqsieve_factor()
as the implementation ofqsieve()
in Sage.Benchmarks show that the new implementation is a bit slower, but returns correct answers much more reliably. The relevant docs have been updated, and the implementation was moved from
sage.interfaces.qsieve
tosage.libs.flint.qsieve
. A deprecation warning is raised for the old module.The output format has changed to include the exponents in the factorization, which is a breaking change, but there is already a big warning for
qsieve()
that says the answers might be completely wrong, so I don't think we're breaking a promise in this case.