Add "flint" factorization algorithm and replace qsieve implementation #35419

orlitzky · 2023-04-02T12:49:12Z

I recently asked about FlintQS on sage-devel and was made a maintainer of https://github.com/sagemath/FlintQS. The original goal was to merge some pull requests that fix memory errors and C++ standards non-compliance. In the process I noticed that FlintQS also uses /tmp unsafely, with predictable filenames that can be hijacked.

To me, much of the code in FlintQS looked like an early version of the qsieve_factor() function that eventually made it into FLINT itself. I asked Fredrik Johansson about this and he said we probably don't need FlintQS any more, subject to benchmarking, and also suggested that we add an algorithm="flint" keyword for integer factorization to Sage, making use of fmpz_factor() in FLINT.

Since they are closely related, this PR does those two things:

Add algorithm="flint" for integer factorization, and
Replace FlintQS with FLINT's qsieve_factor() as the implementation of qsieve() in Sage.

Benchmarks show that the new implementation is a bit slower, but returns correct answers much more reliably. The relevant docs have been updated, and the implementation was moved from sage.interfaces.qsieve to sage.libs.flint.qsieve. A deprecation warning is raised for the old module.

The output format has changed to include the exponents in the factorization, which is a breaking change, but there is already a big warning for qsieve() that says the answers might be completely wrong, so I don't think we're breaking a promise in this case.

We support a few integer factoring algorithms, including PARI (the default), qsieve (via FlintQS), and ECM-GMP. Sage already includes the FLINT library, and FLINT has its own integer factoring routine. This commit adds algorithm="flint" as an option, e.g. sage: ZZ(100).factor(algorithm="flint") To accomplish this, fmpz_factor() was added to sage.libs.flint, and a wrapper factor_using_flint() was added to sage.rings.factorint. The new option was also added to the global factor() function.

The qsieve() function from sage.interfaces.qsieve uses an old, standalone copy of Bill Hart's quadratic sieve that was moved into its own package, called FlintQS. This implementation isn't terribly reliable by modern standards (just look at the docstring), and it requires maintenance for security issues and to appease newer compilers. However, in the intervening years since FlintQS was forked, a quadratic sieve was added to FLINT itself, as qsieve_factor(). This is a bit slower than FlintQS in general, but returns the right answer much more often. Instead of bringing FlintQS up to date, this commit reimplements sage's qsieve() with FLINT's qsieve_factor(). This entails adding qsieve_factor() to sage.libs.flint, and rewriting qsieve() to call it and process the output. The implementation is very similar to factor(n, algorithm="flint"). The output type of the function has necessarily changed, since we now provide not only the factors but their exponents as well.

Now that qsieve() is implemented with the FLINT library, the standalone QuadraticSieve program provided by the FlintQS package is not needed.

We no longer have a separate "interface" for qsieve(). It now calls a function from the FLINT library, and FLINT is already cited.

The integer factorization tutorial suggests that qsieve() is the best algorithm, that it is faster than PARI, and that sage's generic factor() should be rewritten to use qsieve() and ecm.factor(). We have toned down these claims: * PARI has gotten much faster and is competitive with qsieve() now that qsieve() is implemented with qsieve_factor() and returns fewer wildly incorrect answers. * I highly doubt that it's obvious which algorithm will work the best before you actually try them, making it very hard to improve on what PARI or FLINT are doing in a generic factor(). The (not tested) timings have also been removed so that we can test the output. Not only are the timings outdated and random, but they aren't even comparatively accurate anymore (for example, PARI doesn't take noticeably longer to factor the integer we give it than qsieve does). In any case, it's nice to check that the examples work.

This move reflects the recent reimplementation of qsieve() using the FLINT library. A deprecation warning has been added for the old name, and all uses within the sage library have been updated.

videlec

Nice to clean that up!

Instead of writing twice the same code, why don't you write a function to convert the fmpz_factor_t to the proper sage Factorization object?
I do not understand why one of the factorization function ends up in sage/rings/factorint.pyx while the other is in sage/libs/flint/qsieve.pyx. To my mind, it would make more sense to keep these two together. One possible option is to make them methods of the sage Integer object (as we do have _rank_linbox, _rank_flint on matrices).
Note that there is a third factorization from flint : n_factor_to_list from sage/libs/flint/ulong_extras.pyx

videlec · 2023-04-02T14:10:38Z

src/sage/libs/flint/qsieve.pyx

+        [(10000000000000000051, 1), (100000000000000000039, 1)]
+
+    """
+    if n.is_zero():


In this line you already assume that n is of type Integer?

There were a few mistakes here. I've updated that function to call n = Integer(n) first, and then afterwards, it raises an ArithmeticError if n.is_zero(). That's what the other user-facing factorization functions do.

videlec · 2023-04-02T14:10:52Z

src/sage/libs/flint/qsieve.pyx

+from sage.libs.gmp.mpz cimport mpz_t, mpz_init, mpz_set, mpz_clear
+from sage.rings.integer cimport Integer
+
+def qsieve(n):


Why not def qsieve(Integer n)?

And I suggest to use factor_qsieve rather than qsieve.

qsieve() is in the global namespace and is documented in a few places, so I was reluctant to break too many things. The old implementation called n = ZZ(n) at the start, so, for example, you could pass it a rational number (with denominator 1) and it would factor it for you. I left the name alone for similar reasons.

If it were entirely up to me, I would remove qsieve from the global namespace, leaving only n.factor(algorithm="qsieve") and factor(n, algorithm="qsieve") as the official interfaces to it.

This is not ideal because things like qsieve('5') or qsieve(5.0) would then work. It is fine to leave it but not a great interface.

No doubt, having qsieve() in the global scope at all is an anachronism today. It should be an internal function, with the blessed interface to it being ZZ(x).factor(algorithm="qsieve"). I'll open another issue for this.

videlec · 2023-04-02T14:11:42Z

src/sage/libs/flint/qsieve.pyx

+
+    cdef fmpz_factor_t factors
+    fmpz_factor_init(factors)
+    qsieve_factor(factors,p)


space after ',' is needed

Which linter is complaining about this and the others? I can run them myself to save you some trouble.

https://github.com/sagemath/sage/actions/runs/4588779402/jobs/8103233381?pr=35419
(though not all code style seems activated and the one above is not seen)

Ok, they should all be fixed. I can't resolve the warnings that complain about cython constructs like <foo>n and &var however.

videlec · 2023-04-02T14:12:09Z

src/sage/libs/flint/qsieve.pyx

+        mpz_set(f.value, mpz_factor)
+        mpz_clear(mpz_factor)
+        e = int(factors.exp[i])
+        pairs.append( (f,e) )


space after , is needed. And spaces around '(' should be removed (otherwise linter complains)

videlec · 2023-04-02T14:13:33Z

src/sage/rings/factorint.pyx

+
+    cdef fmpz_factor_t factors
+    fmpz_factor_init(factors)
+    fmpz_factor(factors,p)


space after ,

videlec · 2023-04-02T14:13:51Z

src/sage/rings/factorint.pyx

+    pairs = []
+    if factors.sign < 0:
+        # FLINT doesn't return the plus/minus one factor.
+        pairs.append( (Integer(-1), int(1)) )


no spaces around ( and )

videlec · 2023-04-02T14:14:11Z

src/sage/rings/factorint.pyx

+        mpz_set(f.value, mpz_factor)
+        mpz_clear(mpz_factor)
+        e = int(factors.exp[i])
+        pairs.append( (f,e) )


space after , and no spaces around ( and )

videlec · 2023-04-02T14:16:32Z

src/sage/libs/flint/qsieve.pyx

+    cdef mpz_t mpz_factor;
+    for i in range(factors.num):
+        mpz_init(mpz_factor)
+        fmpz_get_mpz(mpz_factor, &factors.p[i])


I do not understand why you need mpz_factor here at all? The value of the corresponding factors.p[i] could directly be set to f.value below without relying on the intermediate mpz_factor.

It might not crash due to the underlying representations being compatible, but somehow we need to get a GMP integer from the FLINT integer, and AFAIK fmpz_get_mpz() is the way to do it. Looking at the FLINT source, it's doing something nontrivial:

void fmpz_get_mpz(mpz_t x, const fmpz_t f) { if (!COEFF_IS_MPZ(*f)) flint_mpz_set_si(x, *f); /* set x to small value */ else mpz_set(x, COEFF_TO_PTR(*f)); /* set x to large value */ }

What I tried to suggest was

- cdef mpz_t mpz_factor for i in range(factors.num): - mpz_init(mpz_factor) - fmpz_get_mpz(mpz_factor, &factors.p[i]) f = Integer() - mpz_set(f.value, mpz_factor) - mpz_clear(mpz_factor) + fmpz_get_mpz(f.value, &factors.p[i])

Yes, right, obviously :)

It works, thanks for being patient.

videlec · 2023-04-02T14:19:10Z

src/sage/libs/flint/qsieve.pyx

+    pairs = []
+    if factors.sign < 0:
+        # FLINT doesn't return the plus/minus one factor.
+        pairs.append( (Integer(-1), int(1)) )


No space around ( (for the linter to be happy)

videlec · 2023-04-02T14:23:19Z

src/sage/libs/flint/qsieve.pyx

+
+    n = Integer(n)
+
+    sig_on()


I would not put declarations such as cdef fmpz_t p inside sig_on()/sig_off(). It makes it harder to parse what is between these. More dramatically, I think that the presence of a cast operation <Integer> n between sig_on()/sig_off() is a source of segmentation fault (as it could raise a TypeError). The latter would be solved by properly typing the function signature.

Ok, I've moved them down to surround only the long-running factorization functions. I've also added Integer n to the signature of factor_using_flint.

To match the behavior of ZZ.zero().factor(), we make qsieve(0) raise an ArithmeticError instead of returning 0^1.

…nt(). This is an internal function, so we might as well ensure that the argument is the type that we expect it to be.

This makes it more obvious where the long computation takes place.

Both qsieve() and factor_using_flint() need to convert a fmpz_factor_t to a list of (factor, exponent) pairs. Here we move that functionality into a new C function, fmpz_factor_to_pairlist, under sage.libs.flint.

orlitzky · 2023-04-03T13:19:28Z

In response to your first comment:

Instead of writing twice the same code, why don't you write a function to convert the fmpz_factor_t to the proper sage Factorization object?

I've done this, but it only converts to a list of pairs. The factor() method then converts the list to a Factorization. This allows the behavior of factor_using_flint to mirror factor_using_pari, and doesn't change the output type of qsieve from a list to a Factorization (keeping in mind that qsieve is documented and in the global namespace).

I do not understand why one of the factorization function ends up in sage/rings/factorint.pyx while the other is in sage/libs/flint/qsieve.pyx

The default n.factor() is implemented using factor_using_pari from sage.rings.factorint, and for algorithm="flint", I simply copied that. The qsieve function already lived in its own separate module. I moved the module from interfaces to libs/flint, but otherwise didn't restructure.

I agree that it's messy, but the whole thing could probably use a refactoring to e.g. move sage.rings.factorint into the integer class.

There's no need for intermediate GMP integers in this function.

musicinmybrain · 2023-04-07T16:53:51Z

Thanks for doing this work. I am keeping an eye on this PR so I can consider orphaning or retiring the flintqs CLI package in Fedora Linux once the packaged version of sagemath no longer requires it.

videlec · 2023-04-12T07:51:28Z

src/sage/tests/book_stein_ent.py

@@ -287,8 +287,7 @@
 sage: q = next_prime(randrange(2^97))
 sage: n = p * q
 sage: qsieve(n)  # long time (8s on sage.math, 2011)
-([6340271405786663791648052309,
-  46102313108592180286398757159], '')
+[(6340271405786663791648052309, 1), (46102313108592180286398757159, 1)]


You will need to alert Bill about this change when the PR is merged.

FlintQS is no longer maintained and has several bugs. As of sagemath/sage#35419, all of FlintQS's functionality is contained within Sage.

orlitzky added 7 commits April 2, 2023 08:31

Drop obsolete FlintQS package.

bc06aad

Now that qsieve() is implemented with the FLINT library, the standalone QuadraticSieve program provided by the FlintQS package is not needed.

src/sage/misc/citation.pyx: drop qsieve citation.

1f06bd2

We no longer have a separate "interface" for qsieve(). It now calls a function from the FLINT library, and FLINT is already cited.

src/sage/tests/book_stein_ent.py: update qsieve() output.

efbe6b1

Move qsieve() from sage.interfaces to sage.libs.flint.

cfe19b4

This move reflects the recent reimplementation of qsieve() using the FLINT library. A deprecation warning has been added for the old name, and all uses within the sage library have been updated.

videlec suggested changes Apr 2, 2023

View reviewed changes

orlitzky added 2 commits April 2, 2023 12:19

src/sage/rings/factorint.pyx: fix pycodestyle warnings.

73a0aa3

src/sage/libs/flint/qsieve.pyx: fix pycodestyle warnings.

810817b

videlec added c: interfaces s: needs work labels Apr 2, 2023

orlitzky added 5 commits April 2, 2023 19:58

src/sage/libs/flint/qsieve.pyx: zero shouldn't have a factorization.

378aa65

To match the behavior of ZZ.zero().factor(), we make qsieve(0) raise an ArithmeticError instead of returning 0^1.

src/sage/rings/factorint.pyx: require Integer arg in factor_using_fli…

2da9dc0

…nt(). This is an internal function, so we might as well ensure that the argument is the type that we expect it to be.

Tighten sig_on/sig_off in FLINT factorization routines.

1e32c03

This makes it more obvious where the long computation takes place.

Factor out the code for FLINT -> python factorization conversions.

5bbaf2f

Both qsieve() and factor_using_flint() need to convert a fmpz_factor_t to a list of (factor, exponent) pairs. Here we move that functionality into a new C function, fmpz_factor_to_pairlist, under sage.libs.flint.

More pycodestyle tweaks in the new FLINT factorization modules.

7a70d48

This was referenced Apr 6, 2023

CVE-2023-29465: Unsafe TMPDIR usage sagemath/FlintQS#3

Open

Silence compiler warnings sagemath/FlintQS#2

Open

orlitzky added s: needs review and removed s: needs work labels Apr 6, 2023

src/sage/libs/flint/fmpz_factor.pyx: simplify fmpz_factor_to_pairlist().

9549887

There's no need for intermediate GMP integers in this function.

videlec reviewed Apr 12, 2023

View reviewed changes

videlec added s: positive review and removed s: needs review labels Apr 12, 2023

videlec approved these changes Apr 12, 2023

View reviewed changes

orlitzky mentioned this pull request Apr 12, 2023

Refactor factoring #35488

Open

vbraun merged commit 820909d into sagemath:develop Apr 13, 2023

collares pushed a commit to NixOS/nixpkgs that referenced this pull request Mar 26, 2024

sage: drop flintqs

c8d30c0

FlintQS is no longer maintained and has several bugs. As of sagemath/sage#35419, all of FlintQS's functionality is contained within Sage.

Uh oh!

Add "flint" factorization algorithm and replace qsieve implementation #35419

Add "flint" factorization algorithm and replace qsieve implementation #35419

Uh oh!

Conversation

orlitzky commented Apr 2, 2023

Uh oh!

videlec left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

orlitzky commented Apr 3, 2023

Uh oh!

musicinmybrain commented Apr 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

musicinmybrain commented Apr 7, 2023 •

edited

Loading