Skip to content

Conversation

orlitzky
Copy link
Contributor

@orlitzky orlitzky commented Apr 2, 2023

I recently asked about FlintQS on sage-devel and was made a maintainer of https://github.com/sagemath/FlintQS. The original goal was to merge some pull requests that fix memory errors and C++ standards non-compliance. In the process I noticed that FlintQS also uses /tmp unsafely, with predictable filenames that can be hijacked.

To me, much of the code in FlintQS looked like an early version of the qsieve_factor() function that eventually made it into FLINT itself. I asked Fredrik Johansson about this and he said we probably don't need FlintQS any more, subject to benchmarking, and also suggested that we add an algorithm="flint" keyword for integer factorization to Sage, making use of fmpz_factor() in FLINT.

Since they are closely related, this PR does those two things:

  • Add algorithm="flint" for integer factorization, and
  • Replace FlintQS with FLINT's qsieve_factor() as the implementation of qsieve() in Sage.

Benchmarks show that the new implementation is a bit slower, but returns correct answers much more reliably. The relevant docs have been updated, and the implementation was moved from sage.interfaces.qsieve to sage.libs.flint.qsieve. A deprecation warning is raised for the old module.

The output format has changed to include the exponents in the factorization, which is a breaking change, but there is already a big warning for qsieve() that says the answers might be completely wrong, so I don't think we're breaking a promise in this case.

orlitzky added 7 commits April 2, 2023 08:31
We support a few integer factoring algorithms, including PARI (the
default), qsieve (via FlintQS), and ECM-GMP. Sage already includes the
FLINT library, and FLINT has its own integer factoring routine. This
commit adds algorithm="flint" as an option, e.g.

  sage: ZZ(100).factor(algorithm="flint")

To accomplish this, fmpz_factor() was added to sage.libs.flint, and a
wrapper factor_using_flint() was added to sage.rings.factorint. The
new option was also added to the global factor() function.
The qsieve() function from sage.interfaces.qsieve uses an old,
standalone copy of Bill Hart's quadratic sieve that was moved into its
own package, called FlintQS. This implementation isn't terribly
reliable by modern standards (just look at the docstring), and it
requires maintenance for security issues and to appease newer
compilers.

However, in the intervening years since FlintQS was forked, a
quadratic sieve was added to FLINT itself, as qsieve_factor(). This is
a bit slower than FlintQS in general, but returns the right answer
much more often. Instead of bringing FlintQS up to date, this commit
reimplements sage's qsieve() with FLINT's qsieve_factor().

This entails adding qsieve_factor() to sage.libs.flint, and rewriting
qsieve() to call it and process the output. The implementation is very
similar to factor(n, algorithm="flint").

The output type of the function has necessarily changed, since we now
provide not only the factors but their exponents as well.
Now that qsieve() is implemented with the FLINT library, the
standalone QuadraticSieve program provided by the FlintQS package is
not needed.
We no longer have a separate "interface" for qsieve(). It now calls a
function from the FLINT library, and FLINT is already cited.
The integer factorization tutorial suggests that qsieve() is the best
algorithm, that it is faster than PARI, and that sage's generic
factor() should be rewritten to use qsieve() and ecm.factor(). We have
toned down these claims:

  * PARI has gotten much faster and is competitive with qsieve()
    now that qsieve() is implemented with qsieve_factor() and
    returns fewer wildly incorrect answers.

  * I highly doubt that it's obvious which algorithm will work the
    best before you actually try them, making it very hard to improve
    on what PARI or FLINT are doing in a generic factor().

The (not tested) timings have also been removed so that we can test
the output. Not only are the timings outdated and random, but they
aren't even comparatively accurate anymore (for example, PARI doesn't
take noticeably longer to factor the integer we give it than qsieve
does). In any case, it's nice to check that the examples work.
This move reflects the recent reimplementation of qsieve() using the
FLINT library. A deprecation warning has been added for the old name,
and all uses within the sage library have been updated.
Copy link
Contributor

@videlec videlec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to clean that up!

  • Instead of writing twice the same code, why don't you write a function to convert the fmpz_factor_t to the proper sage Factorization object?
  • I do not understand why one of the factorization function ends up in sage/rings/factorint.pyx while the other is in sage/libs/flint/qsieve.pyx. To my mind, it would make more sense to keep these two together. One possible option is to make them methods of the sage Integer object (as we do have _rank_linbox, _rank_flint on matrices).
  • Note that there is a third factorization from flint : n_factor_to_list from sage/libs/flint/ulong_extras.pyx

[(10000000000000000051, 1), (100000000000000000039, 1)]

"""
if n.is_zero():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this line you already assume that n is of type Integer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a few mistakes here. I've updated that function to call n = Integer(n) first, and then afterwards, it raises an ArithmeticError if n.is_zero(). That's what the other user-facing factorization functions do.

from sage.libs.gmp.mpz cimport mpz_t, mpz_init, mpz_set, mpz_clear
from sage.rings.integer cimport Integer

def qsieve(n):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not def qsieve(Integer n)?

And I suggest to use factor_qsieve rather than qsieve.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qsieve() is in the global namespace and is documented in a few places, so I was reluctant to break too many things. The old implementation called n = ZZ(n) at the start, so, for example, you could pass it a rational number (with denominator 1) and it would factor it for you. I left the name alone for similar reasons.

If it were entirely up to me, I would remove qsieve from the global namespace, leaving only n.factor(algorithm="qsieve") and factor(n, algorithm="qsieve") as the official interfaces to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not ideal because things like qsieve('5') or qsieve(5.0) would then work. It is fine to leave it but not a great interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No doubt, having qsieve() in the global scope at all is an anachronism today. It should be an internal function, with the blessed interface to it being ZZ(x).factor(algorithm="qsieve"). I'll open another issue for this.


cdef fmpz_factor_t factors
fmpz_factor_init(factors)
qsieve_factor(factors,p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after ',' is needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which linter is complaining about this and the others? I can run them myself to save you some trouble.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/sagemath/sage/actions/runs/4588779402/jobs/8103233381?pr=35419
(though not all code style seems activated and the one above is not seen)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, they should all be fixed. I can't resolve the warnings that complain about cython constructs like <foo>n and &var however.

mpz_set(f.value, mpz_factor)
mpz_clear(mpz_factor)
e = int(factors.exp[i])
pairs.append( (f,e) )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after , is needed. And spaces around '(' should be removed (otherwise linter complains)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


cdef fmpz_factor_t factors
fmpz_factor_init(factors)
fmpz_factor(factors,p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after ,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pairs = []
if factors.sign < 0:
# FLINT doesn't return the plus/minus one factor.
pairs.append( (Integer(-1), int(1)) )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no spaces around ( and )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

mpz_set(f.value, mpz_factor)
mpz_clear(mpz_factor)
e = int(factors.exp[i])
pairs.append( (f,e) )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after , and no spaces around ( and )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

cdef mpz_t mpz_factor;
for i in range(factors.num):
mpz_init(mpz_factor)
fmpz_get_mpz(mpz_factor, &factors.p[i])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand why you need mpz_factor here at all? The value of the corresponding factors.p[i] could directly be set to f.value below without relying on the intermediate mpz_factor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not crash due to the underlying representations being compatible, but somehow we need to get a GMP integer from the FLINT integer, and AFAIK fmpz_get_mpz() is the way to do it. Looking at the FLINT source, it's doing something nontrivial:

void
fmpz_get_mpz(mpz_t x, const fmpz_t f)
{
    if (!COEFF_IS_MPZ(*f))
        flint_mpz_set_si(x, *f);      /* set x to small value */
    else
        mpz_set(x, COEFF_TO_PTR(*f));   /* set x to large value */
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I tried to suggest was

-    cdef mpz_t mpz_factor
    for i in range(factors.num):
-        mpz_init(mpz_factor)
-        fmpz_get_mpz(mpz_factor, &factors.p[i])
        f = Integer()
-        mpz_set(f.value, mpz_factor)
-        mpz_clear(mpz_factor)
+        fmpz_get_mpz(f.value, &factors.p[i])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, right, obviously :)

It works, thanks for being patient.

pairs = []
if factors.sign < 0:
# FLINT doesn't return the plus/minus one factor.
pairs.append( (Integer(-1), int(1)) )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No space around ( (for the linter to be happy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


n = Integer(n)

sig_on()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not put declarations such as cdef fmpz_t p inside sig_on()/sig_off(). It makes it harder to parse what is between these. More dramatically, I think that the presence of a cast operation <Integer> n between sig_on()/sig_off() is a source of segmentation fault (as it could raise a TypeError). The latter would be solved by properly typing the function signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've moved them down to surround only the long-running factorization functions. I've also added Integer n to the signature of factor_using_flint.

orlitzky added 5 commits April 2, 2023 19:58
To match the behavior of ZZ.zero().factor(), we make qsieve(0) raise
an ArithmeticError instead of returning 0^1.
…nt().

This is an internal function, so we might as well ensure that the
argument is the type that we expect it to be.
This makes it more obvious where the long computation takes place.
Both qsieve() and factor_using_flint() need to convert a fmpz_factor_t
to a list of (factor, exponent) pairs. Here we move that functionality
into a new C function, fmpz_factor_to_pairlist, under sage.libs.flint.
@orlitzky
Copy link
Contributor Author

orlitzky commented Apr 3, 2023

In response to your first comment:

Instead of writing twice the same code, why don't you write a function to convert the fmpz_factor_t to the proper sage Factorization object?

I've done this, but it only converts to a list of pairs. The factor() method then converts the list to a Factorization. This allows the behavior of factor_using_flint to mirror factor_using_pari, and doesn't change the output type of qsieve from a list to a Factorization (keeping in mind that qsieve is documented and in the global namespace).

I do not understand why one of the factorization function ends up in sage/rings/factorint.pyx while the other is in sage/libs/flint/qsieve.pyx

The default n.factor() is implemented using factor_using_pari from sage.rings.factorint, and for algorithm="flint", I simply copied that. The qsieve function already lived in its own separate module. I moved the module from interfaces to libs/flint, but otherwise didn't restructure.

I agree that it's messy, but the whole thing could probably use a refactoring to e.g. move sage.rings.factorint into the integer class.

There's no need for intermediate GMP integers in this function.
@musicinmybrain
Copy link

musicinmybrain commented Apr 7, 2023

Thanks for doing this work. I am keeping an eye on this PR so I can consider orphaning or retiring the flintqs CLI package in Fedora Linux once the packaged version of sagemath no longer requires it.

@@ -287,8 +287,7 @@
sage: q = next_prime(randrange(2^97))
sage: n = p * q
sage: qsieve(n) # long time (8s on sage.math, 2011)
([6340271405786663791648052309,
46102313108592180286398757159], '')
[(6340271405786663791648052309, 1), (46102313108592180286398757159, 1)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to alert Bill about this change when the PR is merged.

@orlitzky orlitzky mentioned this pull request Apr 12, 2023
@vbraun vbraun merged commit 820909d into sagemath:develop Apr 13, 2023
collares pushed a commit to NixOS/nixpkgs that referenced this pull request Mar 26, 2024
FlintQS is no longer maintained and has several bugs.
As of sagemath/sage#35419, all of FlintQS's
functionality is contained within Sage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants