Quote from libmpf.py:
We don't pickle tuples directly for the following reasons:
1: pickle uses str() for ints, which is inefficient when they are large
2: pickle doesn't work for gmpy mpzs
Both problems are solved by using hex()
It seems gmpy2 now supports pickle, so 2) is gone. Regarding 1), it also doesn't seem to be true; take an example benchmark:
$ cat bench.py
import pickle
import gmpy2

with open('ai.dat', "bw") as f:
    for a in range(1000000):
        pickle.dump(a, f)
with open('as.dat', "bw") as f:
    for a in range(1000000):
        pickle.dump(str(a), f)
with open('ah.dat', "bw") as f:
    for a in range(1000000):
        pickle.dump(hex(a)[2:], f)
with open('ag.dat', "bw") as f:
    for a in range(1000000):
        pickle.dump(gmpy2.mpz(a), f)

big, step = int(10e30), 20

with open('bi.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(a, f)
with open('bs.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(str(a), f)
with open('bh.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(hex(a)[2:], f)
with open('bg.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(gmpy2.mpz(a), f)
$ python bench.py
$ ls -l *.dat
-rw-r--r-- 1 sk sk 43M Feb 16 13:50 ag.dat
-rw-r--r-- 1 sk sk 15M Feb 16 13:50 ah.dat
-rw-r--r-- 1 sk sk 7.6M Feb 16 13:50 ai.dat
-rw-r--r-- 1 sk sk 16M Feb 16 13:50 as.dat
-rw-r--r-- 1 sk sk 1.1K Feb 16 13:50 bg.dat
-rw-r--r-- 1 sk sk 720 Feb 16 13:50 bh.dat
-rw-r--r-- 1 sk sk 360 Feb 16 13:50 bi.dat
-rw-r--r-- 1 sk sk 820 Feb 16 13:50 bs.dat
So it seems the most space-efficient dump is with plain ints, and as a first step I suggest replacing hex() with int(). That way, Sage's case will no longer be special.
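A minimal in-memory check (no gmpy2 needed) shows the same ordering; here len(pickle.dumps(...)) stands in for the file sizes above:

```python
import pickle

big = int(10e30)  # same magnitude as in the benchmark above

as_int = len(pickle.dumps(big))           # binary payload, 8 bits per byte
as_hex = len(pickle.dumps(hex(big)[2:]))  # ~4 bits per character
as_str = len(pickle.dumps(str(big)))      # ~3.3 bits per character

# Plain int is the most compact; hex beats the decimal string.
assert as_int < as_hex < as_str
```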
But in the long term, I think it would be better to drop any special workarounds for pickle support. The huge dumps for pickled mpz's may require some investigation (perhaps there is some speed/size tradeoff?); at first sight it looks like a bug to me. But even now, using plain mpz's would be better than using a str/hex repr for large inputs, since the sizes appear to be asymptotically the same for int vs mpz (and int is better than str/hex anyway):
$ cat bench2.py
import sys
import pickle
import gmpy2

big, step = 1 << int(sys.argv[1]), 20

with open('i.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(a, f)
with open('g.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(gmpy2.mpz(a), f)
$ python bench2.py 100; ls -l *.dat
-rw-r--r-- 1 sk sk 1.1K Feb 16 14:12 g.dat
-rw-r--r-- 1 sk sk 360 Feb 16 14:12 i.dat
$ python bench2.py 1000; ls -l *.dat
-rw-r--r-- 1 sk sk 3.3K Feb 16 14:12 g.dat
-rw-r--r-- 1 sk sk 2.6K Feb 16 14:12 i.dat
$ python bench2.py 10000000; ls -l *.dat
-rw-r--r-- 1 sk sk 24M Feb 16 14:12 g.dat
-rw-r--r-- 1 sk sk 24M Feb 16 14:12 i.dat
(Tested on CPython 3.7.1 with gmpy2 2.1.0a4.)
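For reference, a sketch of what the first step could look like. The helper names to_pickable/from_pickable and the (sign, man, exp, bc) tuple layout are assumed from libmpf.py; check the actual source before relying on this:

```python
import pickle

# Sketch: store the mantissa as a plain int instead of hex(man)[2:].
# Helper names and the (sign, man, exp, bc) layout are assumed from
# libmpf.py, not verified against the current source.

def to_pickable(x):
    sign, man, exp, bc = x
    # int() also normalizes a gmpy2 mpz mantissa to a plain Python int
    return sign, int(man), exp, bc

def from_pickable(x):
    sign, man, exp, bc = x
    # the real code could wrap man back into MPZ_TYPE here if needed
    return sign, man, exp, bc

val = (0, 123456789123456789123456789, -50, 87)
assert from_pickable(pickle.loads(pickle.dumps(to_pickable(val)))) == val
```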