
to_pickable/from_pickable may be obsolete or could be simplified #440

@skirpichev

Quote from libmpf.py:

We don't pickle tuples directly for the following reasons:
  1: pickle uses str() for ints, which is inefficient when they are large
  2: pickle doesn't work for gmpy mpzs
Both problems are solved by using hex()
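
For context, the helpers the comment refers to look roughly like this (a paraphrase based on the quoted comment, not the exact libmpf.py source): an mpf is a (sign, man, exp, bc) tuple, and the mantissa is round-tripped through a hex string.

# Rough sketch of the current hex-based scheme (illustrative paraphrase).
def to_pickable(x):
    sign, man, exp, bc = x
    return sign, hex(man), exp, bc      # mantissa -> hex string

def from_pickable(x):
    sign, man, exp, bc = x
    return sign, int(man, 16), exp, bc  # hex string -> integer mantissa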

It seems gmpy2 now supports pickle, so reason 2 is gone. Reason 1 also doesn't seem to hold anymore; consider this benchmark:

$ cat bench.py 
import pickle
import gmpy2

# One million small ints, dumped as int, str, hex string and gmpy2.mpz.
with open('ai.dat', "bw") as f:
    for a in range(1000000):
        pickle.dump(a, f)

with open('as.dat', "bw") as f:
    for a in range(1000000):
        pickle.dump(str(a), f)

with open('ah.dat', "bw") as f:
    for a in range(1000000):
        pickle.dump(hex(a)[2:], f)

with open('ag.dat', "bw") as f:
    for a in range(1000000):
        pickle.dump(gmpy2.mpz(a), f)

# Twenty ~100-bit ints, dumped the same four ways.
big, step = 10**31, 20  # 10e30 is a float, so int(10e30) isn't exact

with open('bi.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(a, f)

with open('bs.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(str(a), f)

with open('bh.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(hex(a)[2:], f)

with open('bg.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(gmpy2.mpz(a), f)
$ python bench.py
$ ls -l *.dat
-rw-r--r-- 1 sk sk  43M Feb 16 13:50 ag.dat
-rw-r--r-- 1 sk sk  15M Feb 16 13:50 ah.dat
-rw-r--r-- 1 sk sk 7.6M Feb 16 13:50 ai.dat
-rw-r--r-- 1 sk sk  16M Feb 16 13:50 as.dat
-rw-r--r-- 1 sk sk 1.1K Feb 16 13:50 bg.dat
-rw-r--r-- 1 sk sk  720 Feb 16 13:50 bh.dat
-rw-r--r-- 1 sk sk  360 Feb 16 13:50 bi.dat
-rw-r--r-- 1 sk sk  820 Feb 16 13:50 bs.dat

So the most size-efficient dump uses plain ints, and as a first step I suggest replacing hex() with int(). That way, Sage's case will no longer be a special case.
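
A minimal sketch of that first step, keeping the same tuple layout (again a paraphrase, not a patch; MPZ here stands for whatever backend integer type libmpf is configured with):

def to_pickable(x):
    sign, man, exp, bc = x
    return sign, int(man), exp, bc  # plain int pickles compactly; also converts gmpy2.mpz

def from_pickable(x):
    sign, man, exp, bc = x
    return sign, MPZ(man), exp, bc  # back to the configured integer type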

But in the long term, I think it would be better to drop all special pickle workarounds. The huge dumps for pickled mpz's may need some investigation (perhaps there is a speed/size tradeoff?); at first sight it looks like a bug to me. Even now, though, pickling plain mpz's would already beat the str/hex representation for large inputs, since the sizes for int and mpz appear to be asymptotically the same (and int beats str/hex anyway):

$ cat bench2.py
import sys
import pickle
import gmpy2

# Twenty ints of roughly 2**N magnitude, N taken from the command line.
big, step = 1 << int(sys.argv[1]), 20

with open('i.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(a, f)

with open('g.dat', "bw") as f:
    for a in range(big, big+step):
        pickle.dump(gmpy2.mpz(a), f)
$ python bench2.py 100; ls -l *.dat
-rw-r--r-- 1 sk sk 1.1K Feb 16 14:12 g.dat
-rw-r--r-- 1 sk sk  360 Feb 16 14:12 i.dat
$ python bench2.py 1000; ls -l *.dat
-rw-r--r-- 1 sk sk 3.3K Feb 16 14:12 g.dat
-rw-r--r-- 1 sk sk 2.6K Feb 16 14:12 i.dat
$ python bench2.py 10000000; ls -l *.dat
-rw-r--r-- 1 sk sk 24M Feb 16 14:12 g.dat
-rw-r--r-- 1 sk sk 24M Feb 16 14:12 i.dat

(Tested on CPython 3.7.1 with gmpy2 2.1.0a4.)
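
And pickling mpz's does round-trip cleanly with these versions, so reason 2 from the libmpf comment is indeed gone; a quick check:

import pickle
import gmpy2

x = gmpy2.mpz(10)**30
y = pickle.loads(pickle.dumps(x))
assert y == x and type(y) is type(x)  # value and type survive the round trip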
