To assist reproducing bugs, please include the following:
- Operating System: Ubuntu 16.04 4.4.0-119-generic
- Python version: 3.5.2
- Where Python was acquired: system
- h5py version: 2.8.0
- HDF5 version: 1.10.2
- The full traceback/stack trace shown: none
We are converting data that maps keys to vectors into HDF5 format. After some time the writing slows down.
After several experiments we found that this happens when many datasets are written to a single group, and that the longer the dataset names, the earlier the problem occurs.
In the beginning the writing is fairly fast (roughly 20 MB/s), but at some point the write speed drops to about 10 KB/s and does not recover.
The size of the data does not matter; there seems to be some limit related to the number and length of the keys.
The issue is reproducible with the code below:
import numpy as np
import h5py
import hashlib

def writeManyDatasets():
    file = h5py.File("myfile.h5", 'w')
    for i in range(0, 500000):
        data = np.asarray([1.0], dtype='float64')
        # 40-character sha1 hex digest, repeated to form a 240-character name
        encodedName = hashlib.sha1(str(i).encode('utf-8')).hexdigest()
        encodedName = encodedName * 6
        dataset = file.create_dataset(encodedName, data=data)
        if i % 10000 == 0:
            print("Done with " + str(i))
    file.close()

writeManyDatasets()
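As a possible workaround (not part of the original report), the datasets could be sharded into nested subgroups keyed by a prefix of the name, so that no single group accumulates hundreds of thousands of links. A minimal sketch of the path scheme, using a hypothetical shard_path helper:

```python
import hashlib

def shard_path(name, levels=2, width=2):
    # Derive a nested group path from the leading characters of the name,
    # e.g. "abcdef" -> "ab/cd/abcdef" for levels=2, width=2.
    parts = [name[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(parts + [name])

# With h5py, the dataset would then be created at the sharded path, e.g.
#   file.create_dataset(shard_path(encodedName), data=data)
name = hashlib.sha1(b"0").hexdigest() * 6
print(shard_path(name))
```

Another commonly cited option is opening the file with h5py.File("myfile.h5", 'w', libver='latest'), which lets HDF5 use its newer dense link storage for groups; whether that avoids the slowdown in this particular case is untested here.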