
Writing to hdf5 file slows down enormously for many long keys in the same group #1055

@mariaangelapellegrino

Description

To assist reproducing bugs, please include the following:

  • Operating System: Ubuntu 16.04 4.4.0-119-generic
  • Python version: 3.5.2
  • Where Python was acquired: system
  • h5py version: 2.8.0
  • HDF5 version: 1.10.2
  • The full traceback/stack trace shown: none

We are converting data that maps keys to vectors into HDF5 format. After some time the writing slows down.

After several experiments we found that this happens when many datasets are written to a single group and these datasets have long names (the longer the name, the earlier the problem occurs). In the beginning the writing is rather fast (~20 MB/s), but at some point the speed drops to about 10 KB/s and does not recover afterwards.

The size of the data has no effect. There seems to be some sort of limit on the number and length of the keys.
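A workaround that is often suggested for per-group scaling limits like this (an assumption here, not something tested in this report) is to shard the datasets across subgroups keyed by a short prefix of the name, so that no single group accumulates hundreds of thousands of links. A minimal sketch, with an illustrative two-character sharding scheme:

```python
import hashlib

import numpy as np
import h5py

def write_sharded(path, n):
    """Write n tiny datasets, sharded into subgroups by name prefix."""
    with h5py.File(path, 'w') as f:
        data = np.asarray([1.0], dtype='float64')
        for i in range(n):
            # Same long-name scheme as the repro: 40-char digest repeated 6x
            name = hashlib.sha1(str(i).encode('utf-8')).hexdigest() * 6
            # First two hex chars select one of up to 256 subgroups,
            # keeping each group's link count small
            group = f.require_group(name[:2])
            group.create_dataset(name, data=data)
```

Whether this helps depends on how HDF5 stores the links internally, but it bounds the number of entries any single group has to manage.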

The issue is reproducible with the code below:

import numpy as np
import h5py
import hashlib

def writeManyDatasets():
    file = h5py.File("myfile.h5", 'w')
    for i in range(500000):
        data = np.asarray([1.0], dtype='float64')
        # 40-character SHA-1 hex digest, repeated to get a 240-character name
        encodedName = hashlib.sha1(str(i).encode('utf-8')).hexdigest()
        encodedName = encodedName * 6
        dataset = file.create_dataset(encodedName, data=data)
        if i % 10000 == 0:
            print("Done with " + str(i))
    file.close()

writeManyDatasets()
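One mitigation worth trying (an assumption based on general HDF5 behavior, not something verified in this report) is to create the file with libver='latest'. This allows HDF5 to use its newer group storage for links instead of the old symbol-table format, which is known to degrade when a single group holds very many long-named links. A minimal sketch; the padded numeric names are illustrative, chosen to match the 240-character names in the repro:

```python
import numpy as np
import h5py

def write_many_datasets(path, n):
    """Write n tiny datasets with long names, using the newest file format."""
    # libver='latest' opts in to the newer group/link storage layout
    with h5py.File(path, 'w', libver='latest') as f:
        data = np.asarray([1.0], dtype='float64')
        for i in range(n):
            name = ('%040d' % i) * 6  # 240-character dataset name
            f.create_dataset(name, data=data)

write_many_datasets("myfile_latest.h5", 1000)
```

Note that files written with libver='latest' may not be readable by older HDF5 library versions, so this trades compatibility for scalability.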
