Feature Request: Embeddings index checkpointing #695

@GownerCode


Feature Description

I suggest implementing an autosave feature:

embeddings.index(data, autosave={"interval": 3600, "save_path": "/home/user/index"})

A call like this would save the index to save_path every interval seconds (here, once an hour) while indexing is still running.
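Until something like this exists in the library, the behaviour could perhaps be approximated from the outside. The following is only a rough sketch, not the library's implementation: it assumes the index object offers an incremental `upsert(documents)` method and a `save(path)` method, and the names `index_with_autosave`, `batches`, `batch_size` and `clock` are made up for illustration.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(documents: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` documents."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def index_with_autosave(
    index,                      # assumption: any object with upsert() and save()
    documents: Iterable,
    save_path: str,
    interval: float = 3600.0,   # seconds between checkpoints
    batch_size: int = 1000,     # hypothetical parameter, not part of the request
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Index in batches and save whenever `interval` seconds have elapsed.

    Returns the number of documents handed to the index.
    """
    processed = 0
    last_save = clock()
    try:
        for batch in batches(documents, batch_size):
            index.upsert(batch)
            processed += len(batch)
            # The check runs between batches, so a checkpoint can be late
            # by up to the time one batch takes to index.
            if clock() - last_save >= interval:
                index.save(save_path)
                last_save = clock()
    finally:
        # Runs on success and on failure (for example a dropped connection
        # while reading documents), so completed batches are not lost.
        if processed:
            index.save(save_path)
    return processed
```

With an object that has those two methods, `index_with_autosave(embeddings, data, "/home/user/index", interval=3600)` would mirror the proposed call. Whether building an index batch by batch gives the same result as a single `index` call depends on the backend, so treat that as an open question.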

Reason

Indexing a large dataset can take a long time. The naive approach is:

embeddings.index(my_data)
embeddings.save(my_save_path)

The embeddings.index(...) call takes a long time. When the data comes from a database or another network-dependent source, something may go wrong during the index call. The save call is then never reached, and all progress is lost.

Value of Feature

It would make it possible to resume an indexing run that failed for reasons other than bad data.
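Resuming also needs a record of how far the previous run got. A minimal sketch of one way to track that, assuming the data source returns documents in the same order on every run; the helper names (`read_progress`, `write_progress`, `remaining`) and the JSON sidecar file are hypothetical and not part of any existing API:

```python
import json
import os
from itertools import islice
from typing import Iterable, Iterator


def read_progress(path: str) -> int:
    """Return the number of documents already indexed, or 0 if no record exists."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as handle:
        return int(json.load(handle)["processed"])


def write_progress(path: str, processed: int) -> None:
    """Record progress via a temporary file so a crash cannot leave a half-written record."""
    temporary = f"{path}.tmp"
    with open(temporary, "w", encoding="utf-8") as handle:
        json.dump({"processed": processed}, handle)
    os.replace(temporary, path)  # atomic rename on the same filesystem


def remaining(documents: Iterable, processed: int) -> Iterator:
    """Skip the documents that an earlier run already indexed."""
    return islice(documents, processed, None)
```

A restarted job would load the saved index, call `read_progress`, and feed `remaining(data, processed)` into the indexing loop, calling `write_progress` each time the index itself is saved. If the source order is not stable, tracking document ids instead of a count would be safer.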
