
0.8 filespec updates #555

@ivirshup


0.8 filespec updates

I intend to expand on this a bit.

Specs

Every element will now be written with a "spec": a set of attributes that tells us what the element is supposed to be. This is already the case for a number of elements, but not yet for all of them. A spec requires two attributes: encoding-type and encoding-version.
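A minimal, dependency-free sketch of attaching a spec to an element and reading it back. The `IOSpec` field names, the `Element` stand-in and the example values `"dataframe"`/`"0.2.0"` are assumptions for illustration, not anndata's actual classes. With h5py, `group.attrs` also supports item assignment, so the same two assignments would apply to a real HDF5 group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOSpec:
    """The two attributes every written element is required to carry."""
    encoding_type: str
    encoding_version: str


class Element:
    """Hypothetical stand-in for an on-disk group or array; only `attrs` is modelled."""

    def __init__(self):
        self.attrs = {}


def write_spec(elem, spec):
    # On disk the attribute names are hyphenated, as in the description above.
    elem.attrs["encoding-type"] = spec.encoding_type
    elem.attrs["encoding-version"] = spec.encoding_version


def read_spec(elem):
    # A KeyError here means the element was written without a spec.
    return IOSpec(elem.attrs["encoding-type"], elem.attrs["encoding-version"])


obs = Element()
write_spec(obs, IOSpec("dataframe", "0.2.0"))  # illustrative values
```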

What does this allow

This makes it possible to read an element regardless of where it sits in the file. For example, specs are what allow dataframes to be stored anywhere in the object (e.g. in uns): the attributes on the HDF5 group tell us how that group is supposed to be read into memory.
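Because the spec lives on the element itself, a reader can find out what each element is without knowing its path in advance. The sketch below walks a toy tree and lists every node tagged as a dataframe, including one nested under `uns`. The `Node` class, the `my_table` name and the `"anndata"`/`"dict"` encoding-type values are assumptions made for illustration.

```python
class Node:
    """Hypothetical stand-in for an HDF5 group: attributes plus named children."""

    def __init__(self, attrs=None, **children):
        self.attrs = dict(attrs or {})
        self.children = children


def iter_specs(node, path=""):
    """Yield (path, encoding-type) for every node that carries a spec."""
    enc = node.attrs.get("encoding-type")
    if enc is not None:
        yield (path or "/"), enc
    for name, child in node.children.items():
        yield from iter_specs(child, f"{path}/{name}")


def spec(kind):
    # Illustrative version string; only the encoding-type matters here.
    return {"encoding-type": kind, "encoding-version": "0.1.0"}


root = Node(
    spec("anndata"),
    obs=Node(spec("dataframe")),
    uns=Node(spec("dict"), my_table=Node(spec("dataframe"))),
)

dataframes = [p for p, enc in iter_specs(root) if enc == "dataframe"]
# dataframes == ["/obs", "/uns/my_table"]
```

The path is never consulted when deciding what a node is; only its own attributes are.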

This also opens a path for anndata extensions: third parties could register their own specs and methods, so they can read and write whatever types they want. Some basic examples of this can be found here.
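A rough sketch of what third-party registration might look like, assuming a read registry keyed by the spec. `READERS`, `register_reader`, `read_elem` and the namespaced `"myext:interval"` encoding-type are hypothetical names for this example, not existing anndata API.

```python
READERS = {}  # (encoding-type, encoding-version) -> reader function


def register_reader(encoding_type, encoding_version):
    """Decorator a third-party package could use to plug in its own reader."""
    def decorator(func):
        READERS[(encoding_type, encoding_version)] = func
        return func
    return decorator


def read_elem(attrs, payload):
    key = (attrs.get("encoding-type"), attrs.get("encoding-version"))
    try:
        reader = READERS[key]
    except KeyError:
        raise ValueError(f"No reader registered for spec {key}") from None
    return reader(payload)


# --- in a hypothetical extension package ---
@register_reader("myext:interval", "0.1.0")
def read_interval(payload):
    start, end = payload
    return range(start, end)


elem_attrs = {"encoding-type": "myext:interval", "encoding-version": "0.1.0"}
result = read_elem(elem_attrs, [2, 5])  # range(2, 5)
```

Prefixing the encoding-type with the extension's name is one possible way to keep third-party specs from colliding with built-in ones.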

Why encoding-version

Sometimes we make mistakes, or write data in a way that rules out an operation we want to support later. Versioning gives us a controlled way to iterate on what we can do with data on disk, and to specify whether an operation is allowed on data written in an older format.
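One way such gating could work, sketched under the assumption that each operation declares the oldest encoding-version it supports. The `MIN_VERSION` table, the `"partial_read"` operation name and the version numbers are hypothetical.

```python
def parse_version(version):
    """Turn "0.10.0" into (0, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical table: the oldest encoding-version of a given encoding-type
# on which an operation is allowed.
MIN_VERSION = {
    ("dataframe", "partial_read"): "0.2.0",
}


def is_allowed(encoding_type, encoding_version, operation):
    minimum = MIN_VERSION.get((encoding_type, operation))
    if minimum is None:
        return False  # operation never declared for this encoding-type
    return parse_version(encoding_version) >= parse_version(minimum)


is_allowed("dataframe", "0.1.0", "partial_read")   # False: older format
is_allowed("dataframe", "0.10.0", "partial_read")  # True
```

Versions are compared as integer tuples because plain string comparison would sort "0.10.0" before "0.2.0".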

The registry

A registry maps between elements and the methods that read or write them. Technically there are two registries, one for writing and one for reading. The write_registry recognizes objects by their type and dispatches to the appropriate writing method. The read registry reads the IOSpec of an element and finds the right reading method.
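A minimal sketch of the two registries as plain dictionaries: writing dispatches on the object's type, reading dispatches on the spec found in the element's attributes. `Node`, `register` and the `"string"`/`"0.2.0"` spec are illustrative assumptions, not anndata's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOSpec:
    encoding_type: str
    encoding_version: str


class Node:
    """Hypothetical stand-in for an on-disk element."""

    def __init__(self):
        self.attrs = {}
        self.value = None


WRITE_REGISTRY = {}  # Python type -> (IOSpec, writer)
READ_REGISTRY = {}   # IOSpec -> reader


def register(typ, spec, writer, reader):
    WRITE_REGISTRY[typ] = (spec, writer)
    READ_REGISTRY[spec] = reader


def write_elem(node, obj):
    # Exact-type lookup: subclasses of a registered type are not matched.
    spec, writer = WRITE_REGISTRY[type(obj)]
    writer(node, obj)
    node.attrs["encoding-type"] = spec.encoding_type
    node.attrs["encoding-version"] = spec.encoding_version


def read_elem(node):
    # Rebuild the spec from the attributes and look up the matching reader.
    spec = IOSpec(node.attrs["encoding-type"], node.attrs["encoding-version"])
    return READ_REGISTRY[spec](node)


def _write_str(node, s):
    node.value = s.encode("utf-8")


def _read_str(node):
    return node.value.decode("utf-8")


register(str, IOSpec("string", "0.2.0"), _write_str, _read_str)

node = Node()
write_elem(node, "hello")
roundtrip = read_elem(node)  # "hello"
```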

Questions

  • Should it be backend specific? E.g. separate registries for h5ad and zarr?
  • Is it a problem that we aren't using subtyping? Currently it seems like this actually solves some problems.
  • How do we handle the more dynamic types? E.g. lists?
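To illustrate the subtyping question, this sketch contrasts an exact-type lookup (a dict keyed by `type(obj)`) with the standard library's `functools.singledispatch`, which follows the class hierarchy. Which behaviour the registry should have is the open question; the `"write_mapping"` handler name is made up for the example.

```python
from collections import OrderedDict
from functools import singledispatch

EXACT = {dict: "write_mapping"}


def exact_lookup(obj):
    # Only the object's concrete type is considered.
    return EXACT.get(type(obj), "no writer")


@singledispatch
def mro_lookup(obj):
    return "no writer"


@mro_lookup.register(dict)
def _(obj):
    return "write_mapping"


exact_lookup(OrderedDict())  # "no writer": OrderedDict is not exactly dict
mro_lookup(OrderedDict())    # "write_mapping": matched via its base class dict
```

Exact matching avoids silently writing a subclass with its parent's method, which could drop whatever the subclass adds; hierarchy-based dispatch avoids having to register every subclass explicitly.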

Backwards compat

I think it's time to start emitting warnings for old files. Reading anything without a spec in its attributes will issue a warning telling people to update their files.
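A sketch of the check a reader could run before dispatching, using Python's `warnings` module. The function name `check_spec`, the message wording and the choice of `FutureWarning` as the category are assumptions, not the warning anndata actually emits.

```python
import warnings


def check_spec(attrs, path):
    """Return (encoding-type, encoding-version), warning if the element has no spec."""
    if "encoding-type" not in attrs or "encoding-version" not in attrs:
        # FutureWarning is an illustrative choice of category.
        warnings.warn(
            f"Element {path!r} was written without an encoding-type/encoding-version "
            "spec. Consider re-writing the file to update it.",
            FutureWarning,
            stacklevel=2,
        )
        return None
    return attrs["encoding-type"], attrs["encoding-version"]
```

Returning `None` rather than raising lets the caller fall back to whatever legacy reading logic it has for spec-less elements.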

Future directions

  • Partial IO
  • Modifications
