Custom features not compatible with special encoding/decoding logic #7220

@alex-hh

Describe the bug

It is possible to register custom features using datasets.features.features.register_feature (#6727).

However, such features are not compatible with Features.encode_example / Features.decode_example if they require special encoding or decoding logic, because encode_nested_example and decode_nested_example check whether the feature is an instance of a fixed list of encodable types:

elif isinstance(schema, (Audio, Image, ClassLabel, TranslationVariableLanguages, Value, _ArrayXD)):

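As a result, a custom feature's own encode_example / decode_example is never called, which prevents features from being extended to cases that need custom encoding or decoding.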
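
To make the mechanism concrete, here is a minimal, self-contained sketch of how a hard-coded isinstance list causes a custom type to fall through. It deliberately does not import datasets: BuiltinValue, CustomFeature, ENCODABLE_TYPES and encode_nested are hypothetical stand-ins for illustration, not the library's actual code.

```python
class BuiltinValue:
    """Stand-in for a built-in feature type such as Value (hypothetical)."""

    def encode_example(self, value):
        return str(value)


class CustomFeature:
    """Stand-in for a user-registered feature with its own encoding logic."""

    def encode_example(self, value):
        return {"wrapped": value}


# Hard-coded allowlist, analogous to the isinstance tuple quoted above.
ENCODABLE_TYPES = (BuiltinValue,)


def encode_nested(schema, obj):
    """Simplified dispatch: only allowlisted types get to encode."""
    if isinstance(schema, ENCODABLE_TYPES):
        return schema.encode_example(obj)
    # Any schema outside the allowlist returns the object untouched.
    return obj


builtin_result = encode_nested(BuiltinValue(), 1)  # goes through encode_example
custom_result = encode_nested(CustomFeature(), 1)  # encode_example is skipped
```

In this sketch custom_result is the unmodified input, even though CustomFeature defines encode_example, which mirrors the behaviour reported here.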

Steps to reproduce the bug

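from datasets import Features

class ListOfStrs:
    def encode_example(self, value):
        # Wrap a bare string in a list; pass any other value through unchanged.
        if isinstance(value, str):
            return [value]
        else:
            return value

feats = Features(strlist=ListOfStrs())
# Expected to hold, but the custom encode_example is not applied by Features.encode_example.
assert feats.encode_example({"strlist": "a"})["strlist"] == feats["strlist"].encode_example("a")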

Expected behavior

Registered feature types should be encoded based on some property of the feature itself (e.g. requires_encoding) rather than on membership in a fixed list of types?
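
One way this could look is sketched below, under stated assumptions: the requires_encoding attribute is the hypothetical flag suggested above and does not exist in datasets, and Value, BUILTIN_ENCODABLE and encode_by_property are simplified stand-ins rather than the library's real classes or functions.

```python
class Value:
    """Stand-in for a built-in feature type (not the real datasets.Value)."""

    def encode_example(self, value):
        return value


class ListOfStrs:
    """Custom feature from the reproduction, opting in to encoding."""

    requires_encoding = True  # hypothetical opt-in flag, not a datasets API

    def encode_example(self, value):
        return [value] if isinstance(value, str) else value


BUILTIN_ENCODABLE = (Value,)


def encode_by_property(schema, obj):
    """Dispatch on the opt-in flag in addition to the built-in type list."""
    if isinstance(schema, dict):
        return {key: encode_by_property(schema[key], obj[key]) for key in schema}
    if isinstance(schema, BUILTIN_ENCODABLE) or getattr(schema, "requires_encoding", False):
        return schema.encode_example(obj)
    return obj


schema = {"strlist": ListOfStrs(), "name": Value()}
encoded = encode_by_property(schema, {"strlist": "a", "name": "x"})
```

With the flag check in place, the custom encode_example runs for "strlist" while built-in types keep their existing path. An attribute lookup via getattr keeps the change backward compatible for features that do not define the flag.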

Environment info

datasets version: 3.0.2
