[v2] convert category from length descriptor to modality in task metadata #1767

@KennethEnevoldsen

Not sure if this is a good idea. The category field is currently somewhat vaguely defined.

I believe the original intention was to tell us something about length (s2p: sentence to paragraph), but we now have the descriptive statistics, which are a much better source for that.

However, in MIEB it is used as "t2i" (text to image).

@Muennighoff, I would love to know what you think.

Here is a sample from the descriptive statistics:

...
        "average_document_length": 20.28592186371801,
        "max_document_length": 214210,
        "unique_documents": 1005474,
        "min_query_length": 2,
        "average_query_length": 38.259317745096176,
...

@isaac-chung, you have also been heavily involved in both parts.

(An alternative is to change the annotation in MIEB to "s2i", meaning sentence to image.)
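
To make the proposal concrete, here is a rough sketch of what the conversion could look like. The names `LENGTH_TO_MODALITY` and `to_modality_category` are hypothetical and not part of mteb, and the sketch assumes that categories always have the form `<source>2<target>` and that both "s" (sentence) and "p" (paragraph) collapse to "t" (text):

```python
# Hypothetical helper, not mteb's actual API.
# Assumption: sentence ("s") and paragraph ("p") both become text ("t").
LENGTH_TO_MODALITY = {"s": "t", "p": "t"}


def to_modality_category(category: str) -> str:
    """Convert a length-style category (e.g. 's2p') to a modality-style one (e.g. 't2t').

    Codes that are not length descriptors (e.g. 'i' for image) are kept unchanged.
    """
    source, separator, target = category.partition("2")
    if not separator or not source or not target:
        raise ValueError(f"Expected '<source>2<target>', got {category!r}")
    new_source = LENGTH_TO_MODALITY.get(source, source)
    new_target = LENGTH_TO_MODALITY.get(target, target)
    return f"{new_source}2{new_target}"


if __name__ == "__main__":
    for old in ["s2s", "s2p", "p2p", "s2i", "t2i"]:
        print(old, "->", to_modality_category(old))
```

Under this mapping the existing text-only categories would all become "t2t", and the MIEB "t2i" annotation would stay as it is, so length information would come only from the descriptive statistics.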

Labels: v2 (Issues and PRs related to `v2` branch)