Closed
Labels
v2: Issues and PRs related to `v2` branch
Description
Not sure if this is a good idea. Currently it is already somewhat vaguely defined.
I believe the original intention was to tell us something about the length (s2p: sentence to paragraph), but we now have the descriptive statistics, which are a much better source.
However, in MIEB it is used as "t2i", text to image.
@Muennighoff would love to know what you think.
Here is a sample from the descriptive statistics:
...
"average_document_length": 20.28592186371801,
"max_document_length": 214210,
"unique_documents": 1005474,
"min_query_length": 2,
"average_query_length": 38.259317745096176,
...
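For context, statistics like the ones above could be derived directly from the raw texts, which is why they carry more length information than a coarse label such as "s2p". The following is only a rough sketch, not the project's actual implementation: the function name `length_statistics` is hypothetical, and it assumes lengths are counted in characters, which the sample above does not confirm.

```python
from statistics import mean


def length_statistics(documents, queries):
    """Hypothetical helper: summarize text lengths for a retrieval-style task.

    Assumption: length means number of characters. The key names mirror
    the sample above, but the computation itself is illustrative only.
    """
    doc_lengths = [len(d) for d in documents]
    query_lengths = [len(q) for q in queries]
    return {
        "average_document_length": mean(doc_lengths),
        "max_document_length": max(doc_lengths),
        "unique_documents": len(set(documents)),
        "min_query_length": min(query_lengths),
        "average_query_length": mean(query_lengths),
    }


stats = length_statistics(
    documents=["ab", "abcd", "ab"],
    queries=["x", "xyz"],
)
```

Unlike a single category label, these numbers expose the full range of lengths (for example a very large maximum next to a small average), which a label like "sentence" or "paragraph" would hide.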
@isaac-chung you have also been heavily involved in both parts.
(An alternative is to convert the annotation in MIEB into "s2i", meaning sentence to image.)
isaac-chung