measurements/word_length naming #266

@meg-huggingface

measurements/word_length appears to return the average number of words in the input string(s), rather than the (average) word length.
A word length measurement could be a helpful addition, but this measurement doesn't compute it, so the name is misleading relative to what I expected.

Some useful, distinct measurements in this family would include:

  • sentence_length: The average number of characters in the input and/or the average number of words in a sentence
  • word_length: The average length of words in the input
  • word_count: The number of words in the input
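
To make the distinction concrete, here is a minimal sketch of the three measurements in plain Python. It assumes words are whitespace-delimited (the real measurement may use a different tokenizer), and the function names and returned keys are illustrative only, not the library's actual API.

```python
from statistics import mean


def word_count(texts):
    """Total number of whitespace-delimited words across all inputs."""
    return sum(len(t.split()) for t in texts)


def word_length(texts):
    """Average number of characters per word across all inputs."""
    words = [w for t in texts for w in t.split()]
    return mean(len(w) for w in words) if words else 0.0


def sentence_length(texts):
    """Average number of words and of characters per input string."""
    if not texts:
        return {"average_words": 0.0, "average_characters": 0.0}
    return {
        "average_words": mean(len(t.split()) for t in texts),
        "average_characters": mean(len(t) for t in texts),
    }


texts = ["hello world", "the quick brown fox"]
print(word_count(texts))       # 6 words in total
print(word_length(texts))      # 26 characters over 6 words, about 4.33
print(sentence_length(texts))  # 3 words and 15 characters per input on average
```

For the same input, the three functions give different numbers, which is why a measurement named word_length that reports the second quantity in sentence_length is confusing.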

I propose:

  • Renaming this one (perhaps to sentence_length or string_length), and considering also returning the number of characters (without tokenizing) as an additional option.
  • Or else incorporating it into the already-existing word_count, which would mean simply averaging what's already calculated there.
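
As a rough illustration of the first option, the renamed measurement could take a switch between word-based and character-based length. This is only a sketch: string_length, the unit parameter, and the average_length key are hypothetical names, and whitespace splitting stands in for whatever tokenizer the measurement actually uses.

```python
def string_length(texts, unit="words"):
    """Average length of the input strings, in words or in characters.

    Hypothetical interface: `unit` and the returned key are assumptions.
    """
    if unit not in ("words", "characters"):
        raise ValueError("unit must be 'words' or 'characters'")
    if not texts:
        return {"average_length": 0.0}
    if unit == "characters":
        # Character counts need no tokenization at all.
        lengths = [len(t) for t in texts]
    else:
        # Assumption: whitespace splitting approximates the tokenizer.
        lengths = [len(t.split()) for t in texts]
    return {"average_length": sum(lengths) / len(lengths)}


examples = ["hello world", "the quick brown fox"]
print(string_length(examples))                     # {'average_length': 3.0}
print(string_length(examples, unit="characters"))  # {'average_length': 15.0}
```

The second option would amount to the word-based branch only: dividing the total that word_count already computes by the number of inputs.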
