Skip to content

Handle empty token list with word vectorization #861

@davidmezzetti

Description

@davidmezzetti

There is a bug that has long been in the word vectors module that has been uncovered by the migration to staticvectors.

When passing a list of strings for vectorization, the current code tokenizes each row to a list of tokens. It doesn't handle the case when no tokens are returned.

This change will pass the original string as a token list if no tokens are parsed.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions