Closed
Labels
bug (Something isn't working)
Description
When the GermanDPR dataset is used with a CrossEncoder, the dataset returns a dict instead of a str. This raises an error because the CrossEncoder expects its text input to be a string.
The following error is raised when processing the dataset:
/sentence_transformers/cross_encoder/CrossEncoder.py", line 170, in smart_batching_collate_text_only
texts[idx].append(text.strip())
AttributeError: 'dict' object has no attribute 'strip'
I see two potential ways to address this issue:
- Modify the GermanDPR dataset:
  Combine the title and formatted_content fields of the dict into a single str before passing it to the CrossEncoder.
  Example: f"{title} {formatted_content}"
- Update the CrossEncoder logic:
  Add a check in CrossEncoder for inputs that are a dict instead of a str.
  If a dict is detected, convert it to a str within the CrossEncoder.
result[id_value] = {"title": title, "text": formatted_content}
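Both options come down to the same conversion, so a rough sketch may help compare them. The helper name doc_to_text, the sample corpus, and the pair layout are hypothetical and not part of sentence_transformers or the GermanDPR loader. The only assumption taken from the report is that the dict has "title" and "text" keys, as in the line above.

```python
from typing import Mapping, Union

# A document is either already plain text or a {"title": ..., "text": ...} dict.
Doc = Union[str, Mapping[str, str]]


def doc_to_text(doc: Doc) -> str:
    """Hypothetical helper: return plain text for a str or a title/text dict."""
    if isinstance(doc, str):
        return doc.strip()
    if isinstance(doc, Mapping):
        title = doc.get("title", "")
        text = doc.get("text", "")
        # Same format as the example in the issue: f"{title} {formatted_content}"
        return f"{title} {text}".strip()
    raise TypeError(f"Expected str or dict, got {type(doc).__name__}")


# Made-up sample data in the shape the dataset is reported to produce.
corpus = {
    "doc1": {"title": "Berlin", "text": "Berlin ist die Hauptstadt von Deutschland."},
}

# Option 1 (dataset side): flatten every entry once, when the corpus is built.
flat_corpus = {doc_id: doc_to_text(doc) for doc_id, doc in corpus.items()}

# Option 2 (CrossEncoder side): normalize each element of a pair before
# anything calls .strip() on it, e.g. at the start of a collate step.
pairs = [("Was ist die Hauptstadt von Deutschland?", corpus["doc1"])]
normalized_pairs = [[doc_to_text(item) for item in pair] for pair in pairs]
```

Option 1 keeps the CrossEncoder unchanged, while option 2 would also cover other datasets that hand over dicts, at the cost of fixing one concatenation format for all of them.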