Handling truncation for the Similarity pipeline #882

@ccdv-ai

Description

I have a reranker (cross encoder) that relies on absolute positional embeddings, limited to 8192 tokens.
Whenever the length of prompt + document exceeds 8192 tokens, it fails.
I tried passing kwargs, but didn't find one that gets forwarded to the tokenizer:

```python
result = self.pipeline(
    [{"text": q, "text_pair": t} for t in texts],
    top_k=None,
    function_to_apply="none",
    num_workers=workers
)
```

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from txtai.pipeline import Similarity

tokenizer = AutoTokenizer.from_pretrained(reranker_path)
model = AutoModelForSequenceClassification.from_pretrained(reranker_path, torch_dtype=torch.float16)
similarity_scorer = Similarity(
    (model, tokenizer),
    crossencode=True,
)
```

How can I truncate on the right so query + document pairs fit within the 8192-token limit?
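One workaround, until kwargs reach the tokenizer, is to truncate each document before building the text pairs. Below is a minimal sketch of the budgeting logic: keep the full query and cut the document on the right so the pair plus special tokens fits the limit. The names (`truncate_pair`, `MAX_LENGTH`, `SPECIAL_TOKENS`) are illustrative, not txtai or transformers API, and plain integer lists stand in for real token ids:

```python
# Sketch: right-truncate the document so query + document + special
# tokens fit within the model's positional embedding limit.
# In practice the ids would come from the Hugging Face tokenizer,
# e.g. tokenizer(q, add_special_tokens=False)["input_ids"].

MAX_LENGTH = 8192      # model's absolute positional embedding limit
SPECIAL_TOKENS = 3     # e.g. [CLS] query [SEP] document [SEP]

def truncate_pair(query_ids, doc_ids, max_length=MAX_LENGTH, special=SPECIAL_TOKENS):
    """Keep the full query; cut doc_ids on the right to fit max_length."""
    budget = max_length - len(query_ids) - special
    if budget < 0:
        raise ValueError("query alone exceeds the model limit")
    return query_ids, doc_ids[:budget]

# Example: a 4-token query against a 100-token document, 11-token limit
query, doc = truncate_pair(list(range(4)), list(range(100)), max_length=11)
# doc keeps only the first 11 - 4 - 3 = 4 tokens
```

When encoding directly with a Hugging Face tokenizer, the equivalent is `tokenizer(q, t, truncation=True, max_length=8192)`; the `truncation_side` tokenizer attribute controls which end is dropped, and `"right"` is the usual default.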
