Extensible and Configurable Toolkit for Query Refinement Gold Standard Generation Using Transformers


No Query Left Behind: Query Refinement via Backtranslation

Web users often struggle to express their information needs clearly in short, vague queries, making it hard for search engines to find relevant results. Query refinement, which aims to improve search relevance by adjusting original queries, is crucial in addressing this challenge. However, current evaluation methods for query refinement models may not accurately reflect real-world usage patterns. We propose a novel approach using natural language backtranslation to create benchmark datasets for evaluating query refinement models. Backtranslation involves translating a query from one language to another and then translating it back, ensuring that the meaning remains consistent. We believe that backtranslation can:

  1. Identify missing terms in a query that are assumed to be understood due to their common usage in the original language.
  2. Include relevant synonyms from the target language to provide additional context.
  3. Clarify the meaning of ambiguous terms or phrases.

We conducted extensive experiments using widely recognized TREC query sets and multiple languages. Our results, evaluated using various metrics, demonstrate the effectiveness of backtranslation in creating high-quality benchmark datasets for evaluating query refinement methods.

1. Setup

You need Python 3.8 and the pyserini package (which requires Java), among other dependencies listed in requirements.txt. You may also need to install anserini, which is required only for indexing and for the RelevanceFeedback refiner.

Important

Anserini is only compatible with Java 11. Using an older or newer version will result in an error.

We also suggest cloning our repo with --recurse-submodules (or, alternatively, running git submodule update --init inside the cloned repository) to get the trec_eval metric evaluation tool:
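git clone --recurse-submodules https://github.com/fani-lab/RePair.git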

By pip, install the required packages:

pip install -r requirements.txt

By conda:

conda env create -f environment.yml
conda activate repair

Note: When installing Java on Windows, remember to set JAVA_HOME in the environment variables.
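For example, from a Windows command prompt (the JDK path below is illustrative; point it at your own Java 11 installation):

setx JAVA_HOME "C:\Program Files\Java\jdk-11"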

For trec_eval:

cd src/trec_eval.9.0.4
make 
cd ..

Lucene Indexes

To perform fast IR tasks, we need to build sparse indexes of the document corpora or use prebuilt indexes. The path to the index needs to be set in ./src/param.py.
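For illustration only, the index path could be configured along these lines; the actual structure and key names in ./src/param.py may differ:

# Hypothetical sketch of a settings entry in ./src/param.py; actual keys may differ.
settings = {
    'msmarco.passage': {
        'index': '../data/raw/msmarco.passage/lucene-index',  # path to the Lucene index
    },
}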

2. Quickstart

To use query refinement, make sure to add the query_refinement command to the pipeline in ./src/param.py.

RePair has three pipelined steps:

  1. Refining Queries: [query_refinement]
  2. Performance Evaluation: [search, eval]
  3. Dataset Curation: [agg, box]

To run the RePair pipeline, we need to set the required parameters of each step in ./src/param.py, such as the ranker used in the search step. Then, the pipeline can be run by its driver at ./src/main.py:

python -u main.py -data ../data/raw/toy.msmarco.passage -domain msmarco.passage
python -u main.py -data ../data/raw/toy.aol-ia -domain aol-ia
python -u main.py -data ../data/raw/toy.msmarco.passage ../data/raw/toy.aol-ia -domain msmarco.passage aol-ia

Refiners

The objective of query refinement is to produce a set of candidate queries that can serve as improved versions of the original query. This involves systematically applying various unsupervised query refinement techniques to each query in the input dataset.

Class diagram for the query refiners in src/refinement/.

The refiners are initialized by the refiner factory in src/refinement/refiner_factory.py.
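For illustration only, using the factory might look like the following; the helper and method names are hypothetical, and the actual API in src/refinement/refiner_factory.py may differ:

# Hypothetical usage; the actual function and method names may differ.
from src.refinement.refiner_factory import get_refiner  # assumed helper name

refiner = get_refiner('backtranslation')          # pick a refiner by name
candidates = refiner.get_refined_query('figs')    # candidate refined queries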

Here is the list of refiners:

| Refiner             | Category             | Analyze type |
|---------------------|----------------------|--------------|
| backtranslation     | Machine_Translation  | Global       |
| tagmee              | Wikipedia            | Global       |
| stemmers            | Stemming_Analysis    | Global       |
| semantic            | Semantic_Analysis    | Global       |
| sensedisambiguation | Semantic_Analysis    | Global       |
| embedding-based     | Semantic_Analysis    | Global       |
| anchor              | Anchor_Text          | Global       |
| wiki                | Wikipedia            | Global       |
| relevance-feedback  | Top_Documents        | Local        |
| bertqe              | Top_Documents        | Local        |

Backtranslation

Backtranslation, also known as reverse translation or dual translation, involves translating content, whether a query or a paragraph, from one language to another and then translating it back into the original language. This method yields several candidate refinements, from which the one best suited to the task at hand can be selected. For additional details, please refer to this document.
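As a minimal sketch, backtranslation can be reproduced with an NLLB model via Hugging Face transformers; the model choice and language pair here are illustrative, not necessarily the exact setup of this repo:

# Minimal backtranslation sketch with NLLB via Hugging Face transformers.
# Model and language pair are illustrative.
from transformers import pipeline

model = 'facebook/nllb-200-distilled-600M'
en2fa = pipeline('translation', model=model, src_lang='eng_Latn', tgt_lang='pes_Arab')
fa2en = pipeline('translation', model=model, src_lang='pes_Arab', tgt_lang='eng_Latn')

query = 'Italian nobel prize winners'
translated = en2fa(query)[0]['translation_text']            # English -> Farsi
backtranslated = fa2en(translated)[0]['translation_text']   # Farsi -> English
print(translated, '->', backtranslated)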

Example:

| q                           | map(q) | language | translated q                    | backtranslated q'           | map(q') |
|-----------------------------|--------|----------|---------------------------------|-----------------------------|---------|
| Italian nobel prize winners | 0.2282 | Farsi    | برنده های جایزه نوبل ایتالیایی | Italian Nobel laureates     | 0.5665  |
| banana paper making         | 0.1111 | Korean   | 바나나 종이 제조                 | Manufacture of banana paper | 1       |
| figs                        | 0.0419 | Tamil    | அத்திமரங்கள்                    | The fig trees               | 0.0709  |

To evaluate the quality of the refined queries, we employ the bleu, rouge, and semsim metrics. The bleu score measures the n-gram similarity between the backtranslated and original query, while the rouge score captures essential content via n-gram overlap. Both metrics are widely used in machine translation for their simplicity and effectiveness, but their heavy reliance on n-grams means they may fail to capture the overall meaning or fluency of the translated text. To address topic drift and evaluate the semantic similarity between the original and refined queries, we additionally embed the queries with declutr and compute their cosine similarity. Declutr is a self-supervised technique that requires no labeled data and narrows the performance gap between unsupervised and supervised pretraining of universal sentence encoders. The resulting semsim metric, based on the cosine similarity of these embeddings, proves highly effective in capturing the subtle semantic nuances of language, making it a dependable measure of the quality of backtranslated queries.
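A sketch of how semsim can be computed with a publicly available declutr checkpoint; the model name and the mean pooling follow the declutr authors' published usage and are assumptions, not necessarily this repo's exact code:

# Sketch of semsim: cosine similarity of declutr query embeddings.
# Model name and mean pooling are assumptions, not this repo's exact code.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('johngiorgi/declutr-small')
model = AutoModel.from_pretrained('johngiorgi/declutr-small')

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (batch, seq, dim)
    mask = inputs['attention_mask'].unsqueeze(-1)       # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)         # mean-pool over tokens

q, q_prime = embed(['banana paper making', 'Manufacture of banana paper'])
print(torch.nn.functional.cosine_similarity(q, q_prime, dim=0).item())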

The figures below show the average token count of the original English queries and their backtranslated versions across various languages, along with the average pairwise semantic similarities measured by rouge and declutr. Evidently, all languages introduce new terms into the backtranslated queries while maintaining semantic coherence.

Example: These samples are taken from the ANTIQUE dataset, refined using the backtranslation refiner with German.

| id | original | refined | rouge1 | rouge2 | rougeL | rougeLsum | bleu | precisions | brevity_penalty | length_ratio | translation_length | reference_length | semsim |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1290612 | why do anxiety and depression seem to coexist? | Why do fear and depression seem to be linked | 0.705882 | 0.533333 | 0.705882 | 0.705882353 | 0.315598 | [0.5555555555555556, 0.375, 0.2857142857142857, 0.16666666666666666] | 1 | 1 | 9 | 9 | 0.8554905 |
| 4473331 | How can I keep my rabit indoors? | How can I keep my rabbit in the house | 0.625 | 0.571429 | 0.625 | 0.625 | 0.446324 | [0.5555555555555556, 0.5, 0.42857142857142855, 0.3333333333333333] | 1 | 1.125 | 9 | 8 | 0.7701595 |
| 1509982 | How is the Chemistry is a basic of Science? | How is chemistry a principle of science | 0.75 | 0.285714 | 0.75 | 0.75 | 0 | [0.5714285714285714, 0.16666666666666666, 0.0, 0.0] | 0.651439058 | 0.7 | 7 | 10 | 0.7796929 |

We search for relevant documents for both the original query and each potential refined query. To do so, we need an information retrieval method, called a ranker, that retrieves documents and ranks them by relevance score. We integrate pyserini, which provides efficient implementations of sparse and dense rankers, including bm25 and qld (query likelihood with Dirichlet smoothing).
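For instance, a minimal search with pyserini's sparse rankers might look like this; the index path is illustrative:

# Minimal sketch of the search step with pyserini (index path illustrative).
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher('../data/raw/msmarco.passage/lucene-index')
searcher.set_bm25(k1=0.9, b=0.4)      # or searcher.set_qld(mu=1000) for qld
hits = searcher.search('banana paper making', k=10)
for hit in hits:
    print(hit.docid, hit.score)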

The search results of each potential refined query are then evaluated based on how much they improve an evaluation metric such as map or mrr.

Finally, we keep those refined queries whose performance (metric score) is better than or equal to that of the original query.

We keep four of these datasets as the outcome of the RePair pipeline:

  1. ./output/{input query set}/{ranker}.{metric}.agg/{ranker}.{metric}.agg.{selected refiner}.all.tsv
  2. ./output/{input query set}/{ranker}.{metric}.agg/{ranker}.{metric}.agg.{selected refiner}.gold.tsv
  3. ./output/{input query set}/{ranker}.{metric}.agg/{ranker}.{metric}.agg.{selected refiner}.platinum.tsv
  4. ./output/{input query set}/{ranker}.{metric}.agg/{ranker}.{metric}.agg.{selected refiner}.negative.tsv
'agg': {'all':               'all predicted refined queries along with the performance metric values',
        'gold':              'refined_q_metric >= original_q_metric and refined_q_metric > 0',
        'platinum':          'refined_q_metric > original_q_metric',
        'negative':          'refined_q_metric < original_q_metric'}
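As an illustrative sketch of this filtering logic; the column names are assumptions, not necessarily those of the actual tsv files:

# Illustrative filtering for the agg step; column names are assumed.
import pandas as pd

df = pd.read_csv('bm25.map.agg.nllb.all.tsv', sep='\t')
gold     = df[(df['refined_q_metric'] >= df['original_q_metric']) & (df['refined_q_metric'] > 0)]
platinum = df[df['refined_q_metric'] > df['original_q_metric']]
negative = df[df['refined_q_metric'] < df['original_q_metric']]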

The 'selected refiner' option refers to the refiner categories we experiment on when creating the datasets:

  • nllb: Only backtranslation with nllb
  • -bt: All refiners other than backtranslation
  • +bt: All refiners except the bing translator
  • allref: All refiners

3. File Structure

The final structure of the output will look like the following:

├── output
│   └── dataset_name               [such as robust04]
│       ├── refined_queries_files  [such as refiner.bt_bing_persian]
│       └── ranker.metric          [such as bm25.map]
│           ├── ranker_files       [such as refiner.bt_bing_persian.bm25]
│           ├── metric_files       [such as refiner.bt_bing_persian.bm25.map]
│           └── agg
│               └── agg_files      [such as bm25.map.agg.+bt.all.tsv]

The results are available in the ./output directory. You can also access all the results through this link.

4. Acknowledgment

We benefit from trec_eval, pyserini, ir-datasets, and other libraries. We would like to thank the authors of these libraries and helpful resources.

5. License

©2024. This work is licensed under a CC BY-NC-SA 4.0 license.

6. Citation

@inproceedings{DBLP:conf/cikm/RajaeiTF24,
  author       = {Delaram Rajaei and
                  Zahra Taheri and
                  Hossein Fani},
  title        = {No Query Left Behind: Query Refinement via Backtranslation},
  booktitle    = {Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), October 21--25, 2024, Boise, ID, USA},
  publisher    = {ACM},
  year         = {2024},
  url          = {https://doi.org/10.1145/3627673.3679729},
  doi          = {10.1145/3627673.3679729},
  biburl       = {https://dblp.org/rec/conf/cikm/RajaeiTF24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
