Computer Science w/ AI Specialization @ University of Waterloo
- Strong interest in AI/ML systems, with hands-on experience in Deep Learning, Natural Language Processing, Information Retrieval, Computer Vision, and MLOps
- Passionate about developing and using open-source tools
- Average nvim enjoyer
RankLLM (Python toolkit for reproducible information retrieval research using rerankers):
- One of the top contributors to this toolkit
- Designed and implemented a customizable prompt template feature, replacing hardcoded prompts and response analysis with dynamic configurations to improve extensibility and maintainability while preserving backward compatibility
- Designed and implemented an optimized multi-tier caching system for first-stage retrieval results, combining local file caching, a HuggingFace Hub fallback, and on-demand Pyserini retrieval to avoid redundant computation
- Developed other features, including few-shot example injection and vLLM integration for multi-GPU support
- Helped write unit tests and run regression tests to update regression scores
- Updated the RankLLM implementation and its usage in other popular repos such as LangChain, rerankers, and LlamaIndex
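The caching system above works as a chain of progressively more expensive fallbacks. Below is a minimal, hypothetical Python sketch of that pattern, not RankLLM's actual code: `load_first_stage_results`, `fetch_from_hub`, and `run_retrieval` are illustrative names (the two callables stand in for a HuggingFace Hub download and a Pyserini search), and the JSON on-disk format is an assumption.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical result shape: query id -> ranked list of doc ids.
Results = Dict[str, List[str]]


def load_first_stage_results(
    key: str,
    cache_dir: Path,
    fetch_from_hub: Callable[[str], Optional[Results]],
    run_retrieval: Callable[[str], Results],
) -> Results:
    """Return first-stage results, trying the cheapest tier first (sketch only)."""
    local_path = cache_dir / f"{key}.json"

    # Tier 1: local file cache (assumed JSON format).
    if local_path.exists():
        return json.loads(local_path.read_text())

    # Tier 2: remote fallback; None signals a miss (stand-in for a Hub download).
    results = fetch_from_hub(key)

    # Tier 3: compute on demand (stand-in for a Pyserini search).
    if results is None:
        results = run_retrieval(key)

    # Write back so the next call for this key is served from tier 1.
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_path.write_text(json.dumps(results))
    return results
```

Passing the remote and retrieval steps in as callables keeps the sketch independent of any particular Hub or Pyserini API; whichever tier produces the results, they are written back locally so repeated runs only pay for a file read.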
Pyserini (Python toolkit for reproducible information retrieval research with sparse and dense representations):
- Integrated the M-BEIR dataset and UniIR models into the Pyserini pipeline for multimodal retrieval
- Added support for sparse vector encoding with SPLADE models to the pipeline
- Created documentation for various regression tests and computed their scores
[UniIR-for-Pyserini](https://github.com/clides/UniIR-for-Pyserini) (fork of the original UniIR repo for easy Pyserini integration):
- Created and released the uniir-for-pyserini PyPI package
Anserini (Lucene toolkit for reproducible information retrieval research):
- Created documentation for various regression tests and computed their scores
- Built indexes and uploaded them to HuggingFace datasets for easy access
- Email: daniel168.guo@gmail.com
- Resume: here
- LinkedIn: linkedin/daniel-guo
- Open to: ML/SWE internship and co-op opportunities, research collaborations, and open-source contributions