  • University of Waterloo
  • Universe Lania Kea Supercluster Virgo Cluster Local Group Milky Way Orion Arm Gould Belt Local Bubble Ben Interstellar Cloud Olnit Cloud Solar System Third Planet Earth

Highlights

  • Pro

Organizations

@castorini

clides/README.md

Daniel Guo 👋

Computer Science w/ AI Specialization @ University of Waterloo

🔍 About Me

  • Strong interest in AI/ML systems, with hands-on experience in Deep Learning, Natural Language Processing, Information Retrieval, Computer Vision, and MLOps
  • Passionate about developing and using open-source tools
  • Average nvim enjoyer

🌟 Open Source Contributions

RankLLM (Python toolkit for reproducible information retrieval research using rerankers):

  • One of the top contributors to this toolkit
  • Designed and implemented a customizable prompt template feature, replacing hardcoded prompts and response analysis with dynamic configurations to improve extensibility and maintainability while preserving backward compatibility
  • Designed and implemented an optimized multi-tier caching system for first-stage retrieval results, combining local file caching, HuggingFace Hub fallback, and on-demand Pyserini retrieval to minimize redundant computation
  • Developed other features such as few-shot example injection, vLLM integration for multi-GPU support, and more
  • Helped write unit tests and run regression tests to update regression scores
  • Updated RankLLM implementation and usage in other popular repos such as LangChain, rerankers, and LlamaIndex
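
The multi-tier caching idea above can be sketched roughly as follows. This is only an illustration of the fallback order (local file, then a remote store, then on-demand retrieval), not RankLLM's actual code: `load_results`, `fetch_remote`, and `retrieve` are hypothetical names, the remote tier is a plain callable standing in for a HuggingFace Hub download, and cache keys are assumed to be filename-safe.

```python
import json
from pathlib import Path
from typing import Callable, Optional

Results = list[dict]


def load_results(
    key: str,
    cache_dir: Path,
    fetch_remote: Callable[[str], Optional[Results]],  # hypothetical remote tier
    retrieve: Callable[[str], Results],                # hypothetical on-demand tier
) -> tuple[Results, str]:
    """Return (results, tier), where tier names the tier that served the request."""
    cache_file = cache_dir / f"{key}.json"

    # Tier 1: local file cache.
    if cache_file.exists():
        return json.loads(cache_file.read_text()), "local"

    # Tier 2: remote store; a return value of None is treated as a miss.
    results = fetch_remote(key)
    tier = "remote"

    # Tier 3: run retrieval on demand only when both caches miss.
    if results is None:
        results = retrieve(key)
        tier = "computed"

    # Write back to the local cache so the next call is served from tier 1.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(results))
    return results, tier
```

With this shape, a second call for the same key never reaches the remote store or the retriever, which is where the savings in redundant computation would come from.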

Pyserini (Python toolkit for reproducible information retrieval research with sparse and dense representations):

  • Integrated the M-BEIR dataset and UniIR models into the Pyserini pipeline for multimodal retrieval
  • Added support for sparse vector encoding with SPLADE models to the pipeline
  • Documented various regression tests and computed their scores

[UniIR-for-Pyserini](https://github.com/clides/UniIR-for-Pyserini) (Fork of the original UniIR repo for easy Pyserini integration):

  • Created and released the uniir-for-pyserini PyPI package, built from this fork

Anserini (Lucene toolkit for reproducible information retrieval research):

  • Documented various regression tests and computed their scores
  • Built indexes and uploaded them to HuggingFace datasets for easy retrieval

📬 Let's Connect

Pinned

  1. Projects Public

    Some side projects I made while studying ML

    Jupyter Notebook 1

  2. castorini/rank_llm Public

    RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.

    Python 519 72

  3. castorini/pyserini Public

    Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

    Python 1.9k 434