Skip to content

Conversation

Fiona-Waters
Copy link
Contributor

@Fiona-Waters Fiona-Waters commented May 30, 2025

What this PR does / why we need it:

This PR adds a FeastRagRetriever that inherits from the HuggingFace Transformers RagRetriever class. It allows for integration with feast functionality allowing RAG to be performed more easily using the Feature Store.

The following has been added:

rag_retriever.py
This module implements a custom RAG retriever by combining Feast feature store capabilities with transformer-based models. The implementation provides:

  • A custom RAG retriever (FeastRAGRetriever) that supports three search modes:
    • Text-based search
    • Vector-based search
    • Hybrid search
  • Integration with Hugging Face's transformers library and sentence-transformers
  • Configurable document formatting options

vector_store.py
This module implements a FeastVectorStore class. The FeastVectorStore takes in a FeatureStore, FeatureView and features using these to query the store via the existing retrieve_online_documents_v2 function.

setup.py
Adding entries for RAG dependencies allowing for installation with pip install feast[rag].

init.py
Adding entries for new classes added in rag_retriever.py and vector_store.py

pyproject.toml
The dependencies required for the RAG implementation have been added here.
The relevant dependencies have also been added to the relevant requirements.txt files.

feature_store.py
Changes were made to the def _validate_vector_features function. The updated function now correctly validates the length of each individual vector within a specified DataFrame column, rather than checking the total number of DataFrame columns. This change ensures that every single vector matches its expected dimension, significantly improving data integrity and error reporting. This was improved following addition of unit tests.

Unit tests
Unit tests have been included to cover the added functionality, along with some minor changes in the existing example_feature_repo_1.py file.

examples/rag-retriever
An example has been added here including, feature_repo, README, low level design image and example notebook. This currently includes implementation on Red Hat OpenShift with a remote Milvus instance.

.github/workflows
The change here is a small update suggested by @ntkathole for torch installation and error handling.

Which issue(s) this PR fixes:

Related to #5391 but get_top_docs has not been implemented here as the FeastIndex is a dummy index - the retrieval functionality exists in the FeastRagRetriever via the FeastVector store which uses retrieve_online_documents_v2 .

Misc

@Fiona-Waters Fiona-Waters changed the title [WIP] feat: Add feast rag retriver functionality feat: Add feast rag retriver functionality May 30, 2025
@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 4 times, most recently from 0abf934 to fc99dd0 Compare May 30, 2025 22:47
@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 14 times, most recently from 069fe93 to 6873c84 Compare June 5, 2025 12:28
@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 6 times, most recently from ec5ba98 to 40dfde0 Compare June 5, 2025 19:15
Copy link
Contributor

@jyejare jyejare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Progress, few comments.

@Fiona-Waters Fiona-Waters force-pushed the ragretriever branch 9 times, most recently from b56ec58 to ebbe4e2 Compare June 23, 2025 14:51
Copy link
Member

@ntkathole ntkathole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

Copy link

@astefanutti astefanutti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, awesome work!

Co-authored-by: Esa Fazal <efazal@redhat.com>

Signed-off-by: Fiona Waters <fiwaters6@gmail.com>
Copy link
Member

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thank you @Fiona-Waters !! Can we add a link to this in the examples in the documentation? Can be done in a follow up PR.

@franciscojavierarceo franciscojavierarceo merged commit 0173033 into feast-dev:master Jun 24, 2025
17 of 19 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Jun 30, 2025
# [0.50.0](v0.49.0...v0.50.0) (2025-06-30)

### Bug Fixes

* Add asyncio to integration test ([#5418](#5418)) ([6765515](6765515))
* Add clickhouse to OFFLINE_STORE_CLASS_FOR_TYPE map ([#5251](#5251)) ([9ed2ffa](9ed2ffa))
* Add missing conn.commit() in SnowflakeOnlineStore.online_write_batch ([#5432](#5432)) ([a83dd85](a83dd85))
* Add transformers in required dependencies ([8cde460](8cde460))
* Allow custom annotations on Operator installed objects ([#5339](#5339)) ([44c7a76](44c7a76))
* Dask pulling of latest data ([#5229](#5229)) ([571d81f](571d81f))
* **dask:** preserve remote URIs (e.g. s3://) in DaskOfflineStore path resolution ([2561cfc](2561cfc))
* Fix Event loop is closed error on dynamodb test ([#5480](#5480)) ([fe0f671](fe0f671))
* Fix lineage entity filtering ([#5321](#5321)) ([0d05701](0d05701))
* Fix list saved dataset api ([833696c](833696c))
* Fix NumPy - PyArrow array type mapping in Trino offline store ([#5393](#5393)) ([9ba9ded](9ba9ded))
* Fix pandas 2.x compatibility issue of Trino offline store caused by removed Series.iteritems() method ([#5345](#5345)) ([61e3e02](61e3e02))
* Fix polling mechanism for TestApplyAndMaterialize ([#5451](#5451)) ([b512a74](b512a74))
* Fix remote rbac integration tests ([#5473](#5473)) ([10879ec](10879ec))
* Fix Trino offline store SQL in Jinja template ([#5346](#5346)) ([648c53d](648c53d))
* Fixed CurlGeneratorTab github theme type ([#5425](#5425)) ([5f15329](5f15329))
* Increase the Operator Manager memory limits and requests ([#5441](#5441)) ([6c94dbf](6c94dbf))
* Method signature for push_async is out of date ([#5413](#5413)) ([28c3379](28c3379)), closes [#5410](#5410) [#006BB4](https://github.com/feast-dev/feast/issues/006BB4)
* Operator - support securityContext override at Pod level ([#5325](#5325)) ([33ea0f5](33ea0f5))
* Pybuild-deps throws errors w/ latest pip version ([#5311](#5311)) ([f2d6a67](f2d6a67))
* Reopen for integration test about add s3 storage-based registry store in Go feature server ([#5352](#5352)) ([ef75f61](ef75f61))
* resolve Python logger warnings ([#5361](#5361)) ([37d5c19](37d5c19))
* The ignore_paths not taking effect duration feast apply ([#5353](#5353)) ([e4917ca](e4917ca))
* Update generate_answer function to provide correct parameter format to retrieve function ([dc5b2af](dc5b2af))
* Update milvus connect function to work with remote instance ([#5382](#5382)) ([7e5e7d5](7e5e7d5))
* Updating milvus connect function to work with remote instance ([#5401](#5401)) ([b89fadd](b89fadd))
* Upperbound limit for protobuf generation ([#5309](#5309)) ([a114aae](a114aae))

### Features

* Add CLI, SDK, and API documentation page to Feast UI ([#5337](#5337)) ([203e888](203e888))
* Add dark mode toggle to Feast UI ([#5314](#5314)) ([ad02e46](ad02e46))
* Add data labeling tabs to UI ([#5410](#5410)) ([389ceb7](389ceb7)), closes [#006BB4](https://github.com/feast-dev/feast/issues/006BB4)
* Add Decimal to allowed python scalar types ([#5367](#5367)) ([4777c03](4777c03))
* Add feast rag retriver functionality ([#5405](#5405)) ([0173033](0173033))
* Add feature view curl generator ([#5415](#5415)) ([7a5b48f](7a5b48f))
* Add feature view lineage tab and filtering to home page lineage ([#5308](#5308)) ([308255d](308255d))
* Add feature view tags to dynamo tags ([#5291](#5291)) ([3a787ac](3a787ac))
* Add HybridOnlineStore for multi-backend online store routing ([#5423](#5423)) ([ebd67d1](ebd67d1))
* Add max_file_size to Snowflake config ([#5377](#5377)) ([e8cdf5d](e8cdf5d))
* Add MCP (Model Context Protocol) support for Feast feature server ([#5406](#5406)) ([de650de](de650de)), closes [#5398](#5398) [#5382](#5382) [#5389](#5389) [#5401](#5401)
* Add rag project to default dev UI ([#5323](#5323)) ([3b3e1c8](3b3e1c8))
* Add s3 storage-based registry store in Go feature server ([#5336](#5336)) ([abe18df](abe18df))
* Add support for data labeling in UI ([#5409](#5409)) ([d183c4b](d183c4b)), closes [#27](#27)
* Added Lineage APIs to get registry objects relationships ([#5472](#5472)) ([be004ef](be004ef))
* Added rest-apis serving option for registry server ([#5342](#5342)) ([9740fd1](9740fd1))
* Added torch.Tensor as option for online and offline retrieval ([#5381](#5381)) ([0b4ae95](0b4ae95))
* Adding feast delete to CLI ([#5344](#5344)) ([19fe3ac](19fe3ac))
* Adding permissions to UI and refactoring some things ([#5320](#5320)) ([6f1b0cc](6f1b0cc))
* Allow to set registry server rest/grpc mode in operator ([#5364](#5364)) ([99afd6d](99afd6d))
* Allow to use env variable FEAST_FS_YAML_FILE_PATH and FEATURE_REPO_DIR ([#5420](#5420)) ([6a1b33a](6a1b33a))
* Enable materialization for ODFV Transform on Write ([#5459](#5459)) ([3d17892](3d17892))
* Improve search results formatting ([#5326](#5326)) ([18cbd7f](18cbd7f))
* Improvements to Lambda materialization engine ([#5379](#5379)) ([b486f29](b486f29))
* Make batch_source optional in PushSource ([#5440](#5440)) ([#5454](#5454)) ([ae7e20e](ae7e20e))
* Refactor materialization engine ([#5354](#5354)) ([f5c5360](f5c5360))
* Remote Write to Online Store completes client / server architecture ([#5422](#5422)) ([2368f42](2368f42))
* Serialization version 2 and below removed ([#5435](#5435)) ([9e50e18](9e50e18))
* SQLite online retrieval. Add timezone info into timestamp. ([#5386](#5386)) ([6b05153](6b05153))
* Support dual-mode REST and gRPC for Feast Registry Server ([#5396](#5396)) ([fd1f448](fd1f448))
* Support DynamoDB as online store in Go feature server ([#5464](#5464)) ([40d25c6](40d25c6))
* Update Spark Compute read source node to be able to use other data sources ([#5445](#5445)) ([a93d300](a93d300))

### Reverts

* Feat: Add CLI, SDK, and API documentation page to Feast UI" ([#5341](#5341)) ([b492f14](b492f14)), closes [#5337](#5337)
* Revert "feat: Add s3 storage-based registry store in Go feature server" ([#5351](#5351)) ([d5d6766](d5d6766)), closes [#5336](#5336)
* Revert "fix: Update milvus connect function to work with remote instance" ([#5398](#5398)) ([434dd92](434dd92)), closes [#5382](#5382)
franciscojavierarceo pushed a commit that referenced this pull request Jul 1, 2025
# [0.50.0](v0.49.0...v0.50.0) (2025-07-01)

### Bug Fixes

* Add asyncio to integration test ([#5418](#5418)) ([6765515](6765515))
* Add clickhouse to OFFLINE_STORE_CLASS_FOR_TYPE map ([#5251](#5251)) ([9ed2ffa](9ed2ffa))
* Add missing conn.commit() in SnowflakeOnlineStore.online_write_batch ([#5432](#5432)) ([a83dd85](a83dd85))
* Add transformers in required dependencies ([8cde460](8cde460))
* Allow custom annotations on Operator installed objects ([#5339](#5339)) ([44c7a76](44c7a76))
* Dask pulling of latest data ([#5229](#5229)) ([571d81f](571d81f))
* **dask:** preserve remote URIs (e.g. s3://) in DaskOfflineStore path resolution ([2561cfc](2561cfc))
* Fix Event loop is closed error on dynamodb test ([#5480](#5480)) ([fe0f671](fe0f671))
* Fix lineage entity filtering ([#5321](#5321)) ([0d05701](0d05701))
* Fix list saved dataset api ([833696c](833696c))
* Fix NumPy - PyArrow array type mapping in Trino offline store ([#5393](#5393)) ([9ba9ded](9ba9ded))
* Fix pandas 2.x compatibility issue of Trino offline store caused by removed Series.iteritems() method ([#5345](#5345)) ([61e3e02](61e3e02))
* Fix polling mechanism for TestApplyAndMaterialize ([#5451](#5451)) ([b512a74](b512a74))
* Fix remote rbac integration tests ([#5473](#5473)) ([10879ec](10879ec))
* Fix Trino offline store SQL in Jinja template ([#5346](#5346)) ([648c53d](648c53d))
* Fixed CurlGeneratorTab github theme type ([#5425](#5425)) ([5f15329](5f15329))
* Increase the Operator Manager memory limits and requests ([#5441](#5441)) ([6c94dbf](6c94dbf))
* Method signature for push_async is out of date ([#5413](#5413)) ([28c3379](28c3379)), closes [#5410](#5410) [#006BB4](https://github.com/feast-dev/feast/issues/006BB4)
* Operator - support securityContext override at Pod level ([#5325](#5325)) ([33ea0f5](33ea0f5))
* Pybuild-deps throws errors w/ latest pip version ([#5311](#5311)) ([f2d6a67](f2d6a67))
* Reopen for integration test about add s3 storage-based registry store in Go feature server ([#5352](#5352)) ([ef75f61](ef75f61))
* resolve Python logger warnings ([#5361](#5361)) ([37d5c19](37d5c19))
* The ignore_paths not taking effect duration feast apply ([#5353](#5353)) ([e4917ca](e4917ca))
* Update generate_answer function to provide correct parameter format to retrieve function ([dc5b2af](dc5b2af))
* Update milvus connect function to work with remote instance ([#5382](#5382)) ([7e5e7d5](7e5e7d5))
* Updating milvus connect function to work with remote instance ([#5401](#5401)) ([b89fadd](b89fadd))
* Upperbound limit for protobuf generation ([#5309](#5309)) ([a114aae](a114aae))

### Features

* Add CLI, SDK, and API documentation page to Feast UI ([#5337](#5337)) ([203e888](203e888))
* Add dark mode toggle to Feast UI ([#5314](#5314)) ([ad02e46](ad02e46))
* Add data labeling tabs to UI ([#5410](#5410)) ([389ceb7](389ceb7)), closes [#006BB4](https://github.com/feast-dev/feast/issues/006BB4)
* Add Decimal to allowed python scalar types ([#5367](#5367)) ([4777c03](4777c03))
* Add feast rag retriver functionality ([#5405](#5405)) ([0173033](0173033))
* Add feature view curl generator ([#5415](#5415)) ([7a5b48f](7a5b48f))
* Add feature view lineage tab and filtering to home page lineage ([#5308](#5308)) ([308255d](308255d))
* Add feature view tags to dynamo tags ([#5291](#5291)) ([3a787ac](3a787ac))
* Add HybridOnlineStore for multi-backend online store routing ([#5423](#5423)) ([ebd67d1](ebd67d1))
* Add max_file_size to Snowflake config ([#5377](#5377)) ([e8cdf5d](e8cdf5d))
* Add MCP (Model Context Protocol) support for Feast feature server ([#5406](#5406)) ([de650de](de650de)), closes [#5398](#5398) [#5382](#5382) [#5389](#5389) [#5401](#5401)
* Add rag project to default dev UI ([#5323](#5323)) ([3b3e1c8](3b3e1c8))
* Add s3 storage-based registry store in Go feature server ([#5336](#5336)) ([abe18df](abe18df))
* Add support for data labeling in UI ([#5409](#5409)) ([d183c4b](d183c4b)), closes [#27](#27)
* Added Lineage APIs to get registry objects relationships ([#5472](#5472)) ([be004ef](be004ef))
* Added rest-apis serving option for registry server ([#5342](#5342)) ([9740fd1](9740fd1))
* Added torch.Tensor as option for online and offline retrieval ([#5381](#5381)) ([0b4ae95](0b4ae95))
* Adding feast delete to CLI ([#5344](#5344)) ([19fe3ac](19fe3ac))
* Adding permissions to UI and refactoring some things ([#5320](#5320)) ([6f1b0cc](6f1b0cc))
* Allow to set registry server rest/grpc mode in operator ([#5364](#5364)) ([99afd6d](99afd6d))
* Allow to use env variable FEAST_FS_YAML_FILE_PATH and FEATURE_REPO_DIR ([#5420](#5420)) ([6a1b33a](6a1b33a))
* Enable materialization for ODFV Transform on Write ([#5459](#5459)) ([3d17892](3d17892))
* Improve search results formatting ([#5326](#5326)) ([18cbd7f](18cbd7f))
* Improvements to Lambda materialization engine ([#5379](#5379)) ([b486f29](b486f29))
* Make batch_source optional in PushSource ([#5440](#5440)) ([#5454](#5454)) ([ae7e20e](ae7e20e))
* Refactor materialization engine ([#5354](#5354)) ([f5c5360](f5c5360))
* Remote Write to Online Store completes client / server architecture ([#5422](#5422)) ([2368f42](2368f42))
* Serialization version 2 and below removed ([#5435](#5435)) ([9e50e18](9e50e18))
* SQLite online retrieval. Add timezone info into timestamp. ([#5386](#5386)) ([6b05153](6b05153))
* Support dual-mode REST and gRPC for Feast Registry Server ([#5396](#5396)) ([fd1f448](fd1f448))
* Support DynamoDB as online store in Go feature server ([#5464](#5464)) ([40d25c6](40d25c6))
* Update Spark Compute read source node to be able to use other data sources ([#5445](#5445)) ([a93d300](a93d300))

### Reverts

* Chore Release "chore(release): release 0.50.0" ([#5483](#5483)) ([0eef391](0eef391))
* Feat: Add CLI, SDK, and API documentation page to Feast UI" ([#5341](#5341)) ([b492f14](b492f14)), closes [#5337](#5337)
* Revert "feat: Add s3 storage-based registry store in Go feature server" ([#5351](#5351)) ([d5d6766](d5d6766)), closes [#5336](#5336)
* Revert "fix: Update milvus connect function to work with remote instance" ([#5398](#5398)) ([434dd92](434dd92)), closes [#5382](#5382)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants