feat: add support for named vectorizers to ai.vectorizer_errors #740

smoya · 2025-05-14T16:18:11Z

This is an improved version of #683, which was reverted.
This PR preserves backward compatibility with previous versions, in which ai.vectorizer_errors is a table.

smoya · 2025-05-14T16:25:32Z

projects/pgai/pgai/vectorizer/features/features.py

These are the changes introduced compared to #683.

smoya · 2025-05-14T16:25:45Z

projects/pgai/pgai/vectorizer/vectorizer.py

These are the changes introduced compared to #683.

smoya · 2025-05-14T16:25:50Z

projects/pgai/tests/vectorizer/cassettes/test_errors_table_compatibility.yaml

These are the changes introduced compared to #683.

smoya · 2025-05-14T16:26:10Z

projects/pgai/tests/vectorizer/cli/test_compatibility.py

These are the changes introduced compared to #683.

This test was created in #734

smoya · 2025-05-14T16:28:14Z

projects/pgai/pgai/vectorizer/vectorizer.py

@@ -212,6 +212,18 @@ def migrate_config_to_new_version(cls, data: Any) -> Any:
        logger.warning("Unable to migrate configuration: raw data type is unknown")
        return data  # type: ignore[reportUnknownVariableType]

+    def ensure_features(self, features: Features):


My idea is to have this method in the Vectorizer rather than adding the logic to the Worker directly, as I believe it is really the responsibility of the former.

However, I could revert this and move the logic to the worker if you believe it is not clear or explicit enough.

Askir

lgtm

Askir · 2025-05-15T08:57:47Z

projects/pgai/pgai/vectorizer/features/features.py

+        # Newer versions of pgai lib have the ai.vectorizer_errors view.
+        # The table has been renamed to ai._vectorizer_errors
+        query = """
+        SELECT table_name
+        FROM information_schema.views
+        WHERE table_schema = 'ai' AND table_name = 'vectorizer_errors';
+        """
+        cur.execute(query)
+        has_vectorizer_errors_view = cur.fetchone() is not None


I would've probably checked if the _vectorizer_errors table exists and not if vectorizer_errors is a view. But I guess this works.

Any benefit from that approach?

It's just one step closer to what the vectorizer actually needs.
It doesn't really care about the view; it only needs to know which table it should insert the errors into. But effectively you are checking this, since the view and the _vectorizer_errors table are created in the same migration.
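
For illustration, the alternative check suggested above (looking for the renamed table rather than the view) could be sketched roughly as follows. This is not code from the PR: `has_underscore_errors_table`, `CursorLike`, and `FakeCursor` are hypothetical names, and the sketch assumes a DB-API style cursor with `execute` and `fetchone`, like the `cur` in the diff.

```python
from typing import Any, Optional, Protocol, Tuple


class CursorLike(Protocol):
    """Minimal cursor interface assumed by this sketch (DB-API style)."""

    def execute(self, query: str) -> Any: ...

    def fetchone(self) -> Optional[Tuple[Any, ...]]: ...


def has_underscore_errors_table(cur: CursorLike) -> bool:
    """Hypothetical helper: True if the table ai._vectorizer_errors exists.

    Queries information_schema.tables instead of information_schema.views,
    so the check targets the object the worker would insert errors into.
    """
    query = """
        SELECT 1
        FROM information_schema.tables
        WHERE table_schema = 'ai'
          AND table_name = '_vectorizer_errors'
          AND table_type = 'BASE TABLE';
    """
    cur.execute(query)
    return cur.fetchone() is not None


class FakeCursor:
    """In-memory stand-in for a real cursor, used only to exercise the helper."""

    def __init__(self, row: Optional[Tuple[Any, ...]]) -> None:
        self._row = row
        self.last_query: Optional[str] = None

    def execute(self, query: str) -> None:
        # Record the query; a real cursor would send it to PostgreSQL.
        self.last_query = query

    def fetchone(self) -> Optional[Tuple[Any, ...]]:
        return self._row
```

As noted in the comment above, both checks give the same answer as long as the view and the renamed table are created in the same migration; the difference is only in which object the check names.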

Askir · 2025-05-15T09:03:04Z

projects/pgai/pgai/vectorizer/vectorizer.py

+        """
+        Ensure the vectorizer is compatible with all possible features.
+        This modifies the vectorizer to make it compatible with the max
+        amount of features as possible.
+        """
+        if (
+            self.errors_table == DEFAULT_VECTORIZER_ERRORS_TABLE
+            and not features.has_vectorizer_errors_view
+        ):
+            self.errors_table = "vectorizer_errors"


I actually don't really know why we configure this on the vectorizer in the first place and not on the executor. The vectorizer doesn't really use the errors table in any way; it's only useful for the processing. So that could just be a switch flipped on insert.

But I guess this works.

Yeah, I don't want to optimize prematurely. I moved that logic back to the worker.
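
A minimal sketch of the "switch on insert" idea discussed above, assuming the table name is resolved at the point where errors are written rather than stored on the vectorizer. The helper `resolve_errors_table` is hypothetical, the `Features` dataclass here is a simplified stand-in for the real class, and the constant values are assumptions based on the table rename described in the features.py comment.

```python
from dataclasses import dataclass

# Assumed values for illustration only; the real constant lives in pgai.
DEFAULT_VECTORIZER_ERRORS_TABLE = "_vectorizer_errors"
LEGACY_VECTORIZER_ERRORS_TABLE = "vectorizer_errors"


@dataclass(frozen=True)
class Features:
    """Simplified stand-in holding only the flag relevant to this sketch."""

    has_vectorizer_errors_view: bool


def resolve_errors_table(configured: str, features: Features) -> str:
    """Hypothetical helper: pick the table that errors should be inserted into.

    If the vectorizer uses the default table name but the database predates
    the ai.vectorizer_errors view, fall back to the legacy table name.
    A custom table name is always left untouched.
    """
    if (
        configured == DEFAULT_VECTORIZER_ERRORS_TABLE
        and not features.has_vectorizer_errors_view
    ):
        return LEGACY_VECTORIZER_ERRORS_TABLE
    return configured
```

The condition mirrors the one in the `ensure_features` diff, but as a pure function it leaves the vectorizer configuration unmodified, so it can be called from whichever component performs the insert.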

feat: add support for named vectorizers to ai.vectorizer_errors

e0e4c6f

smoya requested a review from a team as a code owner May 14, 2025 16:18

smoya temporarily deployed to internal-contributors May 14, 2025 16:18 — with GitHub Actions Inactive

Askir approved these changes May 15, 2025

feat: move features logic back to the worker

026c76a

smoya temporarily deployed to internal-contributors May 15, 2025 10:03 — with GitHub Actions Inactive

smoya requested a review from Askir May 15, 2025 10:05

smoya mentioned this pull request May 15, 2025

fix: make vectorizer_errors retrocompatible #736

Closed

smoya merged commit c1d13f4 into main May 16, 2025
14 checks passed

smoya deleted the smoya/ai-675-vectorizer_errors-supporting-named-vectorizers branch May 16, 2025 08:37

github-actions bot mentioned this pull request May 16, 2025

chore(main): release pgai 0.10.4 #744

Merged
