Skip to content

Conversation

smoya
Copy link
Contributor

@smoya smoya commented May 14, 2025

This is an improved version of #683, which was reverted.
This PR considers retrocompatibility with previous versions that have ai.vectorizer_errors as table.

@smoya smoya requested a review from a team as a code owner May 14, 2025 16:18
@smoya smoya temporarily deployed to internal-contributors May 14, 2025 16:18 — with GitHub Actions Inactive
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are the changes introduced compared to #683

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are the changes introduced compared to #683

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are the changes introduced compared to #683

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are the changes introduced compared to #683

This test was created in #734

@@ -212,6 +212,18 @@ def migrate_config_to_new_version(cls, data: Any) -> Any:
logger.warning("Unable to migrate configuration: raw data type is unknown")
return data # type: ignore[reportUnknownVariableType]

def ensure_features(self, features: Features):
Copy link
Contributor Author

@smoya smoya May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea is to have this method in the Vectorizer rather than adding the logic to the Worker directly, as I believe it is really responsibility of the first.

However, I could revert this and move the logic to the worker here if you believe it is not clear/explicit enough

Copy link
Contributor

@Askir Askir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Comment on lines +67 to +75
# Newer versions of pgai lib have the ai.vectorizer_errors view.
# The table has been renamed to ai._vectorizer_errors
query = """
SELECT table_name
FROM information_schema.views
WHERE table_schema = 'ai' AND table_name = 'vectorizer_errors';
"""
cur.execute(query)
has_vectorizer_errors_view = cur.fetchone() is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would've probably checked if the _vectorizer_errors table exists and not if vectorizer_errors is a view. But I guess this works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any benefit from that approach?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just one step closer to what the vectorizer actually needs.
It doesn't really care about the view it only needs to know which table it should insert the errors into. But effectively your are checking this since the view and _ table are created in the same migration.

Comment on lines 216 to 225
"""
Ensure the vectorizer is compatible with all possible features.
This modifies the vectorizer to make it compatible with the max
amount of features as possible.
"""
if (
self.errors_table == DEFAULT_VECTORIZER_ERRORS_TABLE
and not features.has_vectorizer_errors_view
):
self.errors_table = "vectorizer_errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't really know why we configure this on the vectorizer in the first place and not on the executor. The vectorizer doesn't really use the errors table in any way its only useful for the processing. So that could just have a switch flip on insert.

But I guess this works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, don't want to early optimize. I moved back that logic to the worker.

@smoya smoya temporarily deployed to internal-contributors May 15, 2025 10:03 — with GitHub Actions Inactive
@smoya smoya requested a review from Askir May 15, 2025 10:05
@smoya smoya merged commit c1d13f4 into main May 16, 2025
14 checks passed
@smoya smoya deleted the smoya/ai-675-vectorizer_errors-supporting-named-vectorizers branch May 16, 2025 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants