-
Notifications
You must be signed in to change notification settings - Fork 2.1k
simpler package indexing, no system plugins #8396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -65,7 +65,7 @@ def load_environment(conf: Union[Config, CKANConfig]): | |||
warnings.filterwarnings('ignore', msg, sqlalchemy.exc.SAWarning) | |||
|
|||
# load all CKAN plugins | |||
p.load_all() | |||
p.load_all(force_update=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is force_update required here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
load_environment is used in tests which sometimes need to clear all the plugins by having an empty plugin list
def load( | ||
*plugins: str | ||
*plugins: str, | ||
force_update: bool = False, | ||
) -> Plugin | list[Plugin]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to document here what force_update is and why we needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added docstrings to load
and load_all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the PR @wardi ! I think it makes the codebase way more explicit and easier to follow when working on indexing.
It confuses me a little bit the both
parameter, I think it would be nice to document it and I also left some comments on some lines that I think could use a little bit more of explanation about what's going on.
@pdelboca thanks, how does it look now? |
Had a look at this as it directly relates to the work I'm doing on rethinking the search. This PR looks great but I find the |
For the current version we need to keep things as they are but see #8444 (comment) for some thoughts on data_dict/validated_data_dict going forward |
@amercader it's a significant performance hit on datasets with many resources to do all the package_show work twice, that's why I went with slightly uncomfortable "both" option. |
@wardi I understand how running package_show twice may affect performance, but if we only focus on validating the data dict, that needs to happen regardless before indexing, right? So if we call package_show once with use_default_schema=True and use_cache=False, and then on
Wouldn't that take roughly the same time? My concern other than the added complexity in
The I know context vars are frown upon but this seems like a reasonable thing to hide from the public API. And perhaps it would be the perfect time to introduce an explicit |
Validating first with the default schema and then with a custom schema isn't the same as validating only with the custom schema, so I couldn't use that approach. If package_show had a way of returning the raw "dictized" version of the package then we could do something like this.
That's a bug for sure :-)
If we go with a If we decide to drop the default schema version of data from the search index this all gets much simpler. Do you know if the primary consumer of this data version is the harvest extension or anything else? |
@amercader |
Thanks @wardi , I think this is a massive improvement and lays the foundation for a newer and better search |
Fixes #8395
Proposed fixes:
Features: