configurable datastore FTS indexes #8530

Merged
merged 10 commits into from
Nov 29, 2024

wardi
Contributor

@wardi wardi commented Nov 14, 2024

Fixes #5847

Proposed fixes:

Disable automatic FTS indexes of text fields in the datastore through a new configuration option
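As a rough illustration of the idea (not the code in this PR), an option that lists field types could gate which columns get their own FTS index. The helper name `fields_to_fts_index`, the sample fields, and the space-separated value format are assumptions made for this sketch only:

```python
def fields_to_fts_index(fields: list[dict], config_value: str) -> list[str]:
    """Return the ids of fields whose type is listed in the config value.

    Assumption: the option holds a space-separated list of type names.
    """
    allowed = set(config_value.split())
    return [f["id"] for f in fields if f["type"] in allowed]


# Hypothetical field definitions, shaped like id/type pairs.
fields = [
    {"id": "title", "type": "text"},
    {"id": "count", "type": "int4"},
    {"id": "notes", "type": "text"},
]

# An empty value selects no fields for per-field FTS indexes.
no_indexes = fields_to_fts_index(fields, "")
# Listing "text" selects only the text fields.
text_indexes = fields_to_fts_index(fields, "text")
print(no_indexes, text_indexes)
```

With an empty value nothing is selected, which is the space-saving case this change is aiming for.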

Features:

  • includes tests covering changes
  • includes updated documentation
  • includes user-visible changes
  • includes API changes
  • includes bugfix for possible backport

@amercader
Member

This looks good @wardi

IIRC there needs to be a similar PR for backporting to 2.11 (and 2.10?), with the default value of ckan.datastore.default_fts_index_field_types inverted to keep compatibility, right?

@wardi
Contributor Author

wardi commented Nov 26, 2024

@amercader yes, I will create those separately. I also want to add a CLI command that updates the FTS indexes based on the configuration option, so users have an easy way to reclaim space on an existing installation.

@amercader
Member

@wardi is the CLI command meant to be in this PR or can I merge it?

@wardi wardi marked this pull request as draft November 26, 2024 17:04
@wardi
Contributor Author

wardi commented Nov 26, 2024

bumped back to draft so I can finish the CLI

@wardi wardi marked this pull request as ready for review November 27, 2024 15:51
@wardi
Contributor Author

wardi commented Nov 27, 2024

@amercader ready for review again, will work on 2.11 backport

@wardi wardi mentioned this pull request Nov 27, 2024

for i, resid in enumerate(get_all_resources_ids_in_datastore(), 1):
    print(f'\r{resid} [{i}/{len(resource_ids)}] ...', end='')
    logic.get_action('datastore_create')(
Member

We don't need to do anything now, but I feel a bit uneasy about calling datastore_create without params in order to trigger the index update, as this will also do everything else that a datastore_create call does. Maybe right now that is just the indexes, but who knows if something else is added in the future. A specific datastore_update_indexes call would be safer but, again, we can improve this in the future.

Contributor Author

I was also uneasy with this, except that creating indexes is part of what datastore_create does automatically, so I don't know how you would move that functionality to a separate call without breaking the API.
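For readers following the thread, here is a minimal sketch of the pattern under discussion: looping over resource ids and calling datastore_create with no fields or records, so that only its side effects (such as index handling) run. The function name `refresh_indexes`, the injected `get_action` parameter, and the exact context and data_dict values are assumptions for illustration; the stand-in action lets the sketch run without CKAN installed.

```python
from typing import Any, Callable, Iterable

Action = Callable[[dict, dict], Any]


def refresh_indexes(
    resource_ids: Iterable[str],
    get_action: Callable[[str], Action],
) -> int:
    """Call datastore_create once per resource and return how many were processed."""
    ids = list(resource_ids)  # materialise so len() works for the progress output
    create = get_action("datastore_create")
    for i, resid in enumerate(ids, 1):
        print(f"\r{resid} [{i}/{len(ids)}] ...", end="")
        # Assumed arguments: no fields or records are passed, so the call
        # is expected to do only what datastore_create does automatically.
        create({"ignore_auth": True}, {"resource_id": resid, "force": True})
    print()
    return len(ids)


# Stand-in for an action lookup; it only records the calls it receives.
calls: list[tuple[str, dict]] = []


def fake_get_action(name: str) -> Action:
    def action(context: dict, data_dict: dict) -> None:
        calls.append((name, data_dict))
    return action


processed = refresh_indexes(["res-a", "res-b"], fake_get_action)
```

Injecting the action lookup also shows where a dedicated call such as the suggested datastore_update_indexes could be swapped in later without changing the loop.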

@amercader amercader merged commit 958b80b into master Nov 29, 2024
9 checks passed
@amercader amercader deleted the 5847-selective-indexes branch November 29, 2024 10:05
amercader added a commit that referenced this pull request Nov 29, 2024
Linked issue: Overly aggressive indexing strategy greatly increases datastore storage requirements
2 participants