Overly aggressive indexing strategy greatly increases datastore storage requirements

**CKAN version**
2.9.1

**Describe the bug**
The postgres datastore creates several indices when a resource is uploaded.

A unique index for the `_id` internal field, an FTS index using the `_full_text` internal field (which stores all the column values in one field for FTS searches, effectively doubling the width of the table), and **_one index per text column._**

Though this is great for ad-hoc queries using `datastore_search` and `datastore_search_sql`, and for interactive filtering on the UI, it greatly increases the datastore's storage requirements.

For example, a table with 1.37 million rows and 9 columns is:

- **60 mb** as a csv file
- **302 mb** for the database table
- **569 mb** for the 8 indices (unique index for `_id`, 1 fts index, and 6 indices for the 6 text columns. It did not create indices for the one timestamp and two numeric columns) 

I got this info while testing the [expanded `datastore_info` PR.](https://github.com/ckan/ckan/pull/5846)

**Expected behavior**
The team should consider giving finer granular control to the CKAN administrator beyond [specifying the index method](https://docs.ckan.org/en/latest/maintaining/configuration.html#ckan-datastore-default-fts-index-method).

Some indexing "knobs" to consider:
* turn on/off FTS index
* leverage the data dictionary to explicitly declare columns that should be indexed. As an added benefit, this should also allow indices for non-text columns
* or alternatively, instead of creating a mini DBA interface on CKAN (a non-trivial task), just read the postgres system catalog and  selectively show some indexing info on the data dictionary (leveraging the expanded `datastore_info`), and let the DB/CKAN administrator use robust tools like PgAdmin to manage datastore indices manually.  This is consistent with the keeping the Datastore API _["as thin as possible to allow you to use the features you would expect from a powerful database management system."](https://docs.ckan.org/en/2.9/maintaining/datastore.html#the-datastore-api)_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Overly aggressive indexing strategy greatly increases datastore storage requirements #5847

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Overly aggressive indexing strategy greatly increases datastore storage requirements #5847

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions