Since the introduction of the datastore data dictionary feature, uses for storing different types of data per-column are multiplying. The original feature included:
- label and notes text fields
- type override drop-down for the next datapusher/xloader run
preface
Plugins can add their own fields to the data dictionary by overriding the template, and users can pass any JSON values they like using the `datastore_create` API. e.g. open.canada.ca adds:
the upcoming #7882 table designer feature hides the type override field and adds:
- hidden tdtype value
- minimum, maximum constraints
- regex pattern
- choices
- etc.
https://github.com/dathere/datapusher-plus loads data into the datastore and needs a place to add qsv column statistics like:
- detected type
- data range
- frequencies and values
- etc.
We're also working on a feature to allow column reordering which will need a value stored like:
- column weight/order value
problems
- All of these values would have to share the same `info` dict in each field, and there's no way to prevent values from being overwritten or deleted when one plugin updates its values and is unaware of the others.
- Any values stored are always returned by `datastore_search` and `datastore_info`, bloating the response and preventing storing too much data (e.g. frequency tables) this way.
- No validation is possible for fields sent to `info`. Users can create any key they like, with any type, regardless of the intended use.
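To make the first problem concrete, here is a deliberately simplified model of a shared flat `info` dict. It is not CKAN code: `stats_plugin_update` and the field values are hypothetical, and the only assumption is that a plugin writes its own keys without merging in keys it does not know about.

```python
# Simplified model of one datastore field with a shared, flat `info` dict.
field = {"id": "price", "info": {"label": "Price", "notes": "Unit price in CAD"}}


def stats_plugin_update(field, stats):
    """Hypothetical plugin that only knows about its own statistics keys."""
    # The plugin replaces `info` with the values it knows about,
    # silently dropping everything written by other plugins.
    field["info"] = dict(stats)


stats_plugin_update(field, {"minimum": 0, "maximum": 950})
print(field["info"])  # label and notes are gone
```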
proposal
Let's divide the column description JSON into per-plugin values like we have for `plugin_data` fields. e.g.:
| top level key | second level keys | description |
|---|---|---|
| `ckan` | `label`, `notes` | standard human-friendly name+description of column |
| `ckan` | `order` | override for default ordering of columns |
| `datapusher` | `type_override` | import type for next datapusher/xloader run |
| `canada` | `label_fr`, `notes_fr` | French versions of standard fields our site needs* |
| `tabledesigner` | `tdtype`, `choices`, `minimum`, ... | table designer config |
| `statistics` | `type`, `minimum`, `maximum`, `frequency`, ... | datapusher+/qsv statistics |
* this could be generalized for other sites/languages
Now each plugin has its own namespace for field information.
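As a rough sketch of what one column's description could look like under this layout, using the keys from the table with made-up example values. The helper `update_namespace` is hypothetical and only illustrates that a plugin touching its own namespace leaves the others alone.

```python
# Hypothetical per-plugin column description for a single column.
column_description = {
    "ckan": {"label": "Price", "notes": "Unit price", "order": 2},
    "datapusher": {"type_override": "numeric"},
    "canada": {"label_fr": "Prix", "notes_fr": "Prix unitaire"},
    "tabledesigner": {"tdtype": "numeric", "minimum": 0},
    "statistics": {"type": "numeric", "minimum": 0, "maximum": 950},
}


def update_namespace(description, plugin, values):
    """Return a copy of `description` with only `plugin`'s namespace updated."""
    updated = {key: dict(val) for key, val in description.items()}
    updated.setdefault(plugin, {}).update(values)
    return updated


new_description = update_namespace(
    column_description, "ckan", {"label": "Unit price"}
)
```

Here `new_description["statistics"]` is unchanged even though the caller knew nothing about statistics.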
IDataDictionaryForm
Next we can use custom validation rules to parse field `info` values passed to `datastore_create`, clean them and possibly return validation errors, then store the actual values in the per-plugin column description JSON as shown above. These validators can choose whether to clear out values not passed, so values like statistics can be kept when e.g. only the label or notes values are being changed.
Each plugin can extend the `info` schema with its own validators, and all validators would apply to all field `info` values passed. The resource and column description objects are available in the validators' context so that a validator can e.g. only run if the `url_type` applies (table designer config, datapusher type override) and store/update/remove values in the per-plugin description object.
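A minimal sketch of how such validators might behave, written as plain Python functions rather than against the real CKAN validator signature. Everything here is an assumption for illustration: the `(info, resource, description, errors)` arguments, the function names, and the `"tabledesigner"` value of `url_type`.

```python
import copy


def ckan_validator(info, resource, description, errors):
    """Store label/notes; keys that were not passed keep their old values."""
    namespace = description.setdefault("ckan", {})
    for key in ("label", "notes"):
        if key not in info:
            continue
        if isinstance(info[key], str):
            namespace[key] = info[key].strip()
        else:
            errors.setdefault(key, []).append("Must be a string")


def tabledesigner_validator(info, resource, description, errors):
    """Only applies to table designer resources; clears values not passed."""
    if resource.get("url_type") != "tabledesigner":  # assumed url_type value
        return
    namespace = description.setdefault("tabledesigner", {})
    for key in ("tdtype", "choices"):
        if key in info:
            namespace[key] = info[key]
        else:
            namespace.pop(key, None)


def apply_validators(info, resource, description, validators):
    """Run every validator on a copy; keep the old description on errors."""
    updated = copy.deepcopy(description)
    errors = {}
    for validator in validators:
        validator(info, resource, updated, errors)
    return (description if errors else updated), errors


stored = {
    "ckan": {"label": "Price", "notes": "Unit price"},
    "statistics": {"minimum": 0, "maximum": 950},
}
resource = {"url_type": "upload"}
validators = [ckan_validator, tabledesigner_validator]

result, errors = apply_validators(
    {"label": " Unit price "}, resource, stored, validators
)
bad_result, bad_errors = apply_validators(
    {"label": 5}, resource, stored, validators
)
```

In the first call only the label changes: the notes and statistics survive, and the table designer validator does nothing because the `url_type` does not match. In the second call the description is returned untouched along with an error for `label`.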
When returning field `info` values from `datastore_search` or `datastore_info`, plugins provide a method that converts from the per-plugin column description JSON values back to a flat `info` dict to maintain backwards compatibility. We can add an option to `datastore_info` so users can request specific data that isn't output by default (e.g. statistics data or column order values), and the plugin method could populate `info` with those additional values.
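The conversion back to a flat `info` dict could look roughly like this. The per-plugin functions, the `include` argument, and the choice to nest statistics under a single `statistics` key (to avoid clashing with the table designer's `minimum`) are all assumptions, not part of any existing API.

```python
def ckan_info(description, include):
    """Flat values from the hypothetical `ckan` namespace."""
    namespace = description.get("ckan", {})
    info = {k: namespace[k] for k in ("label", "notes") if k in namespace}
    # Column order is only returned when explicitly requested.
    if "order" in include and "order" in namespace:
        info["order"] = namespace["order"]
    return info


def statistics_info(description, include):
    """Statistics are large, so they are omitted unless requested."""
    if "statistics" not in include:
        return {}
    return {"statistics": dict(description.get("statistics", {}))}


def build_info(description, plugin_methods, include=()):
    """Merge each plugin's contribution into one flat `info` dict."""
    info = {}
    for method in plugin_methods:
        info.update(method(description, include))
    return info


description = {
    "ckan": {"label": "Price", "notes": "Unit price", "order": 2},
    "statistics": {"minimum": 0, "maximum": 950},
}
methods = [ckan_info, statistics_info]

default_info = build_info(description, methods)
full_info = build_info(description, methods, include={"statistics", "order"})
```

`default_info` contains only the label and notes, which matches what existing clients already expect, while `full_info` adds the order and statistics values on request.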
implementation details
Performance of `datastore_search` should not be negatively affected, so we'll need to cache the `info` values returned and not loop through plugins parsing and generating `info` JSON on every call. A simple way to do this would be to cache values as a new top level key in the same column description JSON and have PostgreSQL return only that value for the columns returned.
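One way to picture the caching idea, as an in-memory sketch only. The cache key name `_info`, the functions, and the simplified `build_info` are hypothetical; in the real datastore the read side would be a PostgreSQL query extracting just that key rather than `json.loads`.

```python
import json

CACHE_KEY = "_info"  # hypothetical name for the cached top level key
build_calls = 0


def build_info(description):
    """Stand-in for looping through plugins to generate flat info."""
    global build_calls
    build_calls += 1
    info = {}
    for plugin in ("ckan", "datapusher"):
        info.update(description.get(plugin, {}))
    return info


def serialize_description(description):
    """On write: regenerate the cached info and store it alongside the data."""
    stored = {k: v for k, v in description.items() if k != CACHE_KEY}
    stored[CACHE_KEY] = build_info(stored)
    return json.dumps(stored)


def cached_info(stored_json):
    """On read: return the cached value without calling any plugin code."""
    return json.loads(stored_json).get(CACHE_KEY, {})


stored_json = serialize_description({
    "ckan": {"label": "Price", "notes": "Unit price"},
    "datapusher": {"type_override": "numeric"},
    "statistics": {"minimum": 0, "maximum": 950},
})
first_read = cached_info(stored_json)
second_read = cached_info(stored_json)
```

The plugin loop runs once at write time, and both reads return the cached flat dict without touching it again.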