Skip to content

Spec: Table Designer Feature #6118

@wardi

Description

@wardi

This is a description of a new "Table Designer" feature for CKAN. Table Designer provides custom enforced data schemas that supporting partial updates, suitable for reference data or data that is regularly updated. Users can build a data schema interactively then load data using auto-generated Excel templates, with a web form or via the API.

Table Designer relies on the datastore extension and its default postgresql back end, so this feature could be implemented as a core extension like datastore, an improvement to the datastore extension or as a core change (though it's a little odd to have a core feature depend on an extension)

This work is based on the well-loved and battle-tested data schemas from the recombinant extension, scheming-style data forms and datatables data view from the canada extension. Instead of json/yaml schemas and one-table-per organization, Table Designer allows user-designed data schemas at the individual resource level.

A 3rd Way to Create CKAN Resources

Table Designer begins from the resource creation form. Instead of uploading a resource or providing a link, users select "Use Table Designer" to create a Table Designer resource. This creates a datastore-only resource with an empty datastore table with no columns defined, [future: or a set of columns copied from an existing resource].

Table Designer resources have a modified data dictionary form with no "Type Override" field (datapusher/xloader don't apply) but with a new "Type" drop-down, "Primary Key"/"Required" check boxes and a set of fields to control input validation. A new button at the bottom allows users to "Save and Add a new Column"

Table Designer Schemas

Schemas will be saved like other data dictionary fields, but with enforced data dictionary validation [to discuss: we need a way to hook this up]. Schemas may also be created with datastore_create instead of using the data dictionary form.

[future: A new endpoint is available to partially export these schemas as JSON Schema where column validation rules map cleanly to JSON Schema rules].

Built-in validation rules include enforcing values are from a controlled list (single choice or multiple choice stored as a normalized text array), values within a range, and values conforming to a set of postgresql data types: int, numeric, date, etc.

[future: New validation rules can be added with plugins implementing a new ITableDesigner interface. Validation rules need to be defined in pl/pgsql for enforced validation as well as in Excel formulas for interactive validation within generated Excel templates. Validation can be conditional between fields in the same row; this is a frequent requirement for evolving data schemas, e.g. the buyer name field in open.canada.ca's "contracts over $10K" data]

Validation is enforced with a trigger function created by Table Designer that reads the schema from the data dictionary and enforces them on a per-row basis. All errors in a row are returned together but only the errors for the first failing row is returned when more than one row is passed to datastore_upsert. This is sufficient to implement a web form with per-field error messages returned to the user, and is easy for API users to parse too.

Excel Templates

td-excel-errors

Many data catalogue users love working in Excel. Excel templates, automatically generated from the schema, help meet these users in their preferred environment. Excel templates include immediate interactive feedback showing errors (red in screenshot above) and missing values (blue in screenshot above) as users type or paste data from other systems.

td-reference

Reference information is generated in a second tab. All controlled list codes and column descriptions are available by clicking on any column heading.

Excel templates allow users to create and update ("upsert" based on records with matching primary key values) many records at once. Data uploaded with these templates is processed immediately, never stored on the server in Excel format. These templates are suitable for manually uploading a few thousand records at a time; the datastore_upsert API is recommended for larger numbers of records.

td-select

Existing records may be selected from a modified datatables view and downloaded in a pre-populated template by clicking "Edit in Excel". Users are encouraged to use this feature to update records instead of keeping copies of populated templates, in case they re-upload and accidentally revert someone else's changes.

Web Forms

td-form

A web form is generated from the schema for creating and editing records. This makes entering or correcting a small number of records possible without leaving the website. Existing records may be edited by clicking "Edit" on a row in a modified datatables view and records can be created by clicking "Create a Single Record".

[future: Custom form fields can be added with plugins implementing a new ITableDesigner interface. Snippets are used for form fields similar to the ones used in ckanext-scheming.]

Form validation is implemented by parsing errors returned from the datastore_upsert call and used to mark errors any errors on the individual fields.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions