Skip to content

Conversation

wardi
Copy link
Contributor

@wardi wardi commented Jan 15, 2024

Fixes #7971

Proposed fixes:

  • store per-plugin data in column comments
  • support validation for fields and partial updates
  • add a migrate cli command for migrating existing data dictionary fields

Features:

  • includes tests covering changes
  • includes updated documentation
  • includes user-visible changes
  • includes API changes
  • includes bugfix for possible backport

Please [X] all the boxes above that apply

if isinstance(f, dict) and f:
error_summary[_('Field %d') % i] = ', '.join(
v for vals in f.values() for v in vals)
return self.get(id, resource_id, data, errors, error_summary)

Check warning

Code scanning / CodeQL

Reflected server-side cross-site scripting

Cross-site scripting vulnerability due to a [user-provided value](1).
@wardi wardi force-pushed the 7971-data-dictionary-form branch from 4f6a2d4 to 56869e7 Compare January 16, 2024 03:09
@wardi
Copy link
Contributor Author

wardi commented Jan 16, 2024

@amercader 5fd00df moves the validator into datastore and exports it with IValidators

@wardi
Copy link
Contributor Author

wardi commented Jan 26, 2024

@amercader I've added a feature to allow validators to access the current values for plugin data. This covers making fields immutable, sticky or dependent on existing settings in any other way imaginable. Should be actually complete this time and will be part of the table designer PR soon (which is why I marked that one draft -- I want validation on my table designer data dictionary fields, and the ability to hide excess information from package_search results)

Copy link
Member

@amercader amercader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wardi I had a first look at this, overall looks good in the context of the DataStore / data dictionary feature, but I'm not sure how much of it can be copied or adapted to handle datasets, users, etc plugin_data. I'll give it a thought but left some comments in the meantime

self,
context: Context,
data_dict: dict[str, Any],
plugin_data: dict[int, dict[str, Any]]) -> Any:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if plugin_data should be included as part of the data_dict. That would align with how the actions that support plugin_data like package_create work* and not break the DataStore backend interface by adding a new parameter.

what do you think?

  • Just noticed that user_create uses plugin_extras instead 😭

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider that. AFAIK there are no active users of IDatastoreBackend, and I dislike function signatures that give no information about required parameters and their types. This kind of pattern across our code base makes it difficult to reason about code without reading the implementation, and encourages bad behaviours like passing the same context and data_dict through many layers of code and each layer often mutating the dict before returning.

So in this case I resisted :-) a better fix in my mind is to break out the current parameters but that was a larger change. I'm happy to add it to data_dict here if you think this will cause unnecessary breakage of existing code.

if i in plugin_data:
raw.update(plugin_data[i])

# ' ' prefix for data version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this means? I'm missing what this does

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hack to version the column comment string.

Currently the un-validated field info json is stored as the column comment. This PR stores plugin_data there now, so to differentiate I added a space to the start of the comment. This lets the upgrade command detect when old info dicts are present as column comments so they can be upgraded to plugin_data. Using whitespace avoids having to parse the json or store an extra field in the json to record the version.


def update_datastore_info_field(
self, field: dict[str, Any], plugin_data: dict[str, Any]):
# expose all our non-secret plugin data in the field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, does this mean that if plugins don't implement this method no custom keys will be shown in the fields object of the datastore_info response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes a plugin chooses how or whether to expose its data in the fields dict. Once case where it makes sense to hide the data stored is for statistics captured by automatic analysis that may be used for indexing but may be too large or rarely needed by users.

I did consider adding a parameter to datastore_info that could be passed through to update_datastore_info_field so that a user could request specific data to be included in the output. Let me know if you think that should be included here.

@wardi
Copy link
Contributor Author

wardi commented Jan 31, 2024

The pattern that could be shared with other entities is: plugin_data passed in the context so that validators (i.e. replacements for convert_to_extras/convert_from_extras) can access current values and provide new values to be stored in the DB. Using context['plugin_data'] keeps that data private and separate from user parameters.

The data dictionary case is more complicated than other entities because it's storing one set of plugin data per column so the plugin_data passed is a list (column) of dicts (plugin name) of dicts (actual data stored). For user, package etc the plugin_data would be a dict (plugin name) of dicts (actual data stored) instead.

wardi and others added 2 commits January 31, 2024 12:16
Co-authored-by: Adrià Mercader <amercadero@gmail.com>
@amercader
Copy link
Member

Thanks @wardi the new docs are great

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🔌 plugin_data & IDataDictionaryForm for datastore data dictionary
2 participants