💺🥩🍷 First-class Files #8197

@wardi

Discussed in #8081

Originally posted by wardi February 16, 2024
In CKAN, uploaded files can only be attached to groups or resources, and each group or resource can have at most one file.

The group or resource model stores a reference to the file in a plain text column that can be updated like any other metadata value. Resources can store the length, hash and format of an uploaded file, but these are metadata fields that users are free to update (or not), and they aren't durably linked to the file itself.

Uploaded files can leak, staying on the underlying storage and costing money even though there is no longer any way to reach them from the CKAN site.

There is no shared way to represent files that aren't yet attached to a group or resource, e.g.:

It's not possible to attach multiple files to a resource even when they represent the same data. This would be very useful for:

Model solution

Let's create a model for uploaded files in CKAN that can be linked to resources or groups or anything else that a site might need.

Files would have:

  • owner type + id for permissions (e.g. resource, user, group, etc.)
  • original file name
  • file reference (specific to storage back end)
  • total size in bytes
  • format detected or determined from file name
  • completion state (ranges received for background/parallel uploads)
  • hash(es) (when supported by back end)
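
As a rough illustration of the fields listed above, here is a minimal sketch using a plain Python dataclass. The class name `FileRecord`, its attribute names and the `add_range`/`is_complete` helpers are all hypothetical; this is not an existing CKAN model, and a real implementation would presumably be a database table.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical record for one uploaded file (not an existing CKAN model)."""

    owner_type: str      # e.g. "resource", "user", "group"
    owner_id: str
    original_name: str
    storage_ref: str     # opaque reference; its meaning depends on the back end
    size: int            # total size in bytes
    format: str          # detected, or determined from the file name
    # Half-open byte ranges (start, end) received so far
    received: list[tuple[int, int]] = field(default_factory=list)
    # Algorithm name -> hex digest, filled in when the back end supports it
    hashes: dict[str, str] = field(default_factory=dict)

    def add_range(self, start: int, end: int) -> None:
        """Record that bytes [start, end) have been received."""
        if not 0 <= start < end <= self.size:
            raise ValueError("range outside file bounds")
        self.received.append((start, end))

    def is_complete(self) -> bool:
        """True when the received ranges cover every byte of the file."""
        covered = 0
        for start, end in sorted(self.received):
            if start > covered:
                return False  # gap before this range
            covered = max(covered, end)
        return covered >= self.size
```

Tracking completion as a list of byte ranges, rather than a single flag, is one way to let parallel or resumed uploads arrive out of order.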

Other possibilities:

  • name of back end (multiple back end support or for migrating files live)
  • support for "files" that are actually links to externally managed resources so we can monitor changes to content based on hash/size when retrieved
  • alternate links for redundancy when some services aren't available
  • custom fields for permissions, tracking, validation reports or other plugin data
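
For the externally managed links idea, change monitoring could be as simple as comparing the stored size and hash with what is retrieved. The sketch below assumes SHA-256 and a hypothetical `has_changed` helper; how and when the content is fetched is left out.

```python
import hashlib


def has_changed(stored_size: int, stored_sha256: str, content: bytes) -> bool:
    """Compare retrieved content against the size and hash recorded earlier.

    Hypothetical helper: the choice of SHA-256 is an assumption.
    """
    if len(content) != stored_size:
        return True  # size differs, no need to hash
    return hashlib.sha256(content).hexdigest() != stored_sha256
```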

This model would make file metadata reliable, allow us to build new features and potentially save people money by better tracking hosted data in CKAN.
