Resource-first "Add Dataset" workflow #6689
-
In the implementation I'm currently working on, the client did extensive user research and surveyed several CKAN portals. One of their main findings is that the default CKAN "Add Dataset" workflow is not ideal, especially when there is a lot of package metadata (customized through scheming). They would like to upload the canonical resource first (they mostly have only one resource per package) and then populate the package metadata and the data dictionary up front. Because the data dictionary is inferred asynchronously by datapusher/xloader (with messytables' not-so-bulletproof type inference), this is not possible with the current workflow. Are there any other CKAN installations that have implemented a resource-first "Add Dataset" workflow?
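For context on the asynchronous step mentioned above: building a data dictionary boils down to guessing a type per column from sample values. Here is a toy sketch of that kind of inference, purely illustrative and much simpler than what messytables actually does (`infer_column_types` is a hypothetical helper, not a CKAN API):

```python
import csv
import io


def _is_int(value):
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_float(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_column_types(csv_text, sample_size=100):
    """Guess a simple type (integer, numeric or text) per CSV column,
    similar in spirit to the async datapusher/xloader inference step."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:sample_size + 1]
    types = []
    for col, name in enumerate(header):
        values = [r[col] for r in body if r[col] != ""]
        if not values:
            types.append((name, "text"))  # nothing to go on: fall back to text
        elif all(_is_int(v) for v in values):
            types.append((name, "integer"))
        elif all(_is_float(v) for v in values):
            types.append((name, "numeric"))
        else:
            types.append((name, "text"))
    return types
```

For example, `infer_column_types("id,score,label\n1,2.5,a\n2,3.0,b")` yields `integer`, `numeric` and `text` for the three columns. A resource-first workflow would let users review and correct exactly this kind of guess before the metadata step.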
-
Some initial thoughts on approaches:

1. **Separate application for resource uploads first.** We discussed this briefly at the dev call today and @amercader mentioned some related work that serves resource editing pages through a separate JS application that supports multiple uploads. The same approach would be possible here, but with a separate application that comes before the dataset creation page. This separate application would call `package_create` to create a draft dataset with `id` and `name` set. Then reorder the dataset editing process on the CKAN side to allow editing metadata after customizing the data dictionary, and publish the dataset by removing the draft setting.

2. **New Flask view for resource uploads first.** Do everything listed in approach 1 as part of a CKAN extension instead of as a separate application, keeping the code together with the CKAN dataset editing workflow changes.

3. **Pre-create dataset, monitor xloader/datapusher progress.** When the user clicks to create a new dataset, after resource creation send them to a page that shows the progress of the upload and of xloader/datapusher for this resource. Then reorder the dataset editing process on the CKAN side to allow editing metadata after customizing the data dictionary, and publish the dataset by removing the draft setting.

4. **As above, but with a new background job.** Do everything in approach 3, but replace/extend datapusher/xloader using a tool like https://github.com/jqnatividad/qsv to analyze the columns before loading them into the datastore.
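The progress page in approaches 3 and 4 would need to poll the load job until it settles. A minimal, generic polling sketch; `get_status` here stands in for a call to the real datapusher/xloader status endpoint, whose actual response shape may differ:

```python
import time


def wait_for_load(get_status, timeout=300, poll_interval=2):
    """Poll a datapusher/xloader-style job until it finishes.

    `get_status` is any callable returning a status string such as
    "pending", "running", "complete" or "error". Returns the terminal
    status, or raises TimeoutError if the job never settles.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("complete", "error"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("load job did not finish in time")
```

In a real progress page this loop would live client-side (JS polling an endpoint), but the shape is the same: poll, show intermediate states, and only unlock the data dictionary editor once the job reports `complete`.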
-
See #6869 for a follow-up discussion.