ENH: Have prep create a directory with standardized format for each prepared dataset #650

@NickleDave

Description

Currently, running vak prep always generates a set of spectrograms plus a csv file representing a dataset.
This issue proposes that prep instead create a directory with a standardized format for each prepared dataset.

Drawbacks of current approach

There are a few drawbacks to the current approach:

  • moving files, e.g. to another computer, breaks all the paths in the csv, which are currently absolute paths
    • we could possibly fix this by writing them as relative paths, but then we need to capture a notion of the "root"; if we added another column for this, we'd repeat "root" needlessly, see next point
    • there are also multiple "semantics" for "paths" in the csv: for a spectrogram dataset, the 'audio_path' column tracks provenance: which original audio files did we generate the spectrograms from? (We also capture this info by reusing the audio filename and adding an extension, but that filename doesn't include the path back to the original file)
  • the tabular format of a csv file can't capture all the metadata we need about a dataset
    • e.g. we want to track the duration of a timebin, which we expect to be constant across all files, so it doesn't make sense to add it as a column to the csv
  • there are other things we should be tracking as part of a dataset that we are currently not
    • e.g., for each dataset split in a learncurve, we generate vectors that represent valid windows in a WindowDataset. This abstraction lets us "crop" the dataset to a specified duration, but those vectors are currently put in the results; this has led us to add a previous_run_path option so we can re-run multiple experiments with the same dataset. We should instead explicitly make these vectors part of the dataset (will raise a separate issue about making this change).
  • because the prepared dataset is not in a directory with a standardized format, it's not easy to save and move datasets, for the reasons above and also simply because the files end up wherever they are written, in output_dir or spect_output_dir, etc.
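As a minimal sketch of the relative-path fix discussed above (the function names here are hypothetical, not vak's actual API), storing paths relative to a single dataset root means moving the whole directory only changes the root you resolve against; the stored paths stay valid:

```python
# Hypothetical sketch: store csv paths relative to a dataset root so that
# moving the directory does not break them. Not vak's actual code.
from pathlib import Path

def to_relative(spect_path: str, dataset_root: str) -> str:
    """Rewrite an absolute path as a path relative to the dataset root."""
    return str(Path(spect_path).relative_to(dataset_root))

def resolve(relative_path: str, dataset_root: str) -> Path:
    """Resolve a stored relative path against wherever the dataset lives now."""
    return Path(dataset_root) / relative_path

rel = to_relative("/home/user/data/dataset/train/song1.wav.npz",
                  "/home/user/data/dataset")
assert rel == "train/song1.wav.npz"
# after moving the dataset to another machine, only the root changes:
assert resolve(rel, "/mnt/backup/dataset") == Path("/mnt/backup/dataset/train/song1.wav.npz")
```

Because the root is a single value, it belongs in dataset-level metadata (or is implied by the directory itself) rather than being repeated in a csv column.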

Advantages of the new approach

In addition to fixing the issues just described, having prep make each dataset as a directory with a standardized format has these advantages:

  • we can map Dataset classes onto this directory format
    • The main strength of having a Dataset class map to a directory is that we can then prepare built-in datasets ahead of time as directories, and make them available for download, e.g., as a .tar.gz archive that we then extract.
    • We can also change the format if/when required with less of an impact on a user.
      • for example, it's not clear to me right now whether there would be an advantage to moving to datasets that are all numpy arrays we can load in a memory-mapped way (like DAS does with Zarr arrays)
  • This also lets us better capture the notion of different kinds of datasets
    • e.g., for training UMAP models as in ENH: Add UMAP models and datasets #631, we will need a SegmentDataset. Again, if this lives in a directory with a standardized structure, it will be easier to reason about. We can also provide pre-generated datasets that follow Tim's notebooks, which people can download as .tar.gz files
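To illustrate the mapping of a Dataset class onto the directory format (the class name, method names, and meta.json keys below are hypothetical, not vak's actual API), a dataset could be constructed directly from such a directory:

```python
# Hypothetical sketch of a Dataset class that maps onto the proposed
# standardized directory, loading dataset-level metadata from meta.json.
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass
class PreparedDataset:
    root: Path
    timebin_dur: float  # constant across all files, so it lives in metadata

    @classmethod
    def from_dir(cls, dataset_dir):
        """Load a prepared dataset from its standardized directory."""
        root = Path(dataset_dir)
        meta = json.loads((root / "meta.json").read_text())
        return cls(root=root, timebin_dur=meta["timebin_dur"])

    def split_dir(self, split):
        """Directory holding the files for one split, e.g. 'train'."""
        return self.root / split
```

A downloaded .tar.gz archive, once extracted, would load through the exact same `from_dir` path as a locally prepared dataset, which is the point of standardizing the layout.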

Proposed dataset structure

An initial dataset format would look something like this:

dataset/
  train/
      song1.wav.npz
      song1.csv
      song2.wav.npz
      song2.csv
  val/
      song3.wav.npz
      song3.csv
  test/
      song4.wav.npz
      song4.csv
  dataset.csv
  # splits generated for learncurve
  traindur-30s-replicate-1.csv
  traindur-30s-replicate-1-source-id.npy
  traindur-30s-replicate-1-source-inds.npy
  traindur-30s-replicate-1-window-inds.npy
  config.toml  # config used to generate dataset
  prep.log  # log from run of prep
  meta.json  # any metadata
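A small validator (a hypothetical helper, assuming the top-level filenames and split directories shown in the tree above) makes the "standardized" part of the format checkable:

```python
# Hypothetical sketch: check that a prepared dataset directory contains
# the expected standardized layout. Not part of vak.
from pathlib import Path

REQUIRED_FILES = ["dataset.csv", "config.toml", "prep.log", "meta.json"]
REQUIRED_SPLITS = ["train", "val", "test"]

def missing_entries(dataset_dir):
    """Return names of required files or split dirs missing from a dataset."""
    root = Path(dataset_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [split for split in REQUIRED_SPLITS if not (root / split).is_dir()]
    return missing
```

Running such a check after extracting a downloaded archive, or before loading a dataset, would give users an immediate, specific error instead of a failure deep inside training.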

Deprecations

We will need to deprecate the spect_output_dir option.
