As in #612, this is a to-do list issue that organizes what to do for version 1.0.
Edit: released 1.0.0 on 2024/05/12, since I'm using it for experiments and want a pinned version that I don't need to install with extra pip commands. I'll keep working through this to-do list though.
Add model abstractions
- Remove entry points for models, metrics, etc. #601
  - I think we can do this right after merging Add abstractions to make it easier to declare and instantiate models #605, more or less?
  - Models are already there without entry points; what we need is a replacement for the whole system of loading them through entry points and then mapping to a config
- CLN: Remove unused 'multiple model' functionality #538
  - this will require removing the logic that loads a config file again with an array (list) of model names taken from another table in the file and loops over them
  - question is, what replaces that? Basically just a dict lookup on the models sub-package, i.e. `getattr(vak.models, 'SomeModel')`? (rough sketch after this list)
  - as noted in this issue, changing the config option `models` to `model` (singular) is kind of a breaking change -- so we might as well just introduce the new config format
- ENH: Add timing of training #2 -- need this for WIP so adding it here
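Rough sketch of the "dict lookup / `getattr` on the models sub-package" idea above, just to pin down what could replace the entry-point machinery -- `get_model_class` is a made-up helper name, not an existing vak function:

```python
# sketch only: assumes model classes are importable as attributes of ``vak.models``;
# ``get_model_class`` is a hypothetical helper, not part of vak's API
import importlib


def get_model_class(model_name: str, models_module: str = "vak.models"):
    """Look up a model class by name on the models sub-package,
    instead of resolving it through an entry point."""
    models = importlib.import_module(models_module)
    try:
        return getattr(models, model_name)
    except AttributeError as err:
        available = [name for name in dir(models) if not name.startswith("_")]
        raise ValueError(
            f"No model named '{model_name}'. Available names: {available}"
        ) from err


# usage: TweetyNet = get_model_class("TweetyNet"); model = TweetyNet(...)
```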
Add dataset abstractions, to set us up to add actual datasets
- ENH: Have prep create a directory with standardized format for each prepared dataset #650 (rough sketch of one possible layout after this list)
- ENH: Have prep generate learncurve splits ahead of time #651
- ENH: Run validate at end of prep, not start of train/learncurve, and include as "metadata" in dataset #649
- ENH: Add ability for prep to create datasets where the source for training samples $x$ is audio #652
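To make #650 / #651 / #649 concrete, a rough sketch of what a prepared-dataset directory could look like and how downstream steps could read its metadata -- every file name and key here is a placeholder, not a decided format:

```python
# hypothetical layout created by ``vak prep`` (names are placeholders):
#
# prepared-dataset-dir/
# ├── metadata.json        <- audio/spect params, timebin duration, validation results (#649)
# ├── dataset.csv          <- one row per sample, with a 'split' column
# └── learncurve-splits/   <- splits generated ahead of time by prep (#651)
import json
import pathlib


def load_prep_metadata(dataset_path) -> dict:
    """Read the metadata that ``prep`` saved inside a prepared-dataset directory,
    so train/eval/learncurve don't have to re-validate the raw data."""
    metadata_json = pathlib.Path(dataset_path) / "metadata.json"  # placeholder name
    with metadata_json.open("r") as fp:
        return json.load(fp)
```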
Add models
- ENH: Add ability to run eval with and without post-processing transforms #621
- ENH: rename parameter `csv_path` -> `dataset_path` #549 -- do this in version 1.0 since it's a breaking change
- ENH: Convert post-processing to Transform classes that can be specified in config #537
  - I think this can help with the next issue, ENH: Make it as easy as possible to get predictions from a windowed frame classification model #604, as well as the related issue ENH: `eval` should have option to compute metrics with and without clean-ups #472 (rough sketch of one such clean-up as a transform after this list)
- ENH: Add UMAP models and datasets #631
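For #537 / #621 / #472, a rough sketch of what one post-processing "clean-up" could look like as a transform class that a config can name and `eval` can toggle -- the class name and parameters are made up for illustration:

```python
import numpy as np


class RemoveShortSegments:
    """Post-processing 'clean-up' as a transform: runs of predicted frame labels
    shorter than ``min_segment_dur`` (seconds) get set to the background class."""

    def __init__(self, min_segment_dur: float, timebin_dur: float, background_label: int = 0):
        self.min_frames = int(np.ceil(min_segment_dur / timebin_dur))
        self.background_label = background_label

    def __call__(self, frame_labels: np.ndarray) -> np.ndarray:
        out = frame_labels.copy()
        start = 0
        for ind in range(1, len(out) + 1):
            # a run of identical labels ends here
            if ind == len(out) or out[ind] != out[start]:
                if out[start] != self.background_label and (ind - start) < self.min_frames:
                    out[start:ind] = self.background_label
                start = ind
        return out


# usage: transform = RemoveShortSegments(min_segment_dur=0.01, timebin_dur=0.002)
#        cleaned = transform(predicted_frame_labels)
```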
Do these all at once:
- DEV/ENH: Switch to using tomlkit #685 (do this before the next one)
- change config file format to leverage toml tables #345 -- `vak.train.prep`, `vak.train.model.TweetyNet`, `vak.eval.prep` + `vak.eval.model.TweetyNet` <-- actually write these up and test them (rough sketch of parsing these tables after this list)
- ENH: Add `dataset` sub-table to config file, remove other dataset/transform param keys #748
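Rough sketch of what "leveraging toml tables" could look like once we're on tomlkit -- the table names follow the examples above, but the options inside them are placeholders and the final format isn't settled:

```python
import tomlkit

# placeholder config; table names follow the examples above, option names are made up
config_toml = """
[vak.train.prep]
data_dir = "./data/bird0"
output_dir = "./prepared/bird0"

[vak.train.model.TweetyNet]
lr = 0.001
"""

config = tomlkit.parse(config_toml)
prep_options = config["vak"]["train"]["prep"]
# the model sub-table gives us the model name *and* its options in one place
model_name, model_options = next(iter(config["vak"]["train"]["model"].items()))
print(model_name, dict(model_options))  # TweetyNet {'lr': 0.001}
```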
Then fix this bug:
Then do these in one fell swoop:
- Refactor frame classification models to use single `WindowedFramesDatapipe` #574
- ENH: Rename `datasets` to `pipes`, that have built-in transforms #724 (rough sketch of the idea after this list)
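Rough sketch of the "pipe with a built-in transform" idea from #574 / #724 -- the attribute names and windowing details are assumptions, not the actual implementation:

```python
import numpy as np
import torch


class WindowedFramesDatapipe(torch.utils.data.Dataset):
    """Yields fixed-size windows of frames + frame labels, with the transform
    built in, so nothing extra needs to be specified in the config."""

    def __init__(self, frames: np.ndarray, frame_labels: np.ndarray, window_size: int):
        self.frames = frames              # (num. frames, num. frequency bins)
        self.frame_labels = frame_labels  # (num. frames,)
        self.window_size = window_size

    def __len__(self):
        return self.frames.shape[0] - self.window_size + 1

    def __getitem__(self, ind):
        window = self.frames[ind : ind + self.window_size]
        labels = self.frame_labels[ind : ind + self.window_size]
        # the "built-in transform": convert to tensors + add a channel dim here,
        # instead of asking users to declare a separate transform
        return {
            "frames": torch.from_numpy(window).unsqueeze(0).float(),
            "frame_labels": torch.from_numpy(labels).long(),
        }
```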
Then finally do this:
- BUG/CLN: Refactor model abstraction so we don't subclass LightningModule, to fix loss logging #737 (rough sketch of one possible shape after this list)
- ENH: Add AVA models and datasets #674 (finish!)
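For #737, one possible shape (not necessarily what we'll land on): keep the model abstraction a plain class and have a thin LightningModule own the network + loss by composition, so `self.log` is called from inside a LightningModule the way Lightning expects. Class name and batch keys below are hypothetical:

```python
import lightning
import torch


class FrameClassificationLitModule(lightning.LightningModule):
    """Thin Lightning wrapper that *composes* a network + loss, instead of the
    model abstraction itself subclassing LightningModule."""

    def __init__(self, network: torch.nn.Module, loss: torch.nn.Module, lr: float = 1e-3):
        super().__init__()
        self.network = network
        self.loss = loss
        self.lr = lr

    def training_step(self, batch, batch_idx):
        x, y = batch["frames"], batch["frame_labels"]  # hypothetical batch keys
        out = self.network(x)
        loss = self.loss(out, y)
        # logging from inside the LightningModule is what makes per-step /
        # per-epoch loss logging behave as expected
        self.log("train_loss", loss, on_step=True, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.network.parameters(), lr=self.lr)
```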
Add pre-trained models, improve fine-tuning functionality:
- Be able to refer to a model in config files with a `model_path` option instead of needing to specify `checkpoint_path`, `labelmap_path`, etc.: ENH: Add `model_path` option to config for eval + predict, use in place of `checkpoint_path`, `spect_scaler`, etc. #672 and Declare/formalize a model file format #673 (rough sketch at the end of this list)
- ENH: Make it possible to specify different splits for datasets #749 -- we are set up to do this with built-in static dataset(s) but should also do it with datapipes for prep'd datasets
- DOC: Finish documenting new model + family abstractions #616 -- we need to actually document this!
- ENH: Add config / params dataclasses for high-level functions #679
- add cli command that generates an empty config file #366 -- we want to do this after we make changes to the API and add "params" classes
- ENH: Add TCN models + DAS datasets #630 -- bumping these up the to-do list since they move us towards actually being a framework and force us to generalize functionality
- ENH: Add object detection model family, models, and datasets #635
- Refactor frame classification models to use single `WindowedFramesDatapipe` #574
  - It feels like we need to revisit this to decouple VocalizationDataset from transforms
  - Will need to do this for ENH: Add TCN models + DAS datasets #630 anyway
- ENH: Add `from_config_toml_path` classmethod to `vak.models.base.Model` #615 (tentative)
- ENH: Make it as easy as possible to get predictions from a windowed frame classification model #604
  - this is part of the reason for doing Add abstractions to make it easier to declare and instantiate models #605, to get to fixing this issue
- ENH: Have default `spect_output_dir` be `output_dir` not `data_dir` #573
  - Need to revisit whether this is worth doing; if so, it's a breaking change, so it should be new in 1.0
- DOC: Fix tutorial config files after version 1.0.0 gets out of alpha #647
- CLN: Ensure learncurve runs in ascending order of training set size #648
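Rough sketch of the `model_path` idea from #672 / #673: resolve one saved-model directory into the separate paths that eval/predict currently need. The file names inside the directory are placeholders until #673 settles a format:

```python
from dataclasses import dataclass
from typing import Optional
import pathlib


@dataclass
class ModelPaths:
    """Paths that eval/predict currently need, resolved from a single ``model_path``."""
    checkpoint_path: pathlib.Path
    labelmap_path: pathlib.Path
    spect_scaler_path: Optional[pathlib.Path]


def from_model_path(model_path) -> ModelPaths:
    """Resolve a saved-model directory into the separate paths, so config files
    only have to point at ``model_path``."""
    model_path = pathlib.Path(model_path)
    spect_scaler = model_path / "StandardizeSpect"  # optional; placeholder name
    return ModelPaths(
        checkpoint_path=model_path / "checkpoints" / "checkpoint.pt",  # placeholder
        labelmap_path=model_path / "labelmap.json",                    # placeholder
        spect_scaler_path=spect_scaler if spect_scaler.exists() else None,
    )
```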
Moved here from #612:
- ENH: add dataclasses to represent results directories, e.g. `TrainResults` and `LearncurveResults` #449 (rough sketch at the end of this list)
  - do this before the next issue since it will make that one easier: we can use `TrainResults` to help automagically generate a config file for `eval` and `predict`
- add cli command that generates an empty config file #366
- DEP: use crowsetta 5.0.0 (post pyOpenSci review) #526
- ENH: Add LabelMap class #582
- TST: Add unit tests for transforms sub-package #585
- ENH: Add type annotations to all functions and classes #548
- ENH: Add app / logging with rich + textualize #613
- move `labeled_timebins` module to a `transforms` sub-package, provide class versions #374
  - need to think about how doable this one is without breaking changes
  - done in ENH: Add ability to run eval with and without post-processing transforms #621
- ENH: Rewrite `core.prep` to use `vocles` + `vocalpy`, remove `io` module #558
  - in theory we can do this without it being a breaking change, but it will require actually developing these libraries more, so putting it near the end
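Rough sketch of the #449 idea, showing how a results dataclass could feed "automagic" config generation for `eval` / `predict` -- attribute and file names are placeholders:

```python
from dataclasses import dataclass
import pathlib


@dataclass
class TrainResults:
    """Represents a results directory created by ``vak train``."""
    results_dir: pathlib.Path
    checkpoint_path: pathlib.Path
    labelmap_path: pathlib.Path

    @classmethod
    def from_directory(cls, results_dir) -> "TrainResults":
        results_dir = pathlib.Path(results_dir)
        return cls(
            results_dir=results_dir,
            checkpoint_path=results_dir / "checkpoints" / "checkpoint.pt",  # placeholder
            labelmap_path=results_dir / "labelmap.json",                    # placeholder
        )

    def to_eval_options(self) -> dict:
        """Options an eval/predict config table would need, ready to dump into a config file."""
        return {
            "checkpoint_path": str(self.checkpoint_path),
            "labelmap_path": str(self.labelmap_path),
        }
```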