Is your feature request related to a problem? Please describe.
The estimate data weights script works off input_cfg YAML files, but its call to the parser assumes the existence of auto-populated fields. As a result, every cutset passed to lhotse has to specify information such as:
lang_field
text_field
shard_seed
shuffle
While it is reasonable to require the top two per cutset, passing the bottom two for every manifest collection creates redundant information that should only be managed through the training dataset config.
Describe the solution you'd like
Add default options to nemo_tarred and similar cutsets that autofill this information, so that calling the script does not raise an error. Alternatively, have the estimation script apply its own autofill defaults (this is the less likely option, since the script should stay agnostic to the type of cutset used).
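To make the request concrete, here is a rough sketch of the autofill behaviour I have in mind. It is not existing NeMo code: `fill_cutset_defaults` and `ASSUMED_DEFAULTS` are hypothetical names, the default values are placeholders, and the handling of nested `input_cfg` groups is my assumption about how the YAML is laid out. The entries are treated as plain dicts, as they would be after loading the YAML.

```python
from copy import deepcopy

# Hypothetical fallback values (assumption); in practice these would be
# inherited from the training dataset config rather than hard-coded.
ASSUMED_DEFAULTS = {"shard_seed": "trng", "shuffle": False}


def fill_cutset_defaults(input_cfg, defaults=ASSUMED_DEFAULTS):
    """Return a copy of input_cfg with missing per-cutset options filled in.

    Values that an entry already sets are never overwritten.
    """
    filled = []
    for entry in input_cfg:
        entry = deepcopy(entry)  # leave the caller's config untouched
        if "input_cfg" in entry:
            # Assumed layout: a group entry nests its cutsets under "input_cfg".
            entry["input_cfg"] = fill_cutset_defaults(entry["input_cfg"], defaults)
        else:
            for key, value in defaults.items():
                entry.setdefault(key, value)
        filled.append(entry)
    return filled


# Illustrative entries only; the paths and field values are made up.
example_cfg = [
    {"type": "nemo_tarred", "manifest_filepath": "a.json", "lang_field": "lang", "text_field": "text"},
    {"type": "nemo_tarred", "manifest_filepath": "b.json", "shuffle": True},
]
filled_cfg = fill_cutset_defaults(example_cfg)
```

With something like this applied before parsing, lang_field and text_field would still be required per cutset, while shard_seed and shuffle would only need to be written out when a cutset deviates from the defaults.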