Estimate Data Weights Requires Redundant Metadata, Needs to be removed

**Is your feature request related to a problem? Please describe.**

Estimate data weights works off `input_cfg` YAML files, but its call to the parser assumes that certain auto-populated fields already exist. As a result, every cutset passed to lhotse has to specify information such as:

- lang_field
- text_field
- shard_seed
- shuffle

While requiring the top two per cutset is reasonable, passing the bottom two for every manifest collection is redundant: that information should be managed only by the training dataset config.

**Describe the solution you'd like**

Add default options to `nemo_tarred` and similar cutsets that autofill this information, so that calling the script does not raise an error. Alternatively, have the estimation script supply its own autofill defaults (this is less likely, since the script should stay agnostic to the type of cutset used).
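To illustrate the second option, here is a minimal sketch of filling in the missing keys before the config reaches the parser. It is not existing NeMo code: the function name `fill_cutset_defaults`, the default values in `ASSUMED_DEFAULTS`, the example file names, and the assumption that nested collections live under an `input_cfg` key are all hypothetical and only meant to show the idea.

```python
from copy import deepcopy

# Hypothetical defaults. In practice these values would come from the
# training dataset config rather than being hard-coded here.
ASSUMED_DEFAULTS = {"shuffle": False, "shard_seed": 0}


def fill_cutset_defaults(input_cfg, defaults=ASSUMED_DEFAULTS):
    """Return a copy of `input_cfg` with missing keys filled from `defaults`.

    `input_cfg` is assumed to be a list of dicts, one per cutset. An entry
    that itself holds an `input_cfg` list is treated as a nested collection
    and processed recursively.
    """
    filled = []
    for entry in input_cfg:
        entry = deepcopy(entry)  # never mutate the caller's config
        if "input_cfg" in entry:
            entry["input_cfg"] = fill_cutset_defaults(entry["input_cfg"], defaults)
        else:
            for key, value in defaults.items():
                # Values set explicitly on the cutset take precedence.
                entry.setdefault(key, value)
        filled.append(entry)
    return filled


# Hypothetical config: only the per-cutset fields are written by the user.
example_cfg = [
    {
        "type": "nemo_tarred",
        "manifest_filepath": "train_manifest.json",
        "lang_field": "lang",
        "text_field": "text",
    },
    {
        "type": "group",
        "input_cfg": [
            {"type": "nemo_tarred", "manifest_filepath": "other_manifest.json", "shuffle": True},
        ],
    },
]

filled_cfg = fill_cutset_defaults(example_cfg)
```

With something along these lines, `lang_field` and `text_field` stay per cutset, while `shard_seed` and `shuffle` only need to be written when a cutset deliberately overrides the defaults.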


**Additional context**


Current workaround

<img width="989" alt="Image" src="https://github.com/user-attachments/assets/d8c32e45-5fc5-4b2a-838e-34849b1d0fbd" />


Estimate Data Weights Requires Redundant Metadata, Needs to be removed #13279
