Documented training durations are too high #867

@derekhiggins

Until a recent fix (#763), the CLI generate command
created a train_merlinite-7b-Q4_K_M_.....jsonl file with duplicate instructions. This meant that
instead of the default 100 instructions, training was happening on thousands of instructions.
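
For anyone who wants to check whether their own generated train file is affected, here is a minimal sketch that counts total vs. unique records in a JSONL file. It assumes only that each non-blank line is one JSON object; it does not rely on any particular field names, and `summarize_jsonl` is a hypothetical helper, not part of the CLI.

```python
import json
from collections import Counter


def summarize_jsonl(lines):
    """Return (total, unique) record counts for an iterable of JSONL lines.

    Assumption: each non-blank line is a single JSON object. Whole records
    are compared, so no specific field names are required.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        # Serialize with sorted keys so key order does not affect equality.
        counts[json.dumps(record, sort_keys=True)] += 1
    total = sum(counts.values())
    return total, len(counts)
```

Passing an open file handle for the generated train file returns the two counts; if total is much larger than unique, the file likely predates the fix.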

When training with 100 instructions as intended, the process is significantly faster. We should update the docs to
reflect this.

I've verified that following the Colab notebook with 100 items in the train set takes 5 minutes,
and that training on a Linux server (not a laptop) with no GPU and enough RAM takes 50 minutes (although num epochs defaults to 1, vs 5 in the notebook).

These times come from a single run each, but they indicate that training durations are much lower than documented.

I haven't tested on a Mac or Kaggle, but I assume the numbers for those can be scaled down as well.
