Fine-tuning GPT-J-6B in colab: 8-bit weights with low-rank adaptors #14839

@dvmazur

Description

🌟 New model addition

Model description

This is a version of EleutherAI's GPT-J (6 billion parameters), modified so you can generate from and fine-tune the model in Colab or on an equivalent desktop GPU (e.g. a single 1080Ti): the weights are stored in 8-bit and fine-tuning happens through low-rank adaptors.

The original GPT-J takes 22+ GB of memory for its float32 parameters alone (6 billion parameters × 4 bytes ≈ 24 GB, roughly 22.5 GiB). Even if you cast everything to 16-bit, it still will not fit onto most single-GPU setups short of an A6000 or A100. You can run inference on TPUs or CPUs, but fine-tuning is far more expensive.
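For concreteness, here is a minimal sketch of the idea, assuming simple row-wise absmax quantization: the full weight matrix is stored frozen in int8 and dequantized on the fly, while a small trainable low-rank adaptor runs in parallel. The class name and quantization details are illustrative assumptions, not the notebook's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrozenLinear8bitLoRA(nn.Module):
    """Illustrative sketch (not the notebook's code): a linear layer whose
    weights are stored as frozen int8 values, plus a trainable low-rank
    adaptor in the spirit of LoRA."""

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor = None, rank: int = 8):
        super().__init__()
        out_features, in_features = weight.shape
        # Row-wise absmax quantization: one fp16 scale per output row.
        scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.register_buffer("weight_int8", torch.round(weight / scale).to(torch.int8))
        self.register_buffer("scale", scale.half())
        self.register_buffer("bias", bias.half() if bias is not None else None)
        # The adaptor matrices are the only trainable parameters.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on the fly; the persistent copy stays in int8.
        weight = self.weight_int8.half() * self.scale
        frozen_out = F.linear(x, weight, self.bias)
        # lora_b starts at zero, so the adaptor initially contributes nothing.
        adapter_out = F.linear(F.linear(x, self.lora_a), self.lora_b)
        return frozen_out + adapter_out
```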

Implementation

A proof-of-concept notebook is available here: colab

The model card has more detailed explanations and auxiliary notebooks (e.g. model conversion and a perplexity check).

The current implementation is somewhat hacky, but it could be integrated easily with modeling_gptj.py if you like the idea; a possible conversion pass is sketched below.
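As a hedged sketch of what that integration could look like, the pass below walks a loaded GPTJForCausalLM, swaps every nn.Linear for the 8-bit + adaptor module sketched above, and freezes everything except the adaptors. The helper name convert_to_8bit_lora is hypothetical, not an existing API.

```python
import torch
import torch.nn as nn
from transformers import GPTJForCausalLM

def convert_to_8bit_lora(module: nn.Module, rank: int = 8) -> nn.Module:
    """Hypothetical conversion pass: recursively replace each nn.Linear
    with the FrozenLinear8bitLoRA sketch above."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            bias = child.bias.data if child.bias is not None else None
            setattr(module, name, FrozenLinear8bitLoRA(child.weight.data, bias, rank=rank))
        else:
            convert_to_8bit_lora(child, rank=rank)
    return module

model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
model = convert_to_8bit_lora(model)

# Only the low-rank adaptors receive gradients during fine-tuning.
for name, param in model.named_parameters():
    param.requires_grad = "lora_" in name
```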

Open source status
