# 🌟 New model addition

## Model description
This is a version of EleutherAI's GPT-J with 6 billion parameters, modified so that you can generate from and fine-tune the model in Colab or on an equivalent desktop GPU (e.g. a single 1080 Ti).
The original GPT-J takes 22+ GB of memory for its float32 parameters alone. Even if you cast everything to 16-bit, it will still not fit onto most single-GPU setups short of an A6000 or A100. You can run inference on TPUs or CPUs, but fine-tuning there is far more expensive.
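A back-of-the-envelope calculation makes the point (the 6.05e9 parameter count is approximate, and this counts weights only, ignoring activations, gradients, and optimizer state):

```python
# Rough memory footprint of GPT-J-6B weights alone.
n_params = int(6.05e9)  # approximate parameter count of GPT-J-6B

for dtype, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{dtype}: {n_params * bytes_per_param / 1024**3:.1f} GB")

# float32: 22.5 GB, float16: 11.3 GB, int8: 5.6 GB -- only the 8-bit
# variant leaves headroom for activations on a ~11-16 GB Colab GPU or 1080 Ti.
```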
## Implementation
A proof-of-concept notebook is available here:
The model card has more detailed explanations and auxiliary notebooks (e.g. for model conversion and perplexity checks).
The current implementation is somewhat hacky, but it can be integrated easily with modeling_gptj.py if you like the idea; a rough sketch of the core trick follows.
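The core trick is to keep the large weight matrices frozen in 8-bit and dequantize them on the fly during the forward pass. Below is a minimal pure-PyTorch sketch of that idea; `FrozenLinear8bit` is an illustrative name, and the notebook itself uses bitsandbytes' more accurate block-wise quantization, so treat this as a sketch of the general approach rather than the notebook's actual code:

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenLinear8bit(nn.Module):
    """Illustrative frozen linear layer with row-wise 8-bit weights.

    The weight matrix is stored as int8 plus one float scale per output
    row and dequantized on the fly in forward(). Gradients never flow
    into the quantized weight -- only into adapters and other layers.
    """

    def __init__(self, weight: torch.Tensor, bias: Optional[torch.Tensor]):
        super().__init__()
        # Row-wise absmax quantization: weight ≈ qweight * scale
        scale = weight.abs().max(dim=1, keepdim=True).values / 127.0
        scale = scale.clamp(min=1e-8)  # guard against all-zero rows
        qweight = torch.round(weight / scale).to(torch.int8)
        self.register_buffer("qweight", qweight)
        self.register_buffer("scale", scale)
        self.bias = None if bias is None else nn.Parameter(bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize just for this matmul; the int8 copy stays in memory.
        weight = self.qweight.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, weight, self.bias)
```

Replacing each `nn.Linear` in the attention and MLP blocks with such a layer is what brings the weight memory down to roughly a quarter of float32.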
## Open source status
- the model implementation is available here
- the model weights are available here
- who are the authors:
  - the original GPT-J-6B was trained by EleutherAI (citation: Ben Wang and Aran Komatsuzaki)
  - fast quantization comes from bitsandbytes by Tim Dettmers
  - low-rank adapters were proposed for GPT-like models by Hu et al. (2021); see the sketch after this list
  - this notebook was created by me (@deniskamazur) with some help from Yozh (@justheuristic)
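For reference, here is a minimal sketch of the low-rank adapter idea from Hu et al. (2021): the frozen (e.g. 8-bit) layer stays fixed while only the small matrices A and B are trained. The class name, rank, and scaling below are illustrative choices, not the notebook's exact hyperparameters:

```python
import torch
import torch.nn as nn


class LoRAAdapter(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update:
    y = frozen(x) + (x @ A^T) @ B^T * (alpha / r). Only A and B train.
    """

    def __init__(self, frozen_layer: nn.Module, in_features: int,
                 out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.frozen = frozen_layer
        for p in self.frozen.parameters():
            p.requires_grad_(False)
        # B starts at zero so the adapter is a no-op before training.
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.frozen(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```

Because only the adapters (plus layer norms and biases) receive gradients, fine-tuning fits in the same memory budget as inference.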