# 🌟 New model addition

## Model description
This is a version of EleutherAI's GPT-J with 6 billion parameters, modified so that you can generate from and fine-tune the model in Colab or on an equivalent desktop GPU (e.g. a single 1080 Ti).
The original GPT-J takes 22+ GB of memory for its float32 parameters alone. Even if you cast everything to 16-bit, it will still not fit onto most single-GPU setups short of an A6000 or A100. You can run inference on TPUs or CPUs, but fine-tuning there is far more expensive.
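A back-of-the-envelope calculation makes the point (the 6.05e9 parameter count is approximate, and this counts weights only, ignoring activations, gradients, and optimizer state):

```python
# Rough memory footprint of GPT-J-6B weights alone.
n_params = int(6.05e9)  # approximate parameter count of GPT-J-6B

for dtype, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{dtype}: {n_params * bytes_per_param / 1024**3:.1f} GB")

# float32: 22.5 GB, float16: 11.3 GB, int8: 5.6 GB -- only the 8-bit
# variant leaves headroom for activations on a ~11-16 GB Colab GPU or 1080 Ti.
```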
## Implementation
A proof-of-concept notebook is available here:
The model card has more detailed explanations and auxiliary notebooks (e.g. for model conversion and perplexity checks).
The current implementation is somewhat hacky, but it can be integrated easily with modeling_gptj.py if you like the idea; a rough sketch of the core trick follows.
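The core trick is to keep the large weight matrices frozen in 8-bit and dequantize them on the fly during the forward pass. Below is a minimal pure-PyTorch sketch of that idea; `FrozenLinear8bit` is an illustrative name, and the notebook itself uses bitsandbytes' more accurate block-wise quantization, so treat this as a sketch of the general approach rather than the notebook's actual code:

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenLinear8bit(nn.Module):
    """Illustrative frozen linear layer with row-wise 8-bit weights.

    The weight matrix is stored as int8 plus one float scale per output
    row and dequantized on the fly in forward(). Gradients never flow
    into the quantized weight -- only into adapters and other layers.
    """

    def __init__(self, weight: torch.Tensor, bias: Optional[torch.Tensor]):
        super().__init__()
        # Row-wise absmax quantization: weight ≈ qweight * scale
        scale = weight.abs().max(dim=1, keepdim=True).values / 127.0
        scale = scale.clamp(min=1e-8)  # guard against all-zero rows
        qweight = torch.round(weight / scale).to(torch.int8)
        self.register_buffer("qweight", qweight)
        self.register_buffer("scale", scale)
        self.bias = None if bias is None else nn.Parameter(bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize just for this matmul; the int8 copy stays in memory.
        weight = self.qweight.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, weight, self.bias)
```

Replacing each `nn.Linear` in the attention and MLP blocks with such a layer is what brings the weight memory down to roughly a quarter of float32.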
## Open source status
- the model implementation is available here
- the model weights are available here
- who are the authors:
  - the original GPT-J-6B was trained by EleutherAI (citation: Ben Wang and Aran Komatsuzaki)
  - fast quantization comes from bitsandbytes by Tim Dettmers
  - low-rank adapters were proposed for GPT-like models by Hu et al. (2021); see the sketch after this list
  - this notebook was created by me (@deniskamazur) with some help from Yozh (@justheuristic)
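For reference, here is a minimal sketch of the low-rank adapter idea from Hu et al. (2021): the frozen (e.g. 8-bit) layer stays fixed while only the small matrices A and B are trained. The class name, rank, and scaling below are illustrative choices, not the notebook's exact hyperparameters:

```python
import torch
import torch.nn as nn


class LoRAAdapter(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update:
    y = frozen(x) + (x @ A^T) @ B^T * (alpha / r). Only A and B train.
    """

    def __init__(self, frozen_layer: nn.Module, in_features: int,
                 out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.frozen = frozen_layer
        for p in self.frozen.parameters():
            p.requires_grad_(False)
        # B starts at zero so the adapter is a no-op before training.
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.frozen(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```

Because only the adapters (plus layer norms and biases) receive gradients, fine-tuning fits in the same memory budget as inference.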