20 GB VRAM requirement is excessive

### 🐛 Describe the bug

The default FP8 model runs perfectly fine on my 16 GB GPU, sustaining 500+ tokens/sec for hours across batch jobs.

It just felt kind of silly to have to edit one line in check.py (line 30) to get it to run, instead of being able to use the default Docker image.
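One possible fix, sketched below, would be to make the threshold overridable instead of hard-coded. This is only an illustration: I haven't matched it to how check.py is actually structured, and the `MIN_VRAM_GB` environment variable and both function names are hypothetical. The 20 GB default is just the current requirement as described in this issue.

```python
import os

# Default taken from the requirement described in this issue.
DEFAULT_MIN_VRAM_GB = 20.0


def min_vram_gb(env=None):
    """Return the required VRAM in GB.

    Honors a hypothetical MIN_VRAM_GB environment variable so users can
    lower the threshold without editing the source.
    """
    env = os.environ if env is None else env
    raw = env.get("MIN_VRAM_GB")
    if raw is None or raw.strip() == "":
        return DEFAULT_MIN_VRAM_GB
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"MIN_VRAM_GB must be a number, got {raw!r}")
    if value < 0:
        raise ValueError("MIN_VRAM_GB must not be negative")
    return value


def vram_ok(total_vram_gb, env=None):
    """Return True if the detected VRAM meets the (possibly overridden) minimum."""
    return total_vram_gb >= min_vram_gb(env)
```

With something like this, a 16 GB card could pass the check by setting the variable in the container environment (for example via Docker's `-e` flag) while the default behavior stays unchanged for everyone else.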

### Versions

v0.2.1