Conclusion
W8A8_FP8 quantization does not support online quantization: quantizing the FP16 Llama-3.2-1B-Instruct checkpoint on the fly collapses GSM8K accuracy (see the results below), while regular FP8 only loses a small amount of accuracy relative to the FP16 baseline.
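For context, "online quantization" here means deriving the FP8 scales from the unquantized weights at load time instead of reading them from a pre-quantized checkpoint. A minimal sketch of per-tensor FP8 weight quantization, for illustration only (the helper name is made up and this is not SGLang's actual code path):

```python
import torch

def fp8_quantize_per_tensor(w: torch.Tensor):
    # Online (dynamic) per-tensor FP8 quantization: compute the scale from the
    # FP16 weight at load time rather than loading it from the checkpoint.
    finfo = torch.finfo(torch.float8_e4m3fn)
    scale = w.abs().max().clamp(min=1e-12) / finfo.max        # per-tensor scale
    w_fp8 = (w / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)
    return w_fp8, scale  # dequantize with w_fp8.to(torch.float16) * scale
```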
GSM8K
Preparation
curl -o test.jsonl https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/test.jsonl
kubectl cp /Users/bhe/Desktop/oss/data/gsm8k/test.jsonl nfs_host:/shared/public/data/gsm8k/test.jsonl
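A quick way to sanity-check the downloaded file, assuming the standard GSM8K schema (1319 examples, each with a "question" and an "answer" that ends in "#### <number>"):

```python
import json

with open("test.jsonl") as f:
    examples = [json.loads(line) for line in f]

print(len(examples))                                     # expect 1319
print(examples[0]["question"][:80])
print(examples[0]["answer"].split("####")[-1].strip())   # ground-truth number
```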
FP16 Baseline:
python3 -m sglang.launch_server --model /shared/public/models/meta-llama/Llama-3.2-1B-Instruct --trust-remote-code
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319
100%|████████████████████████████████████| 1319/1319 [00:10<00:00, 121.39it/s]
Accuracy: 0.396
Invalid: 0.003
Latency: 10.905 s
Output throughput: 11035.006 token/s
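For reference, GSM8K accuracy is usually scored by comparing the last number in the model's completion against the number after "####" in the reference, and a sample with no parsable number counts as invalid. A rough sketch of that scoring scheme (an assumption for illustration, not necessarily the exact logic in benchmark/gsm8k/bench_sglang.py):

```python
import re

INVALID = -9999

def extract_last_number(text: str) -> int:
    # Take the last integer in the text; if none is found, mark it invalid.
    matches = re.findall(r"-?\d+", text.replace(",", ""))
    return int(matches[-1]) if matches else INVALID

def score(predictions, references):
    preds = [extract_last_number(p) for p in predictions]
    refs = [extract_last_number(r.split("####")[-1]) for r in references]
    accuracy = sum(p == r for p, r in zip(preds, refs)) / len(refs)
    invalid = sum(p == INVALID for p in preds) / len(preds)
    return accuracy, invalid
```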
FP8
python3 -m sglang.launch_server --model /shared/public/models/meta-llama/Llama-3.2-1B-Instruct --quantization fp8 --trust-remote-code
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319
100%|████████████████████████████████████| 1319/1319 [00:10<00:00, 129.00it/s]
Accuracy: 0.376
Invalid: 0.001
Latency: 10.270 s
Output throughput: 11708.710 token/s
W8A8_FP8
python3 -m sglang.launch_server --model /shared/public/models/meta-llama/Llama-3.2-1B-Instruct --quantization w8a8_fp8 --trust-remote-code
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319
100%|█████████████████████████████████████| 1319/1319 [00:38<00:00, 34.36it/s]
Accuracy: 0.003
Invalid: 0.284
Latency: 38.425 s
Output throughput: 17575.022 token/s
MMLU results to be added tonight.