Saving Keys and Values Cache at lower precision #6

Open

Opened on Mar 19, 2023

See https://github.com/FMInference/FlexGen, which has explored storing the key/value cache with 4-bit quantization.
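
To make the idea concrete, here is a rough sketch of one way a cache tensor could be stored at 4 bits: group-wise min/max quantization, with two 4-bit codes packed into each byte. This is not taken from FlexGen's code. The function names `quantize_4bit` and `dequantize_4bit`, the `group_size` default, and the example tensor shape are all hypothetical, and the sketch assumes the tensor size is a multiple of an even `group_size`.

```python
import numpy as np


def quantize_4bit(x: np.ndarray, group_size: int = 64):
    """Quantize x to 4-bit codes per group (illustrative, not FlexGen's code)."""
    # Assumption: x.size is divisible by group_size, and group_size is even.
    flat = x.astype(np.float32).reshape(-1, group_size)
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    # 4 bits give 16 levels (0..15); guard against constant groups.
    scale = (hi - lo) / 15.0
    scale = np.where(scale == 0, 1.0, scale).astype(np.float32)
    q = np.clip(np.round((flat - lo) / scale), 0, 15).astype(np.uint8)
    # Pack two 4-bit codes into one byte: even columns go to the high nibble.
    packed = (q[:, 0::2] << 4) | q[:, 1::2]
    return packed, scale, lo, x.shape


def dequantize_4bit(packed, scale, lo, shape):
    """Unpack the nibbles and map the codes back to float32."""
    q = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    q[:, 0::2] = packed >> 4
    q[:, 1::2] = packed & 0x0F
    return (q.astype(np.float32) * scale + lo).reshape(shape)


# Hypothetical cache tensor: (batch, heads, sequence, head_dim).
keys = np.random.default_rng(0).standard_normal((2, 8, 16, 64)).astype(np.float32)
packed, scale, lo, shape = quantize_4bit(keys)
restored = dequantize_4bit(packed, scale, lo, shape)
print("packed bytes:", packed.nbytes, "original bytes:", keys.nbytes)
print("max abs error:", float(np.abs(restored - keys).max()))
```

The per-group `scale` and `lo` arrays are extra storage on top of the packed codes, so a smaller `group_size` trades memory for accuracy. Whether the resulting error is acceptable for attention would need to be measured on the actual model.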
