Conversation

@gante (Member) commented Jul 25, 2023

What does this PR do?

Enables returning past_key_values from generate when return_dict_in_generate=True (otherwise only the generated input_ids are returned) and use_cache=True (otherwise there is no cache to return ;) ).

In more abstract terms, this enables features like:

  1. continuing a given generation without re-running the more expensive prefill step -- like in multi-turn conversations
  2. exploring the KV values without having to place a breakpoint in generate 👀 🐛

The added code for the feature is minimal, so most of the PR is docs and tests 🤗

Fixes #24841
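As a rough illustration of what the returned cache looks like, the legacy past_key_values format is a tuple with one entry per layer, each entry a (key, value) pair of tensors shaped, by default, (batch_size, num_heads, sequence_length, embed_size_per_head). The sketch below mocks that layout with numpy (no transformers involved; all sizes are illustrative, not tied to any real model):

```python
import numpy as np

# Mock of the legacy past_key_values layout returned by generate when
# return_dict_in_generate=True and use_cache=True: a tuple with one entry
# per layer, each entry a (key, value) pair of arrays shaped
# (batch_size, num_heads, sequence_length, embed_size_per_head).
# All sizes here are illustrative.
batch, heads, seq_len, head_dim, n_layers = 1, 4, 10, 8, 2

past_key_values = tuple(
    (np.zeros((batch, heads, seq_len, head_dim)),   # key for this layer
     np.zeros((batch, heads, seq_len, head_dim)))   # value for this layer
    for _ in range(n_layers)
)

# With the cache in hand, KV values can be inspected directly instead of
# placing a breakpoint inside generate.
key, value = past_key_values[0]
print(len(past_key_values), key.shape)
```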

@HuggingFaceDocBuilderDev commented Jul 25, 2023

The documentation is not available anymore as the PR was closed or merged.

@huggingface huggingface deleted a comment from github-actions bot Aug 25, 2023
@huggingface huggingface deleted a comment from github-actions bot Sep 20, 2023
@ArthurZucker ArthurZucker changed the title Generate: return past_key_values [WIP] Generate: return past_key_values Sep 20, 2023
@freckletonj

This is a killer feature 👍

@kazzand commented Oct 19, 2023

@gante Hi! Thanks for the PR!
Did you test feeding the output past_key_values back into the .generate() method? For example: take a 250-token input, run .generate(), grab the output past_key_values, then take another 50-token input and run .generate() with the previous 250 tokens' past_key_values? With beam search this seems to be kinda tricky; I'm trying to resolve multiple dimension-mismatch problems.

@gante gante marked this pull request as ready for review October 31, 2023 12:16
@gante gante requested a review from amyeroberts October 31, 2023 12:16
@gante gante changed the title [WIP] Generate: return past_key_values Generate: return past_key_values Oct 31, 2023
@amyeroberts (Collaborator) left a comment

Thanks for adding this!

V. nice and clean PR and tests ❤️

for layer in model_kwargs["past_key_values"]:
    layer_past_key_values = []
    for item in layer:
        layer_past_key_values.append(item[..., :-1, :])
Collaborator:

Are we guaranteed to have a consistent item shape here?

@gante (Member, Author):

Guaranteed consistent item shape? No. Guaranteed that the second to last dimension is the sequence length? Yes :)

Some models squash together or permute the first two dimensions (by default, the cache is (batch_size, num_heads, sequence_length, embed_size_per_head)), with Bloom and GPTBigCode (aka StarCoder) being the biggest offenders.

One of the tests I added is in the mixin and touches contrastive search, so the fact that this output can be correctly used for continuations is tested :D In fact, I only noticed that this was a problem precisely because of the test!
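The invariant described above (leading dimensions may vary, but the sequence length is always second-to-last) is what makes item[..., :-1, :] a safe way to drop the last cached position. A small sketch with numpy standing in for torch, comparing the default layout with a squashed layout where batch and head dims are fused (both layouts are illustrative):

```python
import numpy as np

# However the leading cache dimensions are laid out, the second-to-last
# dimension is the sequence length, so item[..., :-1, :] always drops the
# last cached position. Shapes below are illustrative.
batch, heads, seq_len, head_dim = 2, 4, 10, 8

standard = np.zeros((batch, heads, seq_len, head_dim))   # default layout
squashed = np.zeros((batch * heads, seq_len, head_dim))  # fused batch/head dims

for item in (standard, squashed):
    cropped = item[..., :-1, :]
    # The sequence axis shrinks by one in both layouts.
    print(cropped.shape[-2])
```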

Comment on lines +1843 to +1845
# 2. generate to max length (which can be achieved by setting the eos token to an invalid value), which
# would make the test flaky (e.g. EOS is generated on iteration 1 on both generations, but the
# continuation would force it to generate beyond an EOS token)
Collaborator:

Sorry, I don't understand this comment 😅 I parse it as generating to the max length to make the test flaky.

@gante (Member, Author) commented Nov 2, 2023:

Yes, correct -- it would make the test failure a false positive, i.e. failing on expected behavior.

Comment on lines +1858 to +1861
# If "past_key_values" is not returned, pass the test (e.g. RWKV uses a different cache name and format)
outputs = model(**inputs)
if "past_key_values" not in outputs:
return
Collaborator:

I think that, rather than having model-specific cases where we skip by early returning (which pytest reports as a pass), it would be clearer for the test to assume past_key_values exists and to have each model handle it individually within its own test file. That way, each model can either use an explicit skip reason with unittest.skip or implement its own equivalent test. This is just an opinion though -- would be good to have thoughts from @ydshieh

@gante (Member, Author):

I 100% agree!

Can I make it part of a follow-up PR? The same pattern is used in other places, and I am not sure how many models will break (there should be a few)

Collaborator:

Yes, of course :D

Collaborator:

We can use self.skipTest (though that is generally not good usage either), especially since here we are in a loop over model_class.

Also, it would be better to use continue (rather than return) here.

Potentially, using

with self.subTest(...)

is a better approach, I guess.
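The suggestion above can be sketched as follows. The model names and the SUPPORTS_PKV flag are invented stand-ins for the real per-model loop in the mixin; the point is that subTest reports each model separately, and skipTest inside it records an explicit skip instead of a silent pass:

```python
import unittest

# Hypothetical stand-ins for the mixin's loop over model_class; one "model"
# does not expose past_key_values (like RWKV's custom cache format).
all_model_classes = ["gpt2_like", "rwkv_like"]
SUPPORTS_PKV = {"gpt2_like": True, "rwkv_like": False}

class PastKeyValuesTest(unittest.TestCase):
    def test_past_key_values_format(self):
        for model_class in all_model_classes:
            # subTest keeps iterating over the remaining models and reports
            # each one separately, instead of a bare `return` that ends the
            # whole test and shows up as a pass.
            with self.subTest(model_class=model_class):
                if not SUPPORTS_PKV[model_class]:
                    self.skipTest(f"{model_class} uses a different cache format")
                # ... the real test body would check cache shapes here ...
                self.assertTrue(SUPPORTS_PKV[model_class])

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(PastKeyValuesTest)
)
print(result.testsRun, len(result.skipped), len(result.failures))
```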

@amyeroberts (Collaborator) left a comment

Sorry - I meant to approve before!

@gante (Member, Author) commented Nov 2, 2023

(merging and leaving the conversion to a skip as a TODO)

@nevakrien

I don't see a version number. When will this be out?

@gante (Member, Author) commented Nov 30, 2023

Next release :) (v4.36)

@amyeroberts (Collaborator)
@nevakrien v4.36 is now out :)

Successfully merging this pull request may close these issues.

Support for caching prompt hidden states through multiple calls of generate()