
Conversation

@patil-suraj (Contributor)

What does this PR do?

Allows returning past_key_values from generate when use_cache=True.

Like other returned values, past_key_values are also returned as a tuple, with one element per generated token.

Fixes #17016
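
For illustration, a minimal sketch of the interface this PR describes (the checkpoint and prompt below are arbitrary placeholders, and the exact shape of the returned cache may differ from what eventually shipped):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello", return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=5,
    use_cache=True,
    return_dict_in_generate=True,
)
# Under this proposal, outputs.past_key_values is a tuple with one entry per
# generated token, like the other per-step outputs (scores, attentions, ...).
print(len(outputs.past_key_values))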

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Comment on lines +2354 to +2355
if return_dict_in_generate:
    past_key_values += (model_kwargs["past"],)
Contributor Author

Assumes that if model_kwargs["past"] is not None, then output_past_key_values == True.

@gante (Member) left a comment

LGTM 👍

Comment on lines +2384 to +2385
# past_key_values = model_kwargs["past"] if output_past_key_values else None

Member

Suggested change
# past_key_values = model_kwargs["past"] if output_past_key_values else None

(probably forgotten :) )

@@ -1433,6 +1433,17 @@ def _check_outputs(self, output, input_ids, config, use_cache=False, num_return_
use_cache=use_cache,
)

# Past Key Value States
past_key_values = output.past_key_values
Contributor

Very cool to put it in every test here - good job!

@patrickvonplaten (Contributor)

We'll just need to fix the failing tests now :-) I think you'll have to override this "checking" function in the respective individual test files.

@dblakely (Contributor) commented Jul 29, 2022

Hey there, sorry to nag, but any chance of moving this along? Anything I can do to help?

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@gante (Member) commented Aug 25, 2022

(@patrickvonplaten @patil-suraj should I take over this PR? :) )

@patrickvonplaten (Contributor)

If ok for you @gante this would be amazing!

@huggingface huggingface deleted a comment from github-actions bot Sep 27, 2022
@gante gante added the WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress label Sep 27, 2022
@shunzh commented Oct 1, 2022

Hi, thank you all for working on this feature! Is this going to be merged into the main branch soon?

@gante (Member) commented Oct 3, 2022

@shunzh I haven't started working on it and it's hard to give estimates -- hopefully less than a month :)

@gilljon commented Jan 24, 2023

Was this closed because it's now possible to retrieve past_key_values or was there another reason?

@gante (Member) commented Jan 24, 2023

@gilljon it is not closed :)

@gilljon commented Jan 24, 2023

@gante I'm sorry for the confusion! Any idea when it will be merged?

@sijunhe (Contributor) commented May 5, 2023

Hi @gante. Any idea when this will be merged? I'm interested in using it and building something on top of it. I'd be happy to put on the finishing touches if needed too!

@slyalin commented Jun 16, 2023

Hey! Just a friendly reminder. Any chance to get it merged soon?

@freckletonj commented Oct 2, 2023

I would absolutely love this feature! This would open up so much for me, because I have prompts like:

prompt = '''
Stuff
* <generate X>
* <generate Y>

Stuff
You said [X], and [Y] previously, now:
* <generate Z>
'''

This is so expensive without past_key_values.

So this PR now has merge conflicts, and I tried applying the patch, but upon inspection it's quite severely out of date.

Is there another way to accomplish this?

I notice that model.forward can typically return past_key_values. But then I... have to implement a sampling algorithm myself? Would this be the best way without needing upstream changes, and if so, how can I chain together model.forward and a sampler?
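
A rough sketch of that manual loop, for reference (this is not from the PR; the checkpoint, prompt, and greedy decoding below are arbitrary illustrative choices):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

generated = tokenizer("Stuff", return_tensors="pt").input_ids
past_key_values = None

with torch.no_grad():
    for _ in range(20):
        if past_key_values is None:
            # First pass: run the full prompt and build the cache.
            outputs = model(input_ids=generated, use_cache=True)
        else:
            # Later passes: feed only the newest token and reuse the cache.
            outputs = model(
                input_ids=generated[:, -1:],
                past_key_values=past_key_values,
                use_cache=True,
            )
        past_key_values = outputs.past_key_values
        # Greedy choice; swap in top-k / nucleus sampling as needed.
        next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=-1)

print(tokenizer.decode(generated[0]))
# past_key_values can now be reused to keep appending to this same prefix.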

EDIT: IIUC, generation_utils is where model.generate comes from, so the new place to make these edits is:

@freckletonj

Is this ticket dead because some other technique exists already for returning and reusing past_key_values? This is a killer feature.

@freckletonj

The following PR is more up to date: #25086

@gante (Member) commented Oct 18, 2023

(deprecated in favor of #25086)

@gante gante closed this Oct 18, 2023
@gante (Member) commented Nov 2, 2023

Hey folks 👋

#25086 was merged.

If you install from main and add return_dict_in_generate=True to generate, past_key_values will be part of the output, assuming your model is configured with use_cache=True (the default).

You can then pass past_key_values to generate to continue generating!
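
A minimal sketch of that workflow (assuming a recent install from main as described above; the checkpoint and prompt are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")

# use_cache=True is the default, so the cache lands on the returned object.
out = model.generate(**inputs, max_new_tokens=10, return_dict_in_generate=True)
cache = out.past_key_values

# Continue generating from the previous output, reusing the cache instead of
# re-running the forward pass over the tokens that were already processed.
continued = model.generate(
    out.sequences,
    past_key_values=cache,
    max_new_tokens=10,
    return_dict_in_generate=True,
)
print(tokenizer.decode(continued.sequences[0]))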

@nevakrien

I can't get it to work with Intel neural_chat. What version was this on?

Successfully merging this pull request may close these issues: Optionally return past key values from generate.