
Conversation

qeternity
Contributor

Motivation

By default, Outlines does not permit newlines or whitespace formatting during constrained JSON generation. Its rationale is that smaller models otherwise have a tendency to enter infinite generation loops. I have not observed this behavior with any recent smaller models (Llama 3.1 8B and Mistral 7B v3); rather, we find material quality degradation, even with larger models (Llama 3.1 70B and Mistral Large 2), particularly when a JSON property is a list and the number of elements is variable.

If you observe the natural generation behavior of these models, they emit syntactically formatted JSON, and it strikes me as a better default to permit that behavior.

Modifications

Modify the default Outlines whitespace regex to permit newlines and additional whitespace.
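To make the change concrete, here is a minimal sketch of what relaxing the inter-token whitespace pattern means for schema-derived regexes. The constant names and the one-field object pattern below are illustrative, not the library's actual source; the point is only that a strict pattern rejects pretty-printed JSON while a relaxed one accepts both forms.

```python
import re

# Illustrative whitespace patterns (hypothetical names, not Outlines source):
STRICT_WS = r"[ ]?"        # strict default: at most one space, no newlines
RELAXED_WS = r"[\n\t ]*"   # relaxed: newlines and indentation allowed

def object_pattern(ws: str) -> str:
    # Regex for a one-field object {"name": "<string>"} with the given
    # inter-token whitespace rule.
    return rf'\{{{ws}"name"{ws}:{ws}"[^"]*"{ws}\}}'

compact = '{"name": "x"}'
pretty = '{\n  "name": "x"\n}'

# The strict pattern rejects pretty-printed output...
assert re.fullmatch(object_pattern(STRICT_WS), compact)
assert not re.fullmatch(object_pattern(STRICT_WS), pretty)
# ...while the relaxed pattern accepts both layouts.
assert re.fullmatch(object_pattern(RELAXED_WS), compact)
assert re.fullmatch(object_pattern(RELAXED_WS), pretty)
print("ok")
```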

@max99x
Contributor

max99x commented Aug 31, 2024

While this change doesn't impact the issue, one thing to note is that with non-constant whitespace like the regexes generated from schemas or pydantic models, the jump-ahead optimization ends up doing nothing. Having predefined spacing (pretty-printed or compressed) will run much faster. If that reduces quality/accuracy, instructing or fine-tuning models to produce either standard pretty-printed or compressed formatting can help.
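A small sketch of why fixed spacing helps the jump-ahead optimization: with a constant whitespace convention, every structural chunk between generated values is a fixed literal, so a constrained decoder can append it in one step instead of sampling it token by token. With variable whitespace, each gap admits many strings, so the decoder must consult the model for every position. The skeleton below is illustrative, not sglang's implementation.

```python
# With compressed (fixed) spacing, the structural skeleton between values
# is fully deterministic, so all of it can be "jumped" for free:
fixed_skeleton = ['{"name":"', '","age":', "}"]

def render(values):
    # Simulates a decoder that emits each literal chunk without any model
    # forward passes, only generating the values in between.
    out = []
    for literal, value in zip(fixed_skeleton, values + [""]):
        out.append(literal)
        out.append(value)
    return "".join(out)

print(render(["Ada", "36"]))  # {"name":"Ada","age":36}
```

With a pattern like `[\n\t ]*` in each gap instead, none of these chunks is a fixed literal, which is why the optimization ends up doing little for the whitespace itself.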

@qeternity
Contributor Author

qeternity commented Aug 31, 2024

Is it truly the case that it would do nothing? It seems it would increase the number of tokens between jump-aheads, but once the model has finished generating whitespace, the jump-ahead would still kick in. I have not dug into the implementation, and we are just now starting to evaluate the performance impact; I will report back with empirical results.

That said, I can strongly attest that restricted whitespace generation greatly reduces the quality of larger SOTA models (we have tested basically everything short of L3 405). Ideally, of course, the jump-ahead would be pretty-print-aware, but at the very least it seems reasonable to expose a quality vs. performance tradeoff to users. The alternative is to make this a per-request parameter, which seemed less desirable to us at first glance.

@max99x
Contributor

max99x commented Aug 31, 2024

It's true that jump-ahead will still kick in for field names, etc., but it is common for whitespace to account for a large chunk of the generated tokens. I'm a user, not an sglang developer, so they may have more insight, but in our usage we've seen 30%+ speed increase from using fixed spacing in JSON regexes.

FWIW, for our use case we ended up ditching JSON generation and instead use a free-text user/assistant chat structure that asks for the relevant fields (sometimes in parallel forks) and composing the JSON from the output ourselves, and at least for smaller models we've gotten much better quality output that way.
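The approach described above can be sketched as follows: query the model once per field in free text (possibly in parallel), then assemble the JSON in host code. `ask` is a stand-in for whatever chat API is in use; the questions and canned answers are purely illustrative.

```python
import json

def ask(question: str) -> str:
    # Placeholder for a free-text chat call to the model; in practice this
    # would be a user/assistant turn, possibly run as a parallel fork.
    canned = {
        "What is the person's name?": "Ada Lovelace",
        "What is the person's age?": "36",
    }
    return canned[question]

# Compose the JSON ourselves instead of constraining the model's output:
record = {
    "name": ask("What is the person's name?"),
    "age": int(ask("What is the person's age?")),
}
print(json.dumps(record))  # {"name": "Ada Lovelace", "age": 36}
```

Because the model only ever produces free text, no constrained-decoding machinery is involved, and parsing or validation happens entirely on the host side.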

@qeternity
Contributor Author

Yes, I very much agree with this, and it has been well studied that constrained JSON generation reduces overall quality (we have observed this with 4o and 3.5 Sonnet). Constrained generation is still very much a tradeoff, but what we have found is that prompting the model to "build a list" to elicit data extraction, followed by a constrained generation phase, is the happy medium.

@merrymercy merrymercy merged commit 32a4141 into sgl-project:main Sep 1, 2024
4 of 8 checks passed
qeternity added a commit to qeternity/sglang that referenced this pull request Sep 1, 2024
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025