Skip to content

Conversation

Shruti157
Copy link
Contributor

Fix: Preserve empty block literals (|-) instead of converting to empty strings

Description

This PR fixes the issue where empty block literals using the |- chomping indicator are incorrectly stringified as quoted empty strings ('') instead of retaining their original block literal formatting.

Currently, this input:

input: |-   

is stringified as:

input: ''

Related Issue

Fixes #630


What’s the Fix?

The following condition in blockString() was:

if (!blockQuote || /\n[\t ]+$/.test(value) || /^\s*$/.test(value)) {
  return quotedString(value, ctx)
}

if (!value) return literal ? '|\n' : '>\n'

This logic caused empty or whitespace-only block scalars to fallback to quoted strings (''), even when the user clearly wrote a |- block scalar.

Updated To:

if (!blockQuote || /\n[\t ]+$/.test(value)) {
  return quotedString(value, ctx)
}

if (!value) return literal ? '|-\n' : '>\n'

Key Change:

  • Removed check for ^\s*$ which incorrectly matched valid empty block literals.
  • Changed fallback to return |- instead of | to match typical YAML idioms and avoid unexpected trailing newlines.

Result

With this fix, the following roundtrip works as expected:

input: |-   

After parse → stringify:

input: |-   

Block literal preserved


Additional Notes

  • Tested against YAML samples including PEM-style certificates and template files.

Copy link
Owner

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks decent, though your PR description is very LLM-ish in its verbosity.

Please add tests to cover your change, and see the inline comment.

@@ -201,7 +201,7 @@ function blockString(
: type === Scalar.BLOCK_LITERAL
? true
: !lineLengthOverLimit(value, lineWidth, indent.length)
if (!value) return literal ? '|\n' : '>\n'
if (!value) return literal ? '|-\n' : '>\n'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Given how both | and |- with only whitespace parse as the empty string, there's no need for the - indicator.

@Shruti157
Copy link
Contributor Author

Yes, from a parsing perspective both | and |- parse to the same empty string.
However, this change ensures that if a user writes input: |-, that exact formatting gets preserved when stringified back. Currently |- becomes |, which changes their original formatting.
The - indicator shows explicit intent to strip trailing newlines. Even for empty values, preserving this keeps the original meaning intact.
This makes behavior consistent with non-empty block literals where chomping indicators are correctly preserved.

Would you suggest a different approach for handling this type of empty block literal case?

@eemeli
Copy link
Owner

eemeli commented Jul 27, 2025

However, this change ensures that if a user writes input: |-, that exact formatting gets preserved when stringified back. Currently |- becomes |, which changes their original formatting.

When does |- currently become |? Isn't the issue that both of these are re-serialized as ""?

And doesn't your proposed fix make it so that an empty | would become |-?

@rakumar01
Copy link

rakumar01 commented Jul 28, 2025

Thanks for the follow-up and for confirming the current behavior.

What we're trying to fix

Today: Both | and |- on empty block literals parse to "", then stringify back to either "" or an empty plain scalar, causing the original chomping intent to disappear.

Goal: Round-tripping (parse → stringify) should preserve the user's exact syntax, including trailing - when present.

Your question: "When does |- currently become |? Isn't the issue that both are re-serialized as ""?"

Our initial approaches

1. Removing the empty-value test

// original
if (!blockQuote || /\n[\t ]+$/.test(value) || /^\s*$/.test(value))
// attempt  
if (!blockQuote || /\n[\t ]+$/.test(value))

keeping:

if (!value) return literal ? '|\n' : '>\n'

Result: |- would still stringify to |, losing the -.

2. Always defaulting to - for empties

if (!value) return literal ? '|-\n' : '>-\n'

Result: |- is preserved but a plain empty | now becomes |- which wasn't expected.

Refined solution for this PR

Parser stores the original block indicator:

// Parser (compose-scalar.js):
if (token.type === 'block-scalar') {
  scalar.blockIndicator = token.props?.find(p => typeof p.source === 'string')?.source ?? null;
}

Stringifier uses that to decide the chomp on empty values:

// Stringifier (blockString function):
function blockString({type, value, blockIndicator}, ctx, ) {
  // Extract chomp from blockIndicator
    let originalChomp = '';
    if (blockIndicator) {
        const chompMatch = blockIndicator.match(/[|>](\d*)([+-]?)$/);
        originalChomp = chompMatch ? chompMatch[2] : '';
    }
   if (!value) {
       if (originalChomp)  {         // |-, |+
    return literal ? `|${originalChomp}\n` : `>${originalChomp}\n`
        }
  return literal ? '|\n' : '>\n'   // plain | or >  // Fallback to original behavior
}

  
}

Behavior:

  • Input |- → output |-
  • Input | → output |
  • No default - is injected

With the new fallback to |/>, empty literals now stringify exactly as they did before unless the user explicitly wrote |- or |+. Let me know if you'd prefer a different default, I think this resolves the prior issue, but f you have any alternative approaches in mind that might be cleaner or more aligned with the library's architecture, Please let us know and If you're good with this approach, we'll update the fix in the PR.

@eemeli
Copy link
Owner

eemeli commented Jul 28, 2025

I continue to have a really strong sense that I'm interacting here with an LLM here, and I'd really rather not. In fact, this PR inspired me to add this to the contribution guide:

If you use an LLM when writing pull requests, this must be explicitly declared in the pull request. If there is indication of undeclared LLM assistance, the pull request will be declined.

With that in mind, please clarify if you're using an LLM to write your comments. And going forward, I'd be much more willing to consider your ideas if you represent them directly (even if it's in bad English, I honestly don't mind) rather than if I have to presume they've been generated by an LLM.

@rakumar01
Copy link

We're kind of block on this issue and not very versed with the library, so we have been quite detailed in explaining our requirements in the PR, which might come across as Llmish. We're looking for suggestions on how to move forward. Going forward, we'll keep our explanations more concise.

@rakumar01
Copy link

We have spent the last three days trying to understand the library to provide a fix and validated test case scenarios for this PR. If this PR is being rejected mainly due to the tone or what seems like LLM-ish verbosity, please do let us know. It seems that the style of the explanation is getting more attention than the actual problem, especially when it is blocking us. We are blocked due to an issue in the library which we had already explained and want to unblock ourselves with the provided fix. If the fix itself needs work, we are happy to discuss and improve it. If it's mainly about the writing and the tone, let us know so we can revise and resubmit the PR with a more natural tone to avoid rejection over verbosity.

@eemeli
Copy link
Owner

eemeli commented Jul 30, 2025

I'm supportive of keeping a block literal as a block literal, even if it's empty.

I'm not convinced that there's sufficient value in keeping track of what block chomping indicator was used (esp. as we're also not keeping track of the block indentation indicator). On these, the spec specifically mentions:

A YAML processor should only emit an explicit indentation indicator for cases where detection will fail.
[...]
The chomping method used is a presentation detail and must not be used to convey content information.

And so since the - is not needed to keep trailing whitespace out of an empty block scalar, I don't think it ought to be included in the output.

The Document level API of this library pretty explicitly does not retain all of the details of the source, and so needs to leave out some things. The chomping indicator used on an empty block literal is such a detail. Note though that the CST API does support such details, and is provided for users with a need to control things at such a fine-grained level. Would that provide a solution for you?

@rakumar01
Copy link

Thanks for the insights and clarification on the Document level API limitations. We will revise our code to remove the - indicator from the condition while keeping the block literal format. We will also explore the CST API for finer control.

@Shruti157
Copy link
Contributor Author

we have removed the '-' indicator from empty block literal as discussed. The block literal format is still retained. Please review.

@eemeli
Copy link
Owner

eemeli commented Jul 31, 2025

Yeah, that should work. Please add tests to verify that the behaviour is as expected.

@Shruti157
Copy link
Contributor Author

Hi @eemeli,

I have added test cases to verify the behavior with my changes.

@Shruti157
Copy link
Contributor Author

Changes have been incorporated. Please review.

Copy link
Owner

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. Not 100% happy about the trailing \n\n for top-level block scalars, but that's not really that significant.

@eemeli eemeli merged commit de8a0ab into eemeli:main Aug 5, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

For an empty multi line input like |- is getting replace with ''
3 participants