Skip to content

[Frozen] Make Markdown's newline & space handling conform to web-platform-tests & Firefox's behavior #15081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

tats-u
Copy link
Contributor

@tats-u tats-u commented Jul 11, 2023

Description

Closes #16819

Waiting for:

Checklist

  • I’ve added tests to confirm my change works.
  • (If changing the API or CLI) I’ve documented the changes I’ve made (in the docs/ directory).
  • (If the change is user-facing) I’ve added my changes to changelog_unreleased/*/XXXX.md file following changelog_unreleased/TEMPLATE.md.
  • I’ve read the contributing guidelines.

Try the playground for this PR

@tats-u tats-u force-pushed the markdown-newline-html branch 3 times, most recently from 92276e4 to 01d93c6 Compare July 13, 2023 13:57
@tats-u tats-u marked this pull request as ready for review July 16, 2023 14:55
@tats-u tats-u force-pushed the markdown-newline-html branch 2 times, most recently from 894c31f to 01eb21b Compare July 22, 2023 11:54
@tats-u tats-u changed the title Make Makdown's newline handling conform to HTML Make Makdown's newline & space handling conform to HTML Jul 22, 2023
@tats-u tats-u force-pushed the markdown-newline-html branch from 24e63a7 to 65182d7 Compare July 22, 2023 14:50
@tats-u
Copy link
Contributor Author

tats-u commented Jul 22, 2023

Shoud I add changelog_unreleased/markdown/15081.md?

@fisker
Copy link
Member

fisker commented Jul 22, 2023

Can you send a separate PR for this first?

Don't trim spaces other than ASCII whitespace at the beginning or end of paragraphs

And we can use this module to do the html trim.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 22, 2023

@fisker In CommonMark spec, the form feed \f (U+000C) is in the same category as the full-width space (and other spaces in Zs).

https://spec.commonmark.org/0.30/#unicode-whitespace-character

Either way, it is destined to be removed at the stage where the converted HTML is processed, so it may as well be removed at the stage of conversion in Prettier.

https://codepen.io/tats-u/pen/bGQjrvM

I'll reuse that code.

Trailing form feeds don't seem always to be removed.

@fisker
Copy link
Member

fisker commented Jul 22, 2023

Interesting, https://bugs.webkit.org/show_bug.cgi?id=13159 seems we already made mistake in HTML/Handlebars printer.

@fisker
Copy link
Member

fisker commented Jul 22, 2023

Anyway, I think we should reuse htmlWhitespaceUtils, but consider remove \f in future if it really matters.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 23, 2023

@fisker Your change has just been applied.

@tats-u tats-u changed the title Make Makdown's newline & space handling conform to HTML Make Markdown's newline & space handling conform to HTML Jul 23, 2023
@tats-u tats-u force-pushed the markdown-newline-html branch 2 times, most recently from 90de55f to 304b1de Compare July 23, 2023 10:58
@tats-u
Copy link
Contributor Author

tats-u commented Jul 24, 2023

I found an interesting description in https://drafts.csswg.org/css-text-3/#line-break-transform

Then any remaining segment break (\n) is either transformed into a space (U+0020) or removed depending on the context before and after the break. The rules for this operation are UA-defined in this level.

NOTE: Historically, HTML and CSS have unconditionally converted segment breaks to spaces, which has prevented content authored in languages such as Chinese from being able to break lines within the source. Thus UA heuristics need to be conservative about where they discard segment breaks even as they strive to improve support for such languages.

What sucks is not the HTML or CSS specs but browsers' implementations.
I have to modify the change log and comments in the source code.

@tats-u tats-u changed the title Make Markdown's newline & space handling conform to HTML Make Markdown's newline & space handling conform to browsers Jul 27, 2023
if (isLink) {
return true;
}
function lineBreakCanBeConvertedToSpace() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should remove this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on whether we'll have to add it again.
I can't predict when all browsers will change their behavior.

@tats-u tats-u marked this pull request as draft July 29, 2023 03:12
Copy link
Member

@fisker fisker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result looks good, thank you!

@tats-u
Copy link
Contributor Author

tats-u commented Jul 29, 2023

@fisker I'll change the behavior.

I looked into the detailed behavior of Firefox.

Table: Is a newline between the followings converted to a space in Firefox?

\ Latin & Korean Chinese & Japanese & CJK punctuations
Latin & Korean Yes Yes
Chinese & Japanese & CJK punctuations Yes No

A newline between [Chinese & Japanese & CJK punctuations] should not be converted to a space.
Once it is removed, these

漢字
だよ

↓ OK

漢字だよ。

↓ STOP!

漢字
だよ。

This will improve the compatibility with the conventional behavior to some extend and reduce browser-defined behaviors.

A space around Chinese (& Japanese) is going not to be converted to a newline. It's not compatible with Firefox.

@fisker
Copy link
Member

fisker commented Jul 29, 2023

A newline between [Chinese & Japanese & CJK punctuations] should not be converted to a space.

Didn't get it.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 29, 2023

[Chinese & Japanese & CJK punctuations]

These seem to be treated as a single group in Firefox.
A newline surrounded only by that group is simply removed (in a view).
Of course, a space is added once copied and pasted on somewhere else.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 29, 2023

@fisker
Copy link
Member

fisker commented Jul 29, 2023

Should't we only need make sure the output equivalence in html?

Ah, maybe you are right, in markdown one single new line doesn't generate a space. But should not related to languages.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 29, 2023

@fisker

But should not related to languages.

Does it mean it's better to always treat a newline in Markdown as a space even if surrounded only by han or kana?
Indeed Markdown viewers other than browsers use Blink, but the behavior doesn't look natural as a Chinese or Japanese paragraph for me.
For the first place few people break lines in Chinese or Japanese sentences in Markdown documents, though.

@tats-u
Copy link
Contributor Author

tats-u commented Dec 4, 2023

@fisker @sosukesuzuki Recently I have noticed the plan for the release of v4.
How do you think about this change as a content in v4?

@fisker
Copy link
Member

fisker commented Dec 5, 2023

Current v4 is published for the new CLI test.

@tats-u tats-u marked this pull request as draft August 26, 2024 12:59
@tats-u
Copy link
Contributor Author

tats-u commented Aug 26, 2024

Current v4 is published for the new CLI test.

I see.

I'll create a new PR based on this, but reducing the change.
This PR will be frozen until it is merged and a new major version containing it is released.

This force push is just for following the latest content of main.


  • Make newline and whitespace interchangeable even between (Chinese or Japanese) and another character when proseWrap is always or never → frozen but no newline will be added between (Chinese or Japanese) and another character
  • Don't trim spaces other than ASCII whitespace at the beginning or end of paragraphs → kept

@tats-u
Copy link
Contributor Author

tats-u commented Sep 28, 2024

npx prettier --w is not so perfect to Chinese. It's not correct to add any space between Chinese words. It's recommended to add a space after en words, numbers, and punctuations

open-telemetry/opentelemetry.io#4341

I read this and decided to stop Prettier from wrapping lines even between Chinese / Japanese for a while.
We should face the reality of the broken and stale behavior of WebKit-based browsers rather than pursuing the ideal of WPT and Firefox's behavior.

#16691's title and behavior will be changed.

This PR will be kept opened to be worked on after the following browsers' bugs are fixed:

@tats-u tats-u changed the title Make Markdown's newline & space handling conform to web-platform-tests & Firefox's behavior [Frozen] Make Markdown's newline & space handling conform to web-platform-tests & Firefox's behavior Oct 6, 2024
@tats-u tats-u force-pushed the markdown-newline-html branch from e7ad613 to 10b2725 Compare October 27, 2024 15:37
@tats-u tats-u changed the base branch from main to next October 28, 2024 14:33
@tats-u tats-u force-pushed the markdown-newline-html branch from 10b2725 to a7d84bb Compare October 28, 2024 14:50
@tats-u tats-u force-pushed the markdown-newline-html branch from a7d84bb to 10b2725 Compare October 28, 2024 15:07
@tats-u tats-u changed the base branch from next to main October 28, 2024 15:08
@tats-u
Copy link
Contributor Author

tats-u commented Oct 28, 2024

I found #16691 hasn't been merged into next. It's dangerous to retarget at next now.

tats-u added a commit to tats-u/prettier that referenced this pull request Oct 29, 2024
tats-u added a commit to tats-u/prettier that referenced this pull request Nov 4, 2024
tats-u added a commit to tats-u/prettier that referenced this pull request Nov 4, 2024
@tats-u tats-u force-pushed the markdown-newline-html branch from 577f652 to feda80b Compare December 4, 2024 10:32
@tats-u
Copy link
Contributor Author

tats-u commented Dec 4, 2024

The 2nd last rebase (the last rebase will be nextmain) because of the merge of #16796
Duplicated commit 10b2725 with that was dropped
The last rebase will take place after when next merges main once again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Blocked by Chromium/WebKit] Re-enable line breaking between han
3 participants