tats-u
Contributor

@tats-u tats-u commented Aug 28, 2024

Description

The safest part of #15081

Prettier removes non-ASCII whitespace at the end of a line and at the beginning of the next line in Markdown, which violates the CommonMark spec.

The space (U+3000) in the following should be kept:

# 

 全角スペース全形空白

https://spec.commonmark.org/0.31.2/#soft-line-breaks

Spaces at the end of the line and beginning of the next line are removed:

https://spec.commonmark.org/0.31.2/#unicode-whitespace-character

A Unicode whitespace character is a character in the Unicode Zs general category, or a tab (U+0009), line feed (U+000A), form feed (U+000C), or carriage return (U+000D).

Unicode whitespace is a sequence of one or more Unicode whitespace characters.

A space is U+0020.

The CommonMark spec only mentions spaces (U+0020) here, not Unicode whitespace, so non-ASCII spaces should be preserved.
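
To illustrate the distinction, here is a minimal sketch in JavaScript. It is not Prettier's actual code: `trimAsciiSpaces` is a hypothetical helper, and it assumes that only U+0020 is to be stripped, as in the sentence quoted from the soft line break section. The built-in `String.prototype.trim()` removes every Unicode Zs character, including U+3000, which is the behavior this PR is meant to avoid.

```javascript
// Hypothetical helper for illustration only (not Prettier's implementation):
// strips U+0020 from both ends of a line and nothing else.
function trimAsciiSpaces(line) {
  return line.replace(/^\u{20}+|\u{20}+$/gu, "");
}

// One ASCII space and one ideographic space (U+3000) on each side.
const line = " \u{3000}全角スペース\u{3000} ";

// trim() treats U+3000 as whitespace and removes it as well.
const viaTrim = line.trim();

// The ASCII-only version keeps the U+3000 characters.
const viaAsciiOnly = trimAsciiSpaces(line);

console.log(JSON.stringify(viaTrim));
console.log(JSON.stringify(viaAsciiOnly));
```

With `trim()` the ideographic spaces are lost, while the ASCII-only helper leaves them in place.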

Checklist

  • I’ve added tests to confirm my change works.
  • (If changing the API or CLI) I’ve documented the changes I’ve made (in the docs/ directory).
  • (If the change is user-facing) I’ve added my changes to changelog_unreleased/*/XXXX.md file following changelog_unreleased/TEMPLATE.md.
  • I’ve read the contributing guidelines.

@tats-u tats-u force-pushed the preserve-unicode-spaces branch from ec07027 to 5c60b43 August 31, 2024 01:31
@tats-u
Contributor Author

tats-u commented Sep 1, 2024

https://spec.commonmark.org/dingus/?text=%E3%80%80%E5%85%A8%E8%A7%92%E7%A9%BA%E7%99%BD%E3%80%80%0A

Hm, the official dingus removes such spaces.
I'll have to ask the spec maintainers about this.

@tats-u
Contributor Author

tats-u commented Sep 1, 2024

@tats-u
Contributor Author

tats-u commented Sep 24, 2024

commonmark/commonmark.js#289

It turned out to be a bug in commonmark.js.

@tats-u tats-u changed the title from "Preserve non-ASCII whitespaces at the end of the line and beginning of the next line" to "Markdown: preserve non-ASCII whitespaces at the end of the line and beginning of the next line" Sep 24, 2024
@tats-u
Contributor Author

tats-u commented Sep 24, 2024

@fisker Do you approve of the latest change to the comment (in the last commit)?

@fisker fisker self-assigned this Sep 27, 2024
@fisker
Member

fisker commented Sep 27, 2024

\u{xx} notation seems more readable. We can ignore \\u\{[0-9a-f]+\} in

"ignoreRegExpList": [
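
As a rough check of what that pattern would cover, here is a small JavaScript sketch. The sample source line is made up for illustration, and the pattern is written as a regex literal rather than as it would be escaped inside a JSON config file.

```javascript
// The pattern from the comment above, as a JavaScript regex literal.
const escapePattern = /\\u\{[0-9a-f]+\}/g;

// String.raw keeps the backslashes, so this is the text as it would
// appear in a source file. The line itself is a made-up example.
const sourceLine = String.raw`const spaces = /[\u{3000}\u{a0}]/u;`;

// Every \u{...} escape with lowercase hex digits is matched, so a
// spell checker that ignores these matches never sees the hex digits.
const ignored = sourceLine.match(escapePattern);

console.log(ignored);
```

Note that the character class only contains lowercase hex digits, so an escape such as `\u{A0}` would not be matched by this pattern as written.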

@tats-u
Contributor Author

tats-u commented Sep 27, 2024

\u{xx} notation seems more readable

I agree, and following your suggestion I went with silencing cspell.

@fisker fisker merged commit a4be6a0 into prettier:main Sep 27, 2024
29 checks passed
@tats-u tats-u deleted the preserve-unicode-spaces branch September 28, 2024 08:25

2 participants