Block API: Consider encoding-normalized text as equivalent #11771
Fixes #9906
This pull request seeks to improve the block validation step to allow more leniency for effectively equivalent text encoded in varying forms.
The changes here were authored such that there may be a slight performance benefit over master, both in a reduction of bundle size (approximately 18% smaller gzipped for the `blocks` module) and in optimizing for an early return of equality when normalization (whitespace or encoding) is not necessary to determine equivalence of text sequences.

Implementation notes:
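The early return mentioned in the summary above can be illustrated with a minimal sketch. This is not the code from this pull request: the function names `normalizeWhitespace`, `decodeNumericEntities`, and `isEquivalentText` are hypothetical, and the sketch only decodes numeric character references, leaving named entities untouched. It assumes the comparison proceeds from the cheapest check to the most expensive one.

```javascript
// Hypothetical helper: trim and collapse runs of whitespace to a single space.
function normalizeWhitespace(text) {
  return text.trim().replace(/\s+/g, ' ');
}

// Hypothetical helper: decode decimal (&#8211;) and hexadecimal (&#x2013;)
// character references. Named entities such as &amp; are left as-is here.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, code) => {
    const isHex = code[0].toLowerCase() === 'x';
    const codePoint = isHex ? parseInt(code.slice(1), 16) : parseInt(code, 10);
    // Leave out-of-range references untouched instead of throwing.
    if (codePoint > 0x10ffff) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

// Compare two text sequences, returning as early as possible.
function isEquivalentText(actual, expected) {
  // Cheapest check first: identical strings need no normalization.
  if (actual === expected) {
    return true;
  }

  // Next, ignore differences that are purely whitespace.
  const a = normalizeWhitespace(actual);
  const b = normalizeWhitespace(expected);
  if (a === b) {
    return true;
  }

  // Only as a last resort, normalize the encoding of both sides.
  return decodeNumericEntities(a) === decodeNumericEntities(b);
}
```

Under these assumptions, `isEquivalentText( 'a &#8211; b', 'a \u2013 b' )` is treated as a match, while strings that are already identical never reach the normalization steps.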
In the process of implementing further text normalization here, it was discovered that the underlying `simple-html-tokenizer` performs its own entity substitution when encountering text tokens in an HTML string. For the purposes of validation, this was considered redundant and was thus swapped with a stub entity parser in the included changes. Note that this is the change which enables the significant drop in bundle size. Note also, as an aside, that there is a desire to consolidate to a single parser between the blocks parse and validator parse, so the use of `simple-html-tokenizer` may or may not persist far into the future.

Testing instructions:
Verify that block invalidation is not triggered by encoding variations.
For example, inserting the following HTML as the contents of a post (in Text Mode, Classic Editor Text tab, or directly in the database) should not be presented as an invalid block when next viewing the Visual Mode of the editor:
cc @MarkRH