This repository was archived by the owner on Jan 26, 2022. It is now read-only.

tc39/proposal-well-formed-stringify

Well-formed JSON.stringify

A proposal to prevent JSON.stringify from returning ill-formed Unicode strings.

Status

This proposal is at stage 4 of the TC39 Process.

Champions

  • Mathias Bynens

Motivation

RFC 8259 section 8.1 requires JSON text exchanged outside the scope of a closed ecosystem to be encoded using UTF-8, but JSON.stringify can return strings including code points that have no representation in UTF-8 (specifically, surrogate code points U+D800 through U+DFFF). Contrary to the description of JSON.stringify, such strings are also not "in UTF-16": per The Unicode Standard, Version 10.0.0, Section 3.4, "isolated UTF-16 code units in the range D800₁₆..DFFF₁₆ are ill-formed" (definition D91) and are therefore excluded from being "in UTF-16" (definition D89).

However, returning such invalid Unicode strings is unnecessary, because JSON strings can include Unicode escape sequences.
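
To make the encoding problem concrete, the following sketch shows what happens when a string containing a lone surrogate is encoded as UTF-8. It assumes a runtime that provides the standard `TextEncoder` and `TextDecoder` globals (modern browsers and recent Node.js versions); the variable names are illustrative only.

```javascript
// A lone surrogate has no UTF-8 representation, so the encoder substitutes
// U+FFFD REPLACEMENT CHARACTER (bytes EF BF BD) and the original value is lost.
const loneSurrogate = '\uDEAD';
const encoded = new TextEncoder().encode(loneSurrogate);
const hexBytes = Array.from(encoded, (b) => b.toString(16).padStart(2, '0')).join(' ');
console.log(hexBytes); // ef bf bd

// Decoding the bytes yields U+FFFD, not the original code unit.
const decoded = new TextDecoder().decode(encoded);
console.log(decoded === loneSurrogate); // false
console.log(decoded === '\uFFFD'); // true
```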

Proposed Solution

Rather than return unpaired surrogate code points as single UTF-16 code units, represent them with JSON escape sequences.
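
The rule can be sketched in plain JavaScript. The helper below, `escapeLoneSurrogates`, is a hypothetical name introduced for illustration and is not part of the proposal or of any engine. It covers only the surrogate handling described above and ignores the other escaping JSON.stringify performs (quotes, backslashes, control characters).

```javascript
// Hypothetical helper, for illustration only: replaces unpaired surrogate
// code units with \uXXXX escapes and leaves well-formed pairs untouched.
function escapeLoneSurrogates(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const isLead = unit >= 0xd800 && unit <= 0xdbff;
    const isTrail = unit >= 0xdc00 && unit <= 0xdfff;
    if (isLead && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: copy both code units unchanged.
        out += str[i] + str[i + 1];
        i++;
        continue;
      }
    }
    if (isLead || isTrail) {
      // Unpaired surrogate: emit a six-character lowercase escape sequence.
      out += '\\u' + unit.toString(16).padStart(4, '0');
    } else {
      out += str[i];
    }
  }
  return out;
}

console.log(escapeLoneSurrogates('\uDF06\uD834')); // \udf06\ud834
console.log(escapeLoneSurrogates('\uD834\uDF06') === '\uD834\uDF06'); // true
```

A trail surrogate followed by a lead surrogate, as in the first call, is two unpaired code units, so both are escaped.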

Illustrative examples

// Non-BMP characters still serialize to surrogate pairs.
JSON.stringify('𝌆')
// → '"𝌆"'
JSON.stringify('\uD834\uDF06')
// → '"𝌆"'

// Unpaired surrogate code units will serialize to escape sequences.
JSON.stringify('\uDF06\uD834')
// → '"\\udf06\\ud834"'
JSON.stringify('\uDEAD')
// → '"\\udead"'

Discussion

Backwards Compatibility

This change is backwards-compatible, under an assumption of consumer compliance with the JSON specification. User-visible effects will be limited to the replacement of some rare single UTF-16 code units in JSON.stringify output with equivalent six-character escape sequences that can be represented both in UTF-16 and in UTF-8. It is the authors' opinion that any consumer accepting the current ill-formed output will be unaffected by this change (this is true in particular of ECMAScript JSON.parse). Any consumer rejecting the current ill-formed output will have a new opportunity to accept its well-formed representation. Such consumers may still reject input that specifies strings including Unicode code points that are not scalar values (e.g., because they only accept I-JSON input), but those that accept it must have mechanisms for dealing with unpaired surrogates (as mentioned in the specification of JSON).
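
The claim about JSON.parse can be checked directly. The sketch below, with illustrative variable names, parses the same string value from both the raw and the escaped JSON text and compares the results.

```javascript
// JSON text containing a raw (unescaped) lone surrogate, as older engines produced.
const rawJson = '"\uDEAD"';
// JSON text containing the equivalent escape sequence, as this proposal specifies.
const escapedJson = '"\\udead"';

const fromRaw = JSON.parse(rawJson);
const fromEscaped = JSON.parse(escapedJson);

console.log(fromRaw === fromEscaped); // true
console.log(fromEscaped === '\uDEAD'); // true

// Round-tripping holds regardless of which form the engine's JSON.stringify emits.
const original = 'a\uD834b';
console.log(JSON.parse(JSON.stringify(original)) === original); // true
```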

Validity

Unicode escape sequences are valid JSON and, being completely ASCII, are well-formed in both UTF-16 and UTF-8.
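
As a quick check of the ASCII property, the sketch below inspects the serialized output for a string made only of unpaired surrogates. It assumes an engine that implements this proposal (ES2019 or later) and the standard `TextEncoder` and `TextDecoder` globals; on older engines the first check would fail.

```javascript
// Assumes an engine that implements this proposal (ES2019 or later).
const jsonText = JSON.stringify('\uDF06\uD834');

// Every code unit of the output is in the ASCII range.
const isAscii = /^[\x00-\x7f]*$/.test(jsonText);
console.log(isAscii);

// ASCII text occupies one byte per character in UTF-8 and survives a round trip.
const utf8 = new TextEncoder().encode(jsonText);
console.log(utf8.length === jsonText.length);
console.log(new TextDecoder().decode(utf8) === jsonText);
```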

Specification

The specification is available in ecmarkup or rendered HTML.

Implementations
