This proposal removes from the ECMAScript Language Specification those assumptions about Unicode normalization that do not hold for current implementations, and adds a simple normalization method.
The ECMAScript Language Specification 5.1 contains assumptions about the normalization of source text in several sections:
As Rich Gillam reports from the TC 39 discussions: “The expectation is that there’s a separate layer of some kind around the environment of a running ECMAScript program that ensures text coming in from outside is normalized.” and “Applications that get text from the Internet can depend on the source to follow the W3C rules.” The W3C rules referred to here are presumably the ones about early normalization in the Character Model for the World Wide Web (1999 draft, 2012 draft).
In reality, unnormalized source text is not normalized anywhere on the way into the ECMAScript runtimes of any of the top 5 browsers or in Node.js. It is also not normalized when a string is passed to eval(). (Tested with: Safari 5.1.7 Mac, iOS 5.1.1; Firefox 12.0 Mac, Win; Chrome 19.0.1084.52 Mac, 19.0.1084.46 Win; Opera 11.64 Mac, Win; IE 9.0.8112.16421 Win; Node.js 0.6.18 Mac). On the web, early normalization hasn’t become the norm either; the latest draft of the Character Model indicates “the Internationalization Core Working Group’s intention to substantially alter or replace the recommendations found here with very different recommendations in the near future.”
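The behavior described above can be checked with a short script. This is a minimal sketch, not the test suite used for the browser list; the variable names are illustrative, and "é" was chosen only as a convenient character with both a precomposed and a decomposed form.

```javascript
// "é" as a single precomposed code point (U+00E9).
var precomposed = "\u00E9";
// "é" as "e" followed by COMBINING ACUTE ACCENT (U+0301).
var decomposed = "e\u0301";

// The two strings are canonically equivalent but compare as different,
// because no normalization is applied to string values.
var equalAsIs = precomposed === decomposed;

// Build source text for a string literal that contains the decomposed
// sequence, then evaluate it. If eval() normalized its input to NFC,
// the result would have length 1 instead of 2.
var evaluated = eval('"' + decomposed + '"');
var evalKeptDecomposedForm = evaluated === decomposed && evaluated.length === 2;

console.log(equalAsIs);              // false
console.log(evalKeptDecomposedForm); // true
```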
Introducing normalization of source text into ECMAScript implementations now would risk breaking existing code that relies on identifiers or strings being distinct even though they would become equal after normalization.
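To make the risk concrete, here is a hedged sketch of code that works today only because the two forms stay distinct. The object and its keys are hypothetical, and the normalize("NFC") call assumes the method proposed in this document is available with a Unicode normalization form name as its argument; it is used here only to simulate what normalizing on the way in would do.

```javascript
// Two property keys that look identical but are different strings.
var table = {};
table["\u00E9"] = "precomposed";   // U+00E9
table["e\u0301"] = "decomposed";   // U+0065 U+0301

var keyCountBefore = Object.keys(table).length;

// Simulate early normalization: every key is converted to NFC first.
var normalizedTable = {};
Object.keys(table).forEach(function (key) {
  normalizedTable[key.normalize("NFC")] = table[key];
});

var keyCountAfter = Object.keys(normalizedTable).length;

console.log(keyCountBefore); // 2
console.log(keyCountAfter);  // 1, one entry has silently overwritten the other
```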
Instead, I propose removing the invalid assumptions from the specification:
Text comparison operations are generally easier to implement and produce better results if they can operate on normalized text. Implementations of the Intl.Collator object in the ECMAScript Internationalization API will typically apply normalization where needed, e.g., for Arabic and Vietnamese, where unnormalized text is common. For other operations, such as the simple string comparison used by the operators <, ==, etc., as well as regular expressions, applications should be able to normalize strings beforehand.
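As an illustration of normalizing beforehand, the sketch below compares and matches strings only after converting them to a common form. It assumes the normalize method proposed in this document is available and accepts a Unicode normalization form name such as "NFC"; the helper equalAfterNormalization is a hypothetical name, not part of the proposal.

```javascript
var precomposed = "\u00E9";   // "é" as U+00E9
var decomposed = "e\u0301";   // "é" as U+0065 U+0301

// Hypothetical helper: compare two strings after converting both to NFC.
function equalAfterNormalization(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

// Simple string comparison sees two different strings.
var rawEqual = precomposed === decomposed;
// After normalization the strings are equal.
var normalizedEqual = equalAfterNormalization(precomposed, decomposed);

// Regular expressions behave the same way: the pattern is written with
// the precomposed form, so the input has to be normalized to match.
var pattern = /^\u00E9$/;
var matchesRaw = pattern.test(decomposed);
var matchesNormalized = pattern.test(decomposed.normalize("NFC"));

console.log(rawEqual, normalizedEqual);      // false true
console.log(matchesRaw, matchesNormalized);  // false true
```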
I propose adding the following function:
String.prototype.normalize(form)
NOTE The normalize function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
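The generic behavior described in the NOTE can be sketched as follows. This assumes the function converts its this value to a string before normalizing, as other generic String.prototype functions do, and that "NFC" is an accepted form name; the label object is purely illustrative.

```javascript
// A non-String object whose string conversion yields decomposed text.
var label = {
  toString: function () {
    return "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT
  }
};

// Call the function with the object as its this value.
var viaCall = String.prototype.normalize.call(label, "NFC");

// Or transfer the function to the object and use it as a method.
label.normalize = String.prototype.normalize;
var viaMethod = label.normalize("NFC");

console.log(viaCall === "\u00E9");   // true
console.log(viaMethod === "\u00E9"); // true
```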