This proposal has progressed to the Draft ECMAScript 6 Specification, which is available for review here: specification_drafts. Any new issues relating to them should be filed as bugs at http://bugs.ecmascript.org. The content on this page is for historic record only and may no longer reflect the current state of the feature described within.

EcmaScript Quasi-Literals

Motivation

EcmaScript is frequently used as a glue language for dealing with content specified in other languages: HTML, CSS, JSON, XML, etc. Libraries have implemented query languages and content generation schemes for most of these: CSS selectors, XPath, various templating schemes. These tend to suffer from interpretation overhead, or from injection vulnerabilities, or both.

This scheme extends EcmaScript syntax with syntactic sugar to allow libraries to provide DSLs that easily produce, query, and manipulate content from other languages that are immune or resistant to injection attacks such as XSS, SQL Injection, etc.

This scheme aims to preserve ES5 strict mode’s static analyzability while allowing details of the DSL implementation to be dynamic.

Try out the quasi-literal demo testbed.

Overview

Syntax

x`foo${bar}baz`

Syntactically, a quasi-literal is a function name (x) followed by zero or more characters enclosed in back quotes. The contents of the back quotes are grouped into literal sections (foo and baz) and substitutions (bar).

A substitution is an unescaped substitution start character ($) followed by either a valid Identifier or a curly bracket block. E.g., $foo or ${foo + bar}.

The literal sections are the runs of characters not contained in substitutions. They may be blank so the number of literal sections is always one greater than the number of substitutions.

Semantics

The semantics of quasi-literals are specified in terms of a desugaring which has the property that the free variables of the desugaring are the same as the union of the free variables of the substitutions and the function name.

Use Cases

This syntactic sugar will let library developers experiment with a wide range of language features.

Quasi-literals desugar a back quoted string to a function call that operates on the literal portions and substitution results.

E.g. quasiHandlerName`quasiLiteralPart1 ${quasiSubstitution} quasiLiteralPart2` desugars to something like

// hoisted declaration.
const callSiteId1234 = {
    raw: ['quasiLiteralPart1 ', ' quasiLiteralPart2'],
    cooked: ['quasiLiteralPart1 ', ' quasiLiteralPart2']
};

// in-situ
quasiHandlerName(callSiteId1234, quasiSubstitution)

The hoisted declaration at the top records the literal portions both before and after each EscapeSequence has been decoded and can be used as a key into a WeakMap. The in-situ function call applies the quasi handler to produce the result and receives the call site ID and the values of the substitutions.
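To make the desugaring concrete, the sketch below writes out by hand what a quasi such as interleave`Hello, ${name}!\n` would expand to and runs it against a trivial handler. The handler name interleave and the variable callSiteId1 are illustrative assumptions, not part of the proposal.

```javascript
// Hand-written stand-in for the hoisted declaration.
// raw keeps backslash-n as two characters; cooked holds a real newline.
const callSiteId1 = Object.freeze({
  raw: Object.freeze(['Hello, ', '!\\n']),
  cooked: Object.freeze(['Hello, ', '!\n'])
});

// Hypothetical handler: joins the cooked literal portions with the
// string form of each substitution value.
function interleave(callSiteId /* , ...substitutions */) {
  const cooked = callSiteId.cooked;
  let out = cooked[0];
  for (let i = 1; i < cooked.length; ++i) {
    out += String(arguments[i]) + cooked[i];
  }
  return out;
}

// In-situ call: the call-site object first, then one argument per substitution.
const name = 'world';
const greeting = interleave(callSiteId1, name);
console.log(JSON.stringify(greeting));
```

Because the handler reads cooked, the result ends in a real line-feed; a handler that read raw would instead see the two characters backslash and n.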

See the demo REPL for some runnable examples, especially the drop-down at the top-right.

Secure Content Generation

safehtml`<a href="${url}?q=${query}" onclick=alert(${message}) style="color: ${color}">${message}</a>`

uses contextual auto-escaping to figure out that url and color should be filtered, query should be percent-encoded, and message HTML entity encoded to prevent XSS.

The syntax provides a clear distinction between trusted content such as <a href=" and substituted values that might be controlled by an attacker such as url. This prevents the problems that arise in other languages when format strings can be controlled by an attacker. Although EcmaScript’s memory abstractions are not vulnerable, it is very vulnerable to quoting confusion attacks and developers have trouble distinguishing content from an untrusted format string from that produced from a trusted one.

E.g.

url = "http://example.com/",
message = query = "Hello & Goodbye",
color = "red",
safehtml`<a href="${url}?q=${query}" onclick=alert(${message}) style="color: ${color}">${message}</a>`

produces

<a href="http://example.com/?q=Hello%20%26%20Goodbye"
 onclick=alert(&#39;Hello&#32;\x26&#32;Goodbye&#39;) style="color: red">Hello &amp; Goodbye</a>

but values are filtered so that if instead

url = "javascript:alert(1337)"
color = "expression(alert(1337))"

then the substitution holes are filled with innocuous values instead to produce:

<a href="#innocuous?q=Hello%20%26%20Goodbye"
 onclick=alert(&#39;Hello&#32;\x26&#32;Goodbye&#39;) style="color: innocuous">Hello &amp; Goodbye</a>

Similar schemes can work for securely composing URLs, JSON and XML data bundles, and for allowing composable SQL prepared statements.
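A full contextual auto-escaper is beyond a short example. The sketch below, written against the desugared calling convention, shows only the trust boundary: literal portions pass through untouched while every substitution is HTML-entity encoded. The names escapeHtml and naiveSafeHtml are hypothetical, and this handler does not filter URLs or handle script and style contexts the way safehtml is described as doing.

```javascript
// Encode the characters that are significant in HTML text and in
// quoted attribute values as numeric character references.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  });
}

// Hypothetical handler: trusted literal portions are copied verbatim,
// untrusted substitution values are always escaped.
function naiveSafeHtml(callSiteId /* , ...substitutions */) {
  const literals = callSiteId.cooked;
  let out = literals[0];
  for (let i = 1; i < literals.length; ++i) {
    out += escapeHtml(arguments[i]) + literals[i];
  }
  return out;
}

// Hand-desugared call site for  naiveSafeHtml`<b>${message}</b>`
const boldSite = Object.freeze({
  raw: Object.freeze(['<b>', '</b>']),
  cooked: Object.freeze(['<b>', '</b>'])
});
const html = naiveSafeHtml(boldSite, '<script>alert(1)</script> & more');
console.log(html);
```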

Text L10N

msg`Welcome to ${siteName}, you are visitor number ${visitorNumber}!`

where visitorNumber should be formatted using locale-specific conventions, e.g. “1,000,000” in some parts of the world, and “1.000.000” in some others.

Message Extraction

Since there is a convenient, simple format for human-readable messages, a static analyzer can find them (to substitute locale-specific versions) more easily than if messages were simply the first argument to a function call.

For example, a static analyzer could find uses of msg`...` in source files to produce a message bundle like

<messagebundle>
  <message id="...">Welcome to {0}, you are visitor number {1}!</message>
</messagebundle>

Translators can then produce a message bundle with the translations.
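A minimal sketch of such an extractor follows. It scans source text with regular expressions, which is only adequate for quasis without nested back quotes, nested curly brackets, or escaped dollar signs; a real tool would work from a parse tree. The function name extractMessages is an assumption made for illustration.

```javascript
// Find msg`...` quasis in source text and turn each one into a message
// string with positional placeholders {0}, {1}, ...
function extractMessages(sourceText) {
  const messages = [];
  // msg, a back quote, then any run of non-back-quote characters or
  // backslash escapes, then the closing back quote.
  const quasiPattern = /\bmsg`((?:[^`\\]|\\[\s\S])*)`/g;
  let match;
  while ((match = quasiPattern.exec(sourceText)) !== null) {
    let index = 0;
    // Replace ${...} and $ident substitutions with numbered placeholders.
    const text = match[1].replace(
        /\$\{[^}]*\}|\$[A-Za-z_$][\w$]*/g,
        function () { return '{' + (index++) + '}'; });
    messages.push(text);
  }
  return messages;
}

const source =
    'alert(msg`Welcome to ${siteName}, ' +
    'you are visitor number ${visitorNumber}!`);';
const extracted = extractMessages(source);
console.log(extracted);
```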

Message Meta-data

Translators often need some context to help them translate human readable message strings.

Meta-data can be attached to comments the way other systems put type declarations and structured documentation in comments.

/**
 * @description Label text for a button that opens a window.
 */
myButton.innerText = msg`Open`;

but if the English word “Open” is used in two different forms (adjectival vs imperative), it may need to have two translations, so some L10N approaches would benefit from having disambiguation meta-data available at runtime.

There are two common ways of disambiguating:

  1. Associating the message with an identifier which is used as a message ID: #MSG_OPEN_BUTTON_TEXT { msg`Open` }
  2. Adding “meaning” meta-data to the message: myButton.innerText = msg`Open ; meaning="Button text"`;

The latter convention makes the meta-data available not just to static analyzers, but also at runtime.

Substitution Meta-data

Meta-data can also be associated with substitutions in the same way.

/**
 * @param siteName The name of the site.  @example foo.ru
 * @param visitorNumber an integer. @example 1000000
 */
var message = msg`Welcome to ${siteName}, you are visitor number ${visitorNumber}:d!`;

The description of siteName and the @example meta-data can be extracted along with the message and made available to translators, but they are not needed at runtime.

The :d meta-data is available at runtime to specify that the number should be presented as an integer, never in scientific notation, no matter how many billions of visitors foo.com receives.

Message replacement and substitution re-ordering

Once translators have delivered their translations, there are a number of ways to incorporate those.

If the locale is known statically, then msg`...` elements can be fully rewritten

// Before
alert(msg`Hello, ${world}!`);
 
// After
alert(msg`Bonjour ${world}!`);

If the locale is not known statically, then a source code rewriter can partially rewrite the message to a lookup into a side-table by message id.

// Before
alert(msg`Hello, ${world}!`);
 
// After
var messageBundle_fr = {
  MSG_1234: ['Bonjour ', 0 /* An index into substitutions */, '!']
};
 
alert(getMessage('MSG_1234', [world]));

The most natural order in which elements of a thought are expressed may differ between languages. msg`Welcome to ${siteName}, you are visitor number ${visitorNumber}:d!` might be translated into pig-latin as msg`Elcome-way isitor-vay umber-nay ${visitorNumber}:d oo-tay ${siteName}!`.

The indices in the message bundle side-table above identify which substitution fills each hole. For the pig-latin message above, the side-table would look like

var messageBundle_piglatin = {
  MSG_5678: ['Elcome-way isitor-vay umber-nay ', 1, ' oo-tay ', 0, '!']
};
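The lookup that consumes side-tables of this shape can be sketched as follows. The locale parameter and the messageBundles container are assumptions added so that the example is self-contained; the text above calls getMessage with only a message id and the substitutions.

```javascript
// Side-tables in the shape shown above: strings are literal text,
// numbers are indices into the substitutions array.
const messageBundles = {
  fr: { MSG_1234: ['Bonjour ', 0, '!'] },
  piglatin: {
    MSG_5678: ['Elcome-way isitor-vay umber-nay ', 1, ' oo-tay ', 0, '!']
  }
};

// Hypothetical lookup helper: fills each numeric hole with the
// substitution at that index and joins the result.
function getMessage(messageId, substitutions, locale) {
  const parts = messageBundles[locale][messageId];
  return parts.map(function (part) {
    return typeof part === 'number' ? String(substitutions[part]) : part;
  }).join('');
}

console.log(getMessage('MSG_1234', ['le monde'], 'fr'));
console.log(getMessage('MSG_5678', ['foo.ru', 1000000], 'piglatin'));
```

Note that the substitutions are always passed in source order; only the side-table decides where each one lands.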

Small projects that are not willing to introduce a source-code rewriting step just to get translation can do purely dynamic message replacement.

// Before
alert(msg`Hello, ${world}!`);
 
// After
var messageBundle_fr = {  // Maps message text and disambiguation meta-data to replacement.
  'Hello, {0}!': 'Bonjour {0}!'
};
 
alert(msg`Hello, ${world}!`);

where msg checks the side-table:

function msg(parts) {
  var key = ...;  // 'Hello, {0}!' given ['Hello, ', world, '!']
 
  var translation = myMessageBundle[key];
 
  return (translation || key).replace(/\{(\d+)\}/g, function (_, index) {
      // not shown: proper formatting of substitutions
      return parts[(index << 1) | 1];
    });
}
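A runnable variant of the function above is sketched here, assuming (as the comment on key suggests) that parts interleaves literal text at even indices with substitution values at odd indices. The bundle contents are illustrative, and formatting of substitutions is still omitted.

```javascript
// Illustrative bundle: maps the English message text to its replacement.
const myMessageBundle = { 'Hello, {0}!': 'Bonjour {0}!' };

// parts interleaves literal text (even indices) with substitution
// values (odd indices), e.g. ['Hello, ', world, '!'].
function msg(parts) {
  // Build the lookup key by replacing the n-th substitution with {n}.
  let key = '';
  for (let i = 0; i < parts.length; ++i) {
    key += (i & 1) ? '{' + (i >> 1) + '}' : parts[i];
  }
  const translation = myMessageBundle[key];
  // Fall back to the original text when no translation is available.
  return (translation || key).replace(/\{(\d+)\}/g, function (_, index) {
    // Placeholder n is filled from the odd slot at position 2n + 1.
    return String(parts[(index << 1) | 1]);
  });
}

console.log(msg(['Hello, ', 'Monde', '!']));
console.log(msg(['Bye, ', 'Bob', '.']));
```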

Specifying a locale

EcmaScript applications running in the browser typically deal with only one user, hence operate in only one locale. But EcmaScript on the server does not, and there are exceptions in browser-based EcmaScript apps.

It is possible to specify a locale for a scope so that a particular message bundle is used for message replacement, and so that locale is used for formatting numbers and dates.

let lmsg = msg.withLocale(messageRecipientLocale);
sendRecommendation(lmsg`Your friend ${friendName} thinks you would like to read "${articleTitle}".`);

Security

Generating human-readable messages often requires combining data from other users into strings of HTML. As such, it is a prime vector for XSS attacks.

It is possible to compose the L10N use case described in this section with the secure content generation scheme so there is no need to choose between localizability and security.

function msg(callSiteId /* ...substitution values */) {
  var metaData = extractMetaDataFromLiteralParts(callSiteId.raw);
  replaceLiteralPartsWithLocaleSpecificLiteralParts(metaData);
  reorderAndFormatSubstitutions(arguments, metaData);
  return interleaveAndJoin(metaData.literalPortions, arguments);
}

function safehtml(callSiteId /* ...substitution values */) {
  var sanitizers = chooseEscapingFunctionsBasedOnLiteralParts(callSiteId.raw);
  applySanitizersToSubstitutions(sanitizers, arguments);
  return interleaveAndJoin(callSiteId.raw, arguments);
}

// The composition
function safehtml_msg(callSiteId /* ...substitution values */) {
  var metaData = extractMetaDataFromLiteralParts(callSiteId.raw);
  replaceLiteralPartsWithLocaleSpecificLiteralParts(metaData);
  reorderAndFormatSubstitutions(arguments, metaData);
  var sanitizers = chooseEscapingFunctionsBasedOnLiteralParts(metaData.literalPortions);
  applySanitizersToSubstitutions(sanitizers, arguments);
  return interleaveAndJoin(metaData.literalPortions, arguments);
}

Query Languages

$`a.${className}[href=~'//${domain}/']`

might specify a DOM query for all <a> elements with the given class name and that link to URLs with the given domain.

The className and domain do not need to be encoded then decoded by a query-engine so mis-encodings can be eliminated as a class of bugs and source of inefficiency.

Message Sends

Message sends can be specified using a syntax that looks like an HTTP request.

GET`http://example.org/service?a=${a}&b=${b}
    Content-Type: application/json
    X-Credentials: ${credentials}

    { "foo": ${foo}, "bar": ${bar} }`(myOnReadyStateChangeHandler);

might configure an XMLHttpRequest object to the specified (securely composed) URL with the given (securely composed) headers, and after the end of the headers could switch to context-sensitive composition based on the content-type header : JSON in this case, or an XML message in another case.

Flexible Literal Syntax

Often, developers use the new RegExp(...) constructor because they want a tiny part of their regular expression to be dynamic, and fail to properly escape character classes such as “\s”, and regular expression special characters such as “.”.

A quasi syntax for regular expression construction

re`\d+(${localeSpecificDecimalPoint}\d+)?`

gets the benefit of the literal syntax with dynamism where needed.

Raw Strings

Python-style raw strings are trivial to implement since quasi handler functions receive the raw text of literal portions.

raw`In JavaScript '\n' is a line-feed.`
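A handler of this kind might look like the sketch below, again written against a hand-desugared call site. The variable rawSite is a stand-in for the hoisted declaration, and the handler body is an assumption about how raw would be implemented rather than a definition from this proposal.

```javascript
// Hypothetical raw handler: interleaves the raw literal portions with
// the substitution values, so escape sequences are never decoded.
function raw(callSiteId /* , ...substitutions */) {
  const rawParts = callSiteId.raw;
  let out = rawParts[0];
  for (let i = 1; i < rawParts.length; ++i) {
    out += String(arguments[i]) + rawParts[i];
  }
  return out;
}

// Hand-desugared call site for  raw`In JavaScript '\n' is a line-feed.`
const rawSite = Object.freeze({
  raw: Object.freeze(["In JavaScript '\\n' is a line-feed."]),
  cooked: Object.freeze(["In JavaScript '\n' is a line-feed."])
});
const rawResult = raw(rawSite);
console.log(rawResult);
```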

Decomposition Patterns

If the alternative quasis-substitutions-slot desugaring is used (see SubstitutionBody) then a pattern decomposition handler re_match invoked thus

if (re_match`foo (${=x}\d+) bar`(myString)) {
  ...
}

could use assignable substitutions to achieve the same effect as

{
  let match = myString.match(/foo (\d+) bar/);
  if (match) {
    x = match[1];
    ...
  }
}

Logging

warn`Bad result $result from $source`

can provide console.log("o=%s", o) style logging of structured data without the need for positional parameters.

Syntax (normative)

Literal Portion Syntax

This defines the top quasi literal production and explains how the boundaries between literal portions and substitutions are determined.

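QuasiLiteral ::

  • QuasiTag ` LiteralPortion QuasiLiteralTail

LiteralPortion ::

  • LiteralCharacter LiteralPortion
  • ε

LiteralCharacter ::

  • SourceCharacter but not back quote ` or LineTerminator or back slash \ or dollar-sign $
  • LineTerminatorSequence
  • LineContinuation
  • \ EscapeSequence
  • $ [lookahead ∉ {, IdentifierStart]

QuasiLiteralTail ::

  • Substitution LiteralPortion QuasiLiteralTail
  • `

Substitution ::

  • $ Identifier
  • ${ SubstitutionModifier SubstitutionBody }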

Literal Portion Array

The LPA operator defines an array of strings derived from the raw text of the literal portions of the quasi.

E.g. the LPA for the quasi q`foo${bar}baz` is ['foo', 'baz'].

Production | Result
QuasiLiteral :: QuasiTag ` LiteralPortion QuasiLiteralTail | array-concat(single-element-array(LPA(LiteralPortion)), LPA(QuasiLiteralTail))
QuasiLiteralTail :: Substitution LiteralPortion QuasiLiteralTail | array-concat(single-element-array(LPA(LiteralPortion)), LPA(QuasiLiteralTail))
QuasiLiteralTail :: ` | an empty array
LiteralPortion :: LiteralCharacter LiteralPortion | string-concat(LPA(LiteralCharacter), LPA(LiteralPortion))
LiteralPortion :: ε | the empty string
LiteralCharacter :: SourceCharacter | single character string containing that character
LiteralCharacter :: LineTerminatorSequence | the literal text
LiteralCharacter :: LineContinuation | str-concat("\\", LPE(LineTerminatorSequence))
LiteralCharacter :: \ EscapeSequence | str-concat("\\", EscapeSequence)
LiteralCharacter :: $ | "$"

QuasiTag

Before the open backquote (`) there is an optional expression that specifies a function that receives the literal portions and substitutions.

quasis-quasitag-memberexpr is another way to define the QuasiTag production that allows for arbitrary member expressions. If worthwhile, it should be adopted instead of this section.

QuasiTag ::

  • Identifier
  • ε

QT

Production | Result
QuasiTag :: Identifier | an expression of the form PrimaryExpression : Identifier with the given Identifier
QuasiTag :: ε | the Default Quasi Tag function below

Default Quasi Tag

The default quasi tag is a frozen function defined as

  // callSiteId : the call site object; only its cooked literal portions are used.
  // substitutions : the substitution values, passed as the remaining arguments.
  function (callSiteId /* , ...substitutions */) {
    var cookedStrs = callSiteId.cooked;
    var out = [];
    var i = 0, k = -1, n = cookedStrs.length - 1;
    // Interleave each literal portion with the substitution that follows it.
    while (i < n) {
      out[++k] = cookedStrs[i];
      out[++k] = arguments[++i];
    }
    out[++k] = cookedStrs[n];
    // As per the original Array.prototype.slice and Array.prototype.join.
    return out.join("");
  }

Substitution Body Syntax

Between literal portions there are substitutions of the form ${...} or $ident. The substitution body specifies an expression, e.g. the substitution bodies in quasitag`literal0 ${x.y} literal1 $bar literal2` are (x.y), and (bar).

E.g. the SVE of quasitag`literalPortion0 $x literalPortion1 ${y.z} literalPortion2` is [x, (y.z)].

Below are other ways of defining the SubstitutionBody production and the SVE spec function. If preferred, they should be used instead of this section.

  • quasis-substitutions-simple-members - more complex expressions, but easily lexically boundable.
  • quasis-substitutions-primaryexpr - arbitrary expressions are allowed.
  • quasis-substitutions-thunk - arbitrary expressions are allowed, and the expressions are thunkified so that a quasi handler (QT) may evaluate them zero or multiple time to support branching or looping.
  • quasis-substitutions-slot - arbitrary expressions are allowed, and expressions that are preceded by the modifier = may be used as left hand sides, assigned to by the quasi handler. This enables use cases like the destructuring regular expression match.

IdentifierPathTail ::

  • . IdentifierName IdentifierPathTail
  • ε

SubstitutionBody ::

  • Identifier IdentifierPathTail

SubstitutionModifier ::

  • ε

SVE

Production | Result
QuasiLiteral :: QuasiTag ` LiteralPortion QuasiLiteralTail | SVE(QuasiLiteralTail)
QuasiLiteralTail :: Substitution LiteralPortion QuasiLiteralTail | array-concat(single-element-array(SVE(Substitution)), SVE(QuasiLiteralTail))
QuasiLiteralTail :: ` | an empty array
Substitution :: $ Identifier | PrimaryExpression : Identifier
Substitution :: ${ SubstitutionModifier Identifier } | PrimaryExpression
SubstitutionBody :: Identifier IdentifierPathTail | MemberExpression : str-concat(SV(Identifier), SV(IdentifierPathTail))
IdentifierPathTail :: . IdentifierName IdentifierPathTail | str-concat(".", SV(IdentifierName), SV(IdentifierPathTail))
IdentifierPathTail :: ε | the empty string, ""

The SVE of a substitution is an expression that is evaluated in the scope in which the quasiliteral appears. The SVE of the quasi literal is the array of the SVE for each substitution.

E.g. the SVE of quasitag`literalPortion0 $x literalPortion1 $y.z literalPortion2` is [x, y.z].

Tokenizing

The grammar defined above is lexically simple. There are a number of alternate schemes for specifying the SubstitutionBody production. Allowing it to be an arbitrary expression raises a few wrinkles – quasiliterals can nest, and expressions containing curly brackets can nest inside ${...} sections, so finding out where a quasiliteral substitution body ends requires changes to the way EcmaScript is tokenized.

The SubstitutionBody production is most simply defined in terms of PrimaryExpression, which is not a lexical production. The QuasiLiteral grammar defined below can be described in terms of lexical productions without more trickery than that needed to tokenize lexical productions.

When lexing, a stack of bits is needed to determine whether a curly bracket ends a substitution, and a single bit is needed to tell whether the parser is currently inside a LiteralPortion.

See the interactive demo of the tokenization scheme described below for more detailed discussion.

Curly Bracket Stack | In Literal Portion | Character Seen | Effect
any | true | ${ | emit the LP, inLP := false, push true
any | true | ` | emit the LP, inLP := false
any | true | \ | consume next character, append both to LP
any | true | other | append to LP
any | false | ` | inLP := true
any | false | { | push false, emit {
top is false | false | } | pop, emit }
top is true | false | } | pop, inLP := true
empty | false | } | no effect
any | false | neither { or } | tokenize as if quasis not in grammar

This requires a lookahead of 1 because ${ has to be treated as a unit for purposes of determining whether a $ starts a substitution.
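The state table can be sketched as a small scanner. This is an illustration under simplifying assumptions, not the demo's implementation: ordinary code is passed through as untokenized 'code' chunks, a } seen with an empty stack is emitted unchanged, and $ident substitutions, string literals, and comments are not handled. The function name scanQuasis is hypothetical.

```javascript
// Separates literal portions (LP) from everything else, following the
// table above: a stack of bits for curly brackets plus one inLP bit.
function scanQuasis(source) {
  const tokens = [];
  const stack = [];   // true: bracket opened by ${, false: plain {
  let inLP = false;
  let lp = '';        // literal portion being accumulated
  let code = '';      // run of ordinary source characters

  function flushCode() {
    if (code) { tokens.push({ type: 'code', text: code }); code = ''; }
  }
  function emitLP() {
    tokens.push({ type: 'LP', text: lp });
    lp = '';
    inLP = false;
  }

  for (let i = 0; i < source.length; ++i) {
    const ch = source[i];
    if (inLP) {
      if (ch === '$' && source[i + 1] === '{') {  // one character of lookahead
        emitLP(); stack.push(true); ++i;
      } else if (ch === '`') {
        emitLP();
      } else if (ch === '\\' && i + 1 < source.length) {
        lp += ch + source[++i];                   // keep the escape raw
      } else {
        lp += ch;
      }
    } else if (ch === '`') {
      flushCode(); inLP = true;
    } else if (ch === '{') {
      flushCode(); stack.push(false); tokens.push({ type: '{', text: ch });
    } else if (ch === '}') {
      flushCode();
      if (stack.length && stack.pop()) {
        inLP = true;                              // closes a ${...}
      } else {
        tokens.push({ type: '}', text: ch });     // ordinary bracket
      }
    } else {
      code += ch;
    }
  }
  flushCode();
  return tokens;
}

// A quasi nested inside an object literal inside a substitution.
const scanned = scanQuasis('x`a${ {b:`c${d}e`} }f`');
console.log(scanned.map(function (t) { return t.type + ':' + t.text; }));
```

The nested example exercises both kinds of stack entry: the inner object literal pushes false, so its closing } is emitted as a bracket, while each ${ pushes true, so its closing } resumes the enclosing literal portion.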

Semantics (normative)

Given the QT, LPA, and SVE defined above, this specifies the desugaring of the QuasiLiteral production.

This version passes all the parts to the function specified by the quasi tag in one argument list instead of passing the literal portions first.

CallSiteId

The desugaring below uses a hoisted declaration instead of interleaving the LPA with the SVE. This is done for a variety of reasons.

  1. Many quasi handlers want to deal with literal portions as if they were chunks of a StringLiteral where each EscapeSequence is replaced with its corresponding character value. This is typical for quasi handlers that produce a string like result.
  2. Some use-cases benefit from having access to the raw text of the literal portion. E.g. \b has a different meaning in a regular expression than in a StringLiteral, and knowing whether a meta-character is escaped can be beneficial.
  3. Many quasi handler functions benefit from being able to memoize their results. Having an object that identifies a call site and which can serve as a key into a WeakMap allows easy memoization.

Desugaring

A QuasiLiteral in an EcmaScript parse tree is desugared to two subtrees. It is replaced in-situ with a function call that produces the value of the quasiliteral, and a declaration is hoisted into the top module scope that allows a quasi-tag easy access to the raw literal portions and allows a convenient handle by which a quasi handler can memoize state derived from the literal portions (extracted meta-data and the like). This desugaring is specified in terms of a hoisted declaration for the convenience of tool authors who want to back-port this desugaring, but in interpreters, there is no need to create a record in a lexical environment or assign an actual name – the interpreter need only ensure that when a particular quasiliteral is evaluated, the same call site object is passed to the quasi handler.

The declaration has a name, CallSiteID, that is an unguessable Identifier. It has the form (ConstDeclaration, CallSiteID, (CallExpression, (MemberExpression, (Identifier, “Object”), “.”, (Identifier, “freeze”)), (ParameterList, (ObjectLiteral, (ObjectProperty, (StringLiteral, “raw”), (CallExpression, (MemberExpression, (Identifier, “Object”), “.”, (Identifier, “freeze”)), (ParameterList, (ArrayLiteral, ...LPA)))), (ObjectProperty, (StringLiteral, “cooked”), (CallExpression, (MemberExpression, (Identifier, “Object”), “.”, (Identifier, “freeze”)), (ParameterList, (ArrayLiteral, ...cookedLPA)))))))).

The cookedLPA used in the declaration is computed by mapping the StringLiteral SV function over the LPA. This has the effect of decoding each EscapeSequence.

In-situ, the QuasiLiteral is replaced with a function call: (CallExpression, QT, (ParameterList, ...array-concat([CallSiteID], SVE))).

Since for quasiTag`literalPortion\0 $x literalPortion1` the QT is quasiTag, LPA is ["literalPortion\\0 ", " literalPortion1"] and SVE is [x], if the assigned CallSiteId is unguessableCallSiteId1234 then it desugars to

// Declaration hoisted to top of module.
// Escape sequences in raw are left as written; those in cooked are decoded according to CV(EscapeSequence).
// The calls to Object.freeze use the original definition of that method.
const unguessableCallSiteId1234 = Object.freeze({
  raw: Object.freeze(["literalPortion\\0 ", " literalPortion1"]),
  cooked: Object.freeze(["literalPortion\u0000 ", " literalPortion1"])
});
 
...
 
  // In-situ
  // Escape sequences in the arguments are decoded.
  // unguessableCallSiteId1234 is ideal as a key into a weak map used to memoize
  // extraction of meta-data, custom escaping conventions, and other state
  // that can be derived from the literal portions, so does not vary from
  // call to call from the same call-site.
  quasiTag(unguessableCallSiteId1234, x)
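The memoization mentioned in the comments above can be sketched as follows. The handler countingTag, the cache name, and the extractionCount counter are hypothetical and exist only to show that per-call-site state is derived once.

```javascript
// Cache of per-call-site state, keyed on the frozen call-site object.
const metaDataCache = new WeakMap();
let extractionCount = 0;  // instrumentation for this example only

// Hypothetical handler that derives state from the literal portions
// once per call site and reuses it on later evaluations.
function countingTag(callSiteId /* , ...substitutions */) {
  let metaData = metaDataCache.get(callSiteId);
  if (!metaData) {
    ++extractionCount;
    metaData = { holeCount: callSiteId.raw.length - 1 };
    metaDataCache.set(callSiteId, metaData);
  }
  return metaData.holeCount;
}

// Stand-in for the hoisted declaration of one call site.
const unguessableCallSiteId1234 = Object.freeze({
  raw: Object.freeze(['literalPortion\\0 ', ' literalPortion1']),
  cooked: Object.freeze(['literalPortion\u0000 ', ' literalPortion1'])
});

// Evaluating the same call site repeatedly reuses the cached state.
for (let x = 0; x < 3; ++x) {
  countingTag(unguessableCallSiteId1234, x);
}
console.log(extractionCount);
```

A WeakMap suits this because the cached entry can be collected once the call-site object itself becomes unreachable.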

Security Considerations

This strawman should also fall in the language subset defined by SES (Secure EcmaScript). As such, neither its presence in the language nor its use in a program should make it substantially more difficult to reason about the security properties of that program.

Developers expect that object references only escape a scope by being explicitly passed or assigned. This strawman needs to preserve both the scope invariants of EcmaScript 5 functions and catch blocks, and those introduced by the modules and let proposals.

The below discusses the interaction between a quasi function defined in one scope/module and the code it produces to be executed in another scope/module. The actors include

  • library author – the author of the module / scope in which the quasi function is defined
  • quasi author – the author of the quasi-literal and any symbols defined in the module / scope containing it.

Defensive Code

A module needs to be able to defend its invariants against bugs or deliberate malice by another module. SES does not attempt to guarantee availability since trivial programs can loop infinitely, but a module must be able to guarantee that its invariants hold when control leaves it.

This proposal does not complicate defensive code reasoning because:

  • only symbols mentioned in a substitution are observable by the library author
  • only symbols marked as writable can be written by the library author

The quasi author has to be aware that the order of evaluation is unclear. For quasis to specify new control constructs, substitutions need to be evaluable out of order, repeatedly, or not at all.

Under the thunking and slot alternative SubstitutionBody desugarings, by writing a substitution, the quasi author is conveying the authority to evaluate an expression in the quasi scope any number of times from that point on. (Assuming the quasi module has the authority to cause delayed evaluation as by setTimeout). A substitution conveys the same authority as a zero argument function.

Offensive Code

The library author’s quasi function may be used by multiple mutually suspicious or intentionally isolated modules. It can ensure that bugs or malice in one module do not affect its ability to serve another module by freezing the symbols it exports and by coding defensively.

This proposal does not complicate its ability to do that, since it imposes no mutable data requirements on quasi functions.

Possible Problems

This syntax is, by design, similar to that of string interpolation in other languages. Users may assume the result of the quasi-literal is a string as occurs in languages like Perl and PHP (3), and that subsequent mutations to values substituted in do not affect the result of the interpolation. It is the responsibility of QT implementers to match these expectations or to educate users. Specifically, developer surprise might result from the below if q kept a reference to the mutable fib array which is modified by subsequent iterations of the loop.

var quasis = [];
var fib = [1, 1];  // State shared across loop bodies
for (var i = 1; i < 10; ++i) {
  fib[1] += fib[0];
  fib[0] = fib[1] - fib[0];
  quasis.push(q`Fib${i-1} and fib${i} are $fib`);
}
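The surprise can be reproduced with two hypothetical handlers written against the desugared calling convention: keepRefs stores substitution values as-is, while snapshot converts them to strings when the quasi is evaluated. Both names, and the call-site object fibSite, are assumptions made for illustration.

```javascript
// Keeps references to the substitution values it was given.
function keepRefs(callSiteId /* , ...substitutions */) {
  return {
    literals: callSiteId.cooked,
    values: [].slice.call(arguments, 1)
  };
}

// Converts each substitution value to a string at call time.
function snapshot(callSiteId /* , ...substitutions */) {
  return {
    literals: callSiteId.cooked,
    values: [].slice.call(arguments, 1).map(String)
  };
}

// Stand-in for the hoisted declaration of a quasi like  q`fib is $fib`
const fibSite = Object.freeze({
  raw: Object.freeze(['fib is ', '']),
  cooked: Object.freeze(['fib is ', ''])
});

const fib = [1, 1];
const lazy = keepRefs(fibSite, fib);
const eager = snapshot(fibSite, fib);
fib[1] = 99;  // mutate after both quasis were evaluated

console.log(String(lazy.values[0]));  // reflects the later mutation
console.log(eager.values[0]);         // fixed at evaluation time
```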

String interpolation in other languages is often a vector for quoting confusion attacks: XSS, SQL Injection, Script Injection, etc. It is the responsibility of QT implementers to properly escape substituted values, and a lazy escaping scheme (2) can provide an intelligent default. It is a goal of the proposed scheme to reduce the overall vulnerability of EcmaScript applications to quoting confusion by making it easy for developers to generate properly escaped strings in other languages.

Quasi-literals contain embedded expressions, but the set of lexical bindings accessible to the quasi handler is restricted to the union of the following, so they do not complicate static analysis:

  1. the set of identifiers mentioned by the author in the lexical environment in which the quasi-literal appears,
  2. the lexical environment of the QT in the environment in which it is defined,
  3. for QTs defined in non-strict mode, the global object as bound to this.

Reasons and Open Issues

Quoting Character

The meaning of existing programs should not change, so this proposal must extend the grammar without introducing ambiguity. It is meant to enable secure string interpolation and DSLs, so using a syntax reminiscent of strings seems reasonable, and many widely used languages have string interpolation schemes which will reduce the learning curve associated with the proposed feature.

Backquote (`) was chosen as the quoting character for string interpolations because it is unused outside string and comment bodies; and is obviously a quoting character.

It is already used in other languages that many EcmaScript authors use – Perl, PHP, and Ruby, where it allows interpolation, though with a more specific meaning than macro expansion. It is used as a macro construct in Scheme, where it is called a “quasiquote.” In Python 2.x and earlier, it is a shorthand for the repr function, so it contained an expression and applied a specific transformation to it.

As such, many syntax highlighters deal with it reasonably well, and programmers are used to seeing it as a quote character instead of as a grave accent.

Alternatives include:

  • q"""Interpolate ${this}!"""

    .

  • q"Interpolate ${this}!"

    which simply uses an existing quoting character but which constrains the default quasi handler definition.

  • q{{"Interpolate ${this}!"}}

    which simplifies nesting.

  • q(:"Interpolate ${this}!":)

    which is friendly even if not RSI friendly.

  • q@"Interpolate ${this}!"

    is similar to C# literal strings but different semantically.

Nesting

There are a number of advantages to allowing quasis to nest. Substitutions are easy to understand if they are just expressions, and quasis are just another kind of expression.

There are concrete use cases as well. We could integrate control flow into the safehtml quasi handler available at the REPL, but substitutions with nested quasis can serve just as well.

rows = [['Unicorns', 'Sunbeams', 'Puppies'], ['<3', '<3', '<3']],
safehtml`<table>${
  rows.map(function(row) {
    return safehtml`<tr>${
      row.map(function(cell) {
        return safehtml`<td>${cell}</td>`
      })
    }</tr>`
  })
}</table>`

produces something like

<table>
  <tr><td>Unicorns</td><td>Sunbeams</td><td>Puppies</td></tr>
  <tr><td>&lt;3</td><td>&lt;3</td><td>&lt;3</td></tr>
</table>

Substitutions

Since we’re choosing syntax to reduce the learning curve, we chose ${...} since it is used to allow arbitrary embedded expressions in PHP and jQuery templates. We also include the abbreviated form ($ident) to be compatible with Bash, Perl, PHP, Ruby, etc.

We decided against sprintf style formatting, since, although widely understood, it does not allow many DSL applications, and imposes an O(n) cognitive load (2).

Alternatives include:

  • Bash: $(...)
  • Ruby: #{...}

Raw Escapes in Literal Sections

The per-call-site object passed as the first argument to the quasi handler function contains both the raw text of each literal portion and the text of each literal portion after escape sequences have been converted into the characters they would specify inside a StringLiteral.

We lose no generality by providing raw escapes and there are use cases where raw escapes are useful, as in regular expression composition:

var myRegexp = re`(?i:\w+$foo\w+)`;
 
function re(literalPortions) {
  for (var i = arguments.length; --i >= 0;) {
    literalPortions[i * 2] = arguments[i];
  }
  return function (substitutions) {
    var regexBody = literalPortions.slice(0);
    for (var i = 0, n = substitutions.length; i < n; ++i) {
      var sub = substitutions[i]
      regexBody[i * 2 + 1] = sub().replace(
          /[\\(){}\[\]^$.+*?|\-]/g, '\\$&');
    }
    return new RegExp(regexBody.join(''));
  };
}
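The handler above is written against the thunked, interleaved calling convention. For comparison, the sketch below shows how a similar handler might look under the call-site-object desugaring described in the normative section, where substitution values arrive as plain arguments. It is an assumption-laden illustration, using a simpler pattern than the one above, and reSite stands in for the hoisted declaration.

```javascript
// Hypothetical re handler: raw literal portions go to the RegExp
// constructor unchanged; substitution values are escaped so that they
// match literally.
function re(callSiteId /* , ...substitutions */) {
  const raw = callSiteId.raw;
  let body = raw[0];
  for (let i = 1; i < raw.length; ++i) {
    const escaped = String(arguments[i]).replace(
        /[\\(){}\[\]^$.+*?|\-]/g, '\\$&');
    body += escaped + raw[i];
  }
  return new RegExp(body);
}

// Hand-desugared call site for  re`^\d+(${decimalPoint}\d+)?$`
const reSite = Object.freeze({
  raw: Object.freeze(['^\\d+(', '\\d+)?$']),
  cooked: Object.freeze(['^d+(', 'd+)?$'])
});
const decimal = re(reSite, '.');
console.log(decimal.source);
console.log(decimal.test('3.14'), decimal.test('3x14'));
```

Had the handler used the cooked portions, \d would already have been decoded to a plain d, which is why access to the raw text matters here.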

Line Continuation and Line Terminators

Some revision control systems rewrite newlines on checkout. These systems might change the meaning of EcmaScript programs that contain multi-line quasis.

PHP, Python, Perl and other languages are already sensitive to white-space characters at the end of lines so developers are already familiar with these problems.

Newlines and trailing spaces inside quasi-literals are significant.

References

Quasis in E

Secure String Interp

PHP String Vars

PLT Scheme Scribble

SML of New Jersey

SML/NJ has a similar Quote/Antiquote feature (whose documentation, ironically enough, has an HTML bug in a quoted code snippet, resulting in the bottom third or so of the page being in monospaced font).

Secure Code Generation

Scheme Hygienic Macros

Paradigm Regained

Safe Templates

 
harmony/quasis.txt · Last modified: 2013/07/11 23:58 by rwaldron
 