This is the discussion page for destructuring_assignment.

Generalized destructuring binding

(Moved into the proposal. — Lars T Hansen 2006/06/05 00:12)

My comment in the proposal

I commented under “For in statement” in the destructuring assignment proposal about how we need to specify the length of array patterns independent of the number of identifiers.

Brendan Eich 2006/10/19 07:50

Re: recent addition of parentheses

Did I miss something? There is no syntactic ambiguity with [a,b] = foo() that requires parens around the lhs. People who are inclined to leave semicolons off get bitten occasionally, but that seems like no reason to require them.

Lars T Hansen 2006/10/17 11:56

Maybe I’m missing something? See comments from Brendan and Lars under “Getting rid of &” .... specifically, I’m strongly in favor of eliminating & and requiring parentheses for object destructuring without let or var. It looked to me like there was consensus that this was the required syntax.

Steven Johnson 2006/10/18 07:05

Hm. I think we got rid of the & prefix in May or so. The consequence is that you need parens around destructuring object literals in statement contexts, since they will otherwise be interpreted as block statements, but parens are not needed around array destructurers in any context or object destructurers anywhere else. So the requirement for parens in that one case is (to my mind) implicit in the syntax of the language, though for all I know it may need to be expressed in the grammar somehow for the grammar to be nonambiguous.

Lars T Hansen 2006/10/19 03:37
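For readers checking this against current engines: the rule described above (parens only for object patterns in statement position) is the one ECMAScript 2015 later standardized. A minimal sketch in present-day syntax, where foo and getEmployee are hypothetical stand-ins rather than functions from the proposal:

```javascript
// Hypothetical value producers, used only for illustration.
function foo() { return [1, 2]; }
function getEmployee() { return { name: "Ada", address: "Oslo" }; }

var a, b, n, addr;

// Array patterns never need parentheses, even in statement position.
[a, b] = foo();

// An object pattern at the start of a statement would be read as a block,
// so the whole assignment expression is parenthesized.
({ name: n, address: addr } = getEmployee());

// In a binding form the keyword comes first, so no parentheses are needed.
var { name: n2, address: addr2 } = getEmployee();
```

Note that the explicit semicolons matter here: a line starting with [ or ( can be absorbed into the previous statement if its semicolon is left off, which is the hazard mentioned earlier in this thread.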

This discussion seems to be connected to Steven’s original comment about swap being counter-intuitive. First, I would say adding back & to the syntax does not obviously address certain intuitions about whether [a, b] = [b, a] swaps. Second, the “swap intuitiveness” hangs in this case on the “macro-like” wording used to describe the expansion. If we drop the “macro” word, I think we can make the meaning clear. Intuitions vary, but tuple assignments using temporaries and allowing parallelism are common in Python, Perl, etc.

Brendan Eich 2006/10/19 07:47

Older issues

Regarding the for-in statement

<lhs> could begin with var or let, and it might be a <pattern>, but it could be any other left-hand side expression, including a[b].c, ns::ln, or even * (the last is courtesy E4X).

Brendan Eich 2006/04/03 19:18

Fixed now.

Lars T Hansen 2006/04/07 03:58

Should we require { var | let | const } before the destructuring assignment expression in a for-in or for-each-in statement? ES3 does for assignment expressions in that same position.

Jeff Dyer 2006/05/05 10:51

ES3 does not require var, it allows it, in the head of a for loop. But this section of comments is old, and Lars updated the spec to include <var-keyword>? at the front of the loop head a while ago. I’m not sure const should be allowed, since it makes for an odd way to initialize a const conditionally if the loop iterates 0 or 1 times, and an error (in our revised-for-ES4 view of ReadOnly property assignment semantics) if the loop iterates more than once.

Brendan Eich 2006/05/05 18:39

Actually assignment expressions are not allowed before in in a for-in statement in ES3. Spidermonkey seems to agree:

js> for( x=10 in o ) ;
typein:1: SyntaxError: invalid for/in left-hand side:
typein:1: for( x=10 in o ) ;
typein:1: ..........^

Should we allow a bare destructuring assignment expression there or should we require some form of variable definition like ES3? I agree that const doesn’t make sense in for-in and for-each-in.

Jeff Dyer 2006/05/10 17:22

Oh, sorry – I misread what you wrote at 2006/05/05 10:51. Yes, initialized var and let with a single name (no comma-separated list of possibly initialized variable names) makes sense and is backward compatible. It makes (some) sense because it allows an initial value to be given to the loop variable, and if the loop iterates zero times, that will be the final value.

Agreed that we don’t need to support const, but it would be simpler to allow it too. I do not feel strongly either way at the moment.

Brendan Eich 2006/05/11 04:43

Thanks both. Fixed now.

I have not added const or let const. Whereas const probably makes things more complicated (one can only have zero-trip loops if expr is present and one-trip loops if it is not), let const makes a fair amount of sense if you consider that fresh bindings are created for each iteration. (Based on earlier discussions I think we are agreed that let bindings should be fresh for each iteration, but the spec for let does not state that. I will amend the block expressions spec, please pick up the discussion there if it is controversial.)

Lars T Hansen 2006/05/12 03:58
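The point about fresh bindings per iteration can be observed directly in present-day ECMAScript, which ended up with exactly that rule for let and const in loop heads. A small sketch, with obj and the collector arrays being names made up for the illustration:

```javascript
var obj = { x: 1, y: 2, z: 3 };

// With `let`, each iteration gets its own binding, so each closure
// remembers the key that was current when it was created.
var letGetters = [];
for (let key in obj) {
  letGetters.push(function () { return key; });
}

// With `var`, all closures share one binding and see its final value.
var varGetters = [];
for (var k in obj) {
  varGetters.push(function () { return k; });
}

// Because the binding is fresh each time, `const` is also accepted here.
var constKeys = [];
for (const key in obj) {
  constKeys.push(key);
}

var letKeys = letGetters.map(function (g) { return g(); });
var varKeys = varGetters.map(function (g) { return g(); });
```

Here letKeys holds the three distinct keys while varKeys holds the last key three times, which is the difference the "let const" argument relies on.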

Regarding the for-each statement

Here and in the “For-in statements” section just above it, the nonterminal (if that’s what the name in <> is) must include &, so be a <pattern>, not just an <lhs>. But we also want to allow optional var or let before the &. Sorry, just picking nits so this gets specified fully.

Brendan Eich 2006/04/03 19:15

Fixed now.

Lars T Hansen 2006/04/07 03:58

Contrast to normal assignment

Do we care that

  [a[i], b[j]] = [i++, j++]

increments i and j before computing the lvalues? ECMA-262’s assignment spec takes pains to bind the lvalue before evaluating the right-hand side of the assignment operator.

The semantics proposed are nice and simple. Binding lvalues before evaluating the right part would require a separate walk over the left-hand side. Yuck.

Brendan Eich 2006/05/04 21:23
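The difference in evaluation order is easy to observe. The sketch below uses the semantics that ECMAScript 2015 eventually standardized, which match the "right-hand side first" behaviour described in the question; the array and counter names are only for illustration:

```javascript
var a = [], b = [];
var i = 0, j = 0;

// Destructuring assignment: the right-hand side is evaluated first,
// so i and j are already 1 when the targets a[i] and b[j] are resolved.
[a[i], b[j]] = [i++, j++];
// The values 0 and 0 land in a[1] and b[1]; index 0 stays empty.

// Ordinary assignment: the target c[k] is resolved before k++ runs.
var c = [];
var k = 0;
c[k] = k++;
// The value 0 lands in c[0].
```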

This is only an issue for pure assignments, of course, not for any of the binding+destructuring constructs.

I guess my inclination is to say that destructuring is really just a macro expansion (it is not a primitive form), and that it is therefore reasonable to keep the semantics as they are. I don’t know if that argument will fly, since the built-in syntax makes it look primitive, but it sure makes it easier to explain what it is that it does.

Lars T Hansen 2006/05/05 01:17

I’m inclined the same way: we should not require lvalues to be bound before rvalues are computed, then assigned after rvalues have been evaluated. However, could the spec be written to require only one left-to-right traversal of each level of a destructuring initialiser, so that all of the lvalues need not be assigned before any nested destructuring initialisers? Or was that two-pass traversal intended for some reason?

If only to match the left-to-right, depth-first traversal done when evaluating an object initialiser rvalue, I think the lvalue case should go left-to-right, depth-first.

Brendan Eich 2006/05/08 23:37

The two passes were accidental, in fact I’ve not implemented it like that, but as a single pass. I agree that the spec seems buggy in that regard. I’ll try to fix it.

Lars T Hansen 2006/05/09 12:45

In fact, the swap example in the spec would never work right if it is a simple “macro expansion”:

   [a, b] = [b, a]

would presumably expand into something like

    b = a;
    a = b;

which is definitely counter-intuitive... we should probably explicitly specify and require the behavior here.

Steven Johnson 2006/10/03 17:54
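To make the concern concrete, here is a sketch contrasting the naive expansion with an expansion through a temporary, next to the destructuring form as later standardized. The numbered variable names and tmp are invented for the comparison:

```javascript
// Naive "macro expansion": the second statement reads the already
// overwritten value, so both variables end up equal.
var a1 = 1, b1 = 2;
b1 = a1;
a1 = b1;

// Expansion with a temporary holding the evaluated right-hand side.
var a2 = 1, b2 = 2;
var tmp = [b2, a2];
a2 = tmp[0];
b2 = tmp[1];

// Destructuring assignment behaves like the version with the temporary.
var a3 = 1, b3 = 2;
[a3, b3] = [b3, a3];
```

Only the first variant fails to swap; the right-hand side array acts as the temporary in the other two.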

Comparisons with "group assignment"

The proposal has a few advantages over the group assignment proposal:

  • No new data type, we use arrays and objects we already have
  • No need for operations on that new data type, eg “capture this tuple as an array”

There are some disadvantages, too:

  • The implementation must be a little smarter to handle multiple return values, and the programmer must know when she can expect the implementation to be smart
  • Overloading of array onto multiple return values weakens the type discipline of the language somewhat and is less suitable for programming in the large

Syntactic issues

There are several drawbacks to mirroring the syntax of destructuring patterns and object/array literals.

The grammar becomes ambiguous (object literals and block statements), so the prefix & or another prefix is necessary to disambiguate when the parser is looking for a statement. The alternative is a backtracking or multi-way parser, as no fixed amount of lookahead solves the problem, and any lookahead will need to do some parsing.

For simple destructurings the mirroring in syntax between the destructuring pattern and object literals is probably of minor importance, and in fact makes the syntax seem forced. Better might be this:

   var a, b;
   &{ a = fisk, b = fnys } = expr

which assigns expr.fisk to a and expr.fnys to b.

For more complex destructurings the mirroring pays off better, and for array patterns the proposed syntax feels entirely natural.

Another problem with the mirrored syntax is that it provides no way of capturing a reference to a substructure while at the same time capturing the values of that substructure. Consider destructuring an AST:

   &{ op: a, lhs: b, rhs: c } = getASTNode()
   &{ op: lhsop } = b

One could have wished for a way of merging these, like this:

   &{ op: a, lhs: b { op: lhsop }, rhs: c } = getASTNode()

This can be made to work, I think, if it seems worth the bother, though at this point the syntax does not mirror the syntax of object constructors.
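No dedicated merged syntax was ever added, but in present-day ECMAScript the same effect can be had by naming the property twice in one pattern. A minimal sketch, with the shape of the AST node assumed purely for illustration:

```javascript
// Hypothetical AST node, standing in for the result of getASTNode().
function getASTNode() {
  return { op: "+", lhs: { op: "*", lhs: 2, rhs: 3 }, rhs: 4 };
}

// `lhs` is matched twice: once to capture the subtree itself in b,
// and once to reach inside it for its operator.
var { op: a, lhs: b, lhs: { op: lhsop }, rhs: c } = getASTNode();
```

This keeps the pattern mirroring object initialiser syntax, at the cost of repeating the property name.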


I would not bother, but if we decide to allow matching parent and children, we could even allow that at the outermost level. This suggests restoring symmetry with object initialisers:

   var construct = {a: "foo", b: {c: "bar", d: "baz"}}
   var destruct, da, db, dc, dd
   destruct = &{a: da, b: db = {c: dc, d: dd}} = construct

This nests an assignment expression, db = {c: dc, d: dd}, where it is perfectly legal in the corresponding position in the initialiser. For the destructuring form, we would want to extend the field production:

   field   ::= ident ":" (lhs | lvalue | ident "=" lhs)

This example also shows that destructuring assignment, as with other kinds, evaluates to the value of the right-hand side.

Brendan Eich 2006/04/03 18:46
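The last observation, that a destructuring assignment evaluates to its right-hand side, holds in ECMAScript as later standardized and can be sketched as follows. The nested "db = {...}" capture is left out, because present-day ECMAScript reads "=" inside a pattern as a default value rather than as a capture:

```javascript
var construct = { a: "foo", b: { c: "bar", d: "baz" } };
var destruct, da, dc, dd;

// The pattern is not at the start of the statement, so no parentheses
// are needed. The inner assignment yields its right-hand side, so
// `destruct` ends up referring to the same object as `construct`.
destruct = { a: da, b: { c: dc, d: dd } } = construct;
```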

Getting rid of "&"

On reflection we can get rid of “&”, though the cure may be as bad as the disease. The “&” prefix is necessary only because object literals cannot occur in a statement position, so cannot be used in a simple assignment statement. In every other context, it is redundant. When all we had was destructuring assignments this was a big deal, but I’m guessing that destructuring bindings will be popular, so pure object destructuring assignments may not occur that often. That being the case, we can use parentheses to disambiguate syntactically: what used to be

  &{ name: n, address: a } = getEmployee();

would be written

  ({ name: n, address: a } = getEmployee());

(I would have liked to use void but it has the wrong precedence.)

There are a couple reasons to want to get rid of “&”. First, we don’t want ECMAScript to become Perl. Second, I was explaining destructuring to a colleague and he thought the “&” introduced some strange reference type that was passed to the value producer and filled in with values. People coming from C/C++ may tend to think such things. It is confusing.

Lars T Hansen 2006/04/23 06:40

Agreed about &. The first thing that comes to my mind is that you don’t have to use a one-character operator. How about:

  match({ name: n, address: a }) = getEmployee();

or

  match { name: n, address: a } = getEmployee();

Maybe too heavy-weight when all you want is multiple-value return.

Note that you can also just parenthesize the lhs, no?

  ({ name: n, address: a }) = getEmployee();

Another not-so-pretty possibility: since the only culprit is {, special-case the syntax to use [..] instead of {..} for object matching:

  [ name: n, address: a ] = getEmployee();

I know, inconsistent. Just brain-storming. Yay syntax.

Dave Herman 2006/04/24 10:02

match does not work, this is not pattern matching. Something like please do would work though. :-P

I like the observation about using parens around the lhs alone; ECMAScript 3 explicitly allows lvalues to be parenthesized.

Using [..] instead of {..} really defeats the purpose of having the same syntax for structuring, destructuring, and typing, so I don’t think that’s going anywhere.

Anyhow, it looks like we can remove & and switch to parentheses. But is it an improvement? Is it not nice that & is a syntactic marker for the destructurer?

Lars T Hansen 2006/04/26 04:19
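A note on how the parenthesization options turned out: in ECMAScript as later standardized, wrapping the whole assignment works, while wrapping only the left-hand side is rejected, because a parenthesized object literal is no longer treated as a pattern. The sketch below checks this by asking the engine to parse each form; the helper name parses is made up for this example:

```javascript
// Returns true if `source` is accepted as a function body. Engines report
// an invalid assignment target as an early error when the function is built.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError || e instanceof ReferenceError) return false;
    throw e;
  }
}

var results = {
  // Whole assignment parenthesized.
  wholeExpression: parses("var n, a; ({ name: n, address: a } = {});"),
  // Only the left-hand side parenthesized.
  lhsOnly: parses("var n, a; ({ name: n, address: a }) = {};"),
  // No parentheses: the leading { starts a block statement.
  bare: parses("var n, a; { name: n, address: a } = {};")
};
```

Only wholeExpression comes out true in a conforming modern engine.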

I’m not sure & is a nice marker, period ;-).

If we believe most destructuring binds will start with let (or, shudder, var), we could dispense with & in these cases. Then it would be required only for destructuring into already-bound variables (or, horrors, into unintentional globals). In that case, would we really want &, or would saying “use parentheses” be better? We could avoid new syntax, and reuse existing disambiguation know-how among programmers.

I’m not thrilled about either approach here, but I am leaning toward minimizing new and Perl-ish syntax, and reusing plain initialiser syntax as much as possible (even if it means parentheses in the undeclared object destructuring case – a case I hope will be rare and therefore “hard”, as in “hard cases make bad law”).

Update: I’m strongly in favor of eliminating & and requiring parentheses for object destructuring without let or var. If mirroring object and array initialiser syntax is good, avoiding different punctuation is better; if let is the new var, and destructuring should use structural type annotations, then there is even less need for &.

Brendan Eich 2006/05/02 23:34

Going once, going twice ... does anyone object to this compromise? If not, consider it a done deal; I will update the spec when I have a minute.

Lars T Hansen 2006/05/03 00:09

& removed from spec, updated my test implementation. Everything seems to be fine.

Lars T Hansen 2006/05/03 00:58

Type checking

Type checking for multiple return values is somewhat impaired as compared with the group assignment proposal: An Array’s type does not include its length, so programs that use arrays to implement multiple return values won’t be checked to see whether every function returns the number of values expected by the caller.
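The missing length check is visible at run time as well. In the sketch below (present-day syntax, with two and three as hypothetical producers), a mismatch between the number of values returned and the number expected goes unreported:

```javascript
// Hypothetical producers that disagree about how many values they return.
function two() { return [10, true]; }
function three() { return [10, true, "extra"]; }

// The caller's pattern fixes how many values it wants; nothing checks
// that the producer agrees.
var [a, b] = three();   // the third element is silently ignored
var [c, d, e] = two();  // e is undefined, and no error is raised
```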

The returned Array will in the general case be an Array<Object> type (eg if a function returns a Number and a Boolean). In that case we would expect this to work:

   function f() { return [ 10, true ] }
   var a : Number;
   var b : Boolean;
   &[ a, b ] = f()

but will it? Will explicit casts be required? Should casts be inserted by the implementation?

(Ensuing interesting discussion has been moved to type system. — Lars T Hansen 2006/04/18 04:34)

Should it be legal (or is it legal already) to recode the above as

   function f() { return [ 10, true ] }
   var [ a : Number, b : Boolean ] = f();

This seems much clearer as to intent and would be my preferred syntax (if legal) — Steven Johnson 2006/10/03 18:05

According to the proposal both function heads and catch clauses allow the standard annotation syntax here, which on variables would be

    var [a,b] : [Number,Boolean,*] = f()

I would probably suggest that we allow that.

I agree this is clumsy but an important aspect of the destructuring assignment which has since been propagated to many other proposals is that destructuring patterns and type annotations are syntactically like the structures they destructure or annotate, ie, we have avoided introducing any new syntax for these. Whether wise or not :-)

The clumsiness disappears to some extent where named types are used, consider

    var { "name": name, "age": age } : Person = getPerson(37)

Lars T Hansen 2006/10/11 05:43

Performance

Are array returns and destructuring assignment fast enough to implement multiple return values? Even though small arrays can be allocated efficiently, a single allocation may be too much in some cases where a function is called very frequently: for iterators returning a [key, value] pair and for low-level primitive methods.

Implementations can probably avoid the allocation by using a standard trick: A function that expects to receive multiple values from a call places a mark on the continuation of the call; a function that returns multiple values checks whether its continuation is marked, and if it is, values can be returned using a non-allocating protocol.

In practice, a function that does not return multiple values need do no checking, because a multiple-value-receiving continuation uses a special return address as the mark, and code at that address can convert single values to multiple values if the language so dictates.

In ECMAScript this could be implemented in the following way: A function that uses a destructuring assignment that meets some criteria (eg a flat array pattern up to a certain size) marks the continuation. A function that returns a structured literal that matches the same criteria checks the continuation for the mark, and if set, does not allocate the object. Otherwise the object is allocated and returned in the normal manner. Code that does not return a structured literal need do nothing special, nor does code that does not use a destructuring assignment.

Notes for the CLR

Currently, there is no performant way to do continuations on the CLR. The CLR does not grant programs full access to their stacks, nor does it provide instructions for installing and saving the runtime stack. Programs can work around this by managing their own stack on the heap (i.e. manage a stack-away-from-the-VM stack). However, the JIT is less efficient on code that manages its own stack. Furthermore, allocating the stack on the heap effectively disguises the stack, hiding it from programming tools such as debuggers and profilers, as well as security managers, which expect to find run-time information on the stack.

The CLR does provide means for installing exception handlers, and continuation marks can be simulated using exception handlers and exception throws on the CLR, at the expense of performance.

Pratap Lakshman 2006/11/02 07:41

Right. What I wrote above was just a response to Brendan’s concerns about the efficiency of generalized destructuring assignments as compared to a specialized syntax and semantics (multiple-value-return? I forget what it was called). I do not share Brendan’s concern, and I suspect that on the CLR you would not do any magic to implement this: a function whose return expression is an array initializer expression would actually construct and return that array, and the receiver would destructure it in a fairly naive way. I expect most implementations would be happy doing it that way.

Lars T Hansen 2006/11/04 09:25

Other issues

I don’t think there is any particular reason why the receivers of the destructuring need to be variables, I suspect they could be general lvalues without causing real trouble for either parsing or code generation.


Agreed on allowing general lvalues.
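General lvalues as receivers did make it into ECMAScript as later standardized. A short sketch without the & prefix, where point, coords and key are names invented for the example:

```javascript
var point = {};
var coords = [];
function get_3d_point() { return [1, 2, 3]; } // hypothetical producer

// Each target is an arbitrary lvalue: a property, an indexed element,
// and a computed property reference.
var key = "z";
[point.x, coords[0], point[key]] = get_3d_point();
```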

Since this proposal is sweet, sweet sugar, why not go further:

var &[x, y, z] = get_3d_point()

This wins especially when there are type annotations on the variables.

Lars pointed out how the proposal’s winning re-use of existing (albeit &-prefixed) initialiser syntax on the left-hand side of assignment ceases to be re-use if transplanted to var statements, but still: the above is both convenient and consistent. Also, the &-prefixing alters initialisers enough that one could argue “in for a penny, in for a pound.”

Jeff suggested that:

var (x, y, z) = get_3d_point()

is better-because-simpler syntax, but that looks like group assignment. We could certainly refrain from combining destructuring assignment with var declaration, but I wanted to push this proposal toward the utmost convenience, since other languages go all the way.

Brendan Eich 2006/03/17 21:18

var makes a fair amount of sense and I think I’m in favor. It was really combining it with let I was objecting to, since then this would go from being an assignment form to being a scoped binding form. Though come to think of it, if we do var we could also do let:

   let (&[x,y,z] = foo()) { ... }

since IIRC this has an obvious expansion to

   let { var &[x,y,z] = foo(); ... }

Why have special cases?

Lars T Hansen 2006/03/18 06:59

Agreed on avoiding special cases where possible. The let head can’t be translated to var at the front of the subsequent block, however, in the case where a head-bound identifier shadows an outer binding of the same name. But yes, let’s generalize the destructuring assignment syntax that can be used after var, in the let head, and of course on the left-hand side of simple assignment.

Brendan Eich 2006/03/18 08:03

Yes, the let thing will need to be a little more careful.

Speaking of removing restrictions: destructuring makes sense at least in for-each statements, where it performs destructuring of values from an array:

   for each ( var &{ name: n, salary: s } in mylist ) 
      ...

(It also makes sense more or less with normal for statements, though it’s probably not super useful.)
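The for-each statement itself did not carry over into later ECMAScript editions, but for-of (ES2015) covers the array case and accepts a pattern in its head. A sketch of the loop above in that form, with the list contents made up:

```javascript
var mylist = [
  { name: "Ann", salary: 100 },
  { name: "Bob", salary: 120 }
];

// for-of iterates over the values of the array, and the pattern in the
// loop head destructures each one.
var lines = [];
for (var { name: n, salary: s } of mylist) {
  lines.push(n + " earns " + s);
}
```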

If we adopt JS-1.5 style string indexing (which we should, though no proposal exists for this anywhere) then destructuring of strings will just work by itself:

   var &[a,b,c] = "foo"

The consequence of this is that if we allow destructuring in the setup clause of for-in statements then we get to have loops like this:

   for ( var &[a,b,c,d,e,f,g,h,i,j] in obj ) {
     // do something with the first 10 letters of each property name (any of which may be shorter)
   }
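This first interpretation is the one present-day ECMAScript ended up with: strings destructure character by character, and a pattern in a for-in head is applied to each property name. A minimal sketch with made-up property names:

```javascript
// A string on the right-hand side is taken apart character by character.
var [a, b, c] = "foo";

// In a for-in head the pattern is applied to each property name.
var obj = { alpha: 1, be: 2 };
var prefixes = [];
for (var [c0, c1, c2] in obj) {
  prefixes.push([c0, c1, c2]);
}
// "be" has only two characters, so c2 is undefined on that iteration.
```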

An interesting competing interpretation would be to return both the index and the value:

   for ( var &[idx,val] in obj )
     ...

which I guess more or less removes the need for for-each in the language, since idx can be omitted above. I’m not necessarily advocating removing it! :-)

This discussion on removing restrictions suddenly melded with the discussion about the benefits of proper scoping we had on Friday, and I’m wondering if we should not allow let to be used just like var in all for statements:

   for ( let i=0 ; i < 10 ; i++ )
     ..

The benefit of this would be to actually provide better scoping mechanisms that don’t break existing mechanisms, so that better programs can be written in the new language. I guess I could sneak this in as a proposal by adding it to the let proposal, but what are your thoughts?

Lars T Hansen 2006/03/19 07:22

I too thought of let as an alternative to var but with block scope. It is simple, direct, and grammatically unambiguous here, thanks to the ( or { that follows in the block expressions proposal. It preserves backward compatibility. I say propose it in block expressions.

Destructuring key/value pairs is nice (Pythonic, even – see PEP 234), but not narrowly compatible with destructuring assignment combined with string character indexing as you note. Other languages allow matching several elements per iteration using something similar, which is yet another competing interpretation. Since this is new syntax, we have an opportunity to choose the most useful semantics. But which would that be? I’m inclined toward the Pythonic dict iteration form at the moment.

I added operator syntax for indexing (and slicing) to bug fixes.

Brendan Eich 2006/03/20 01:05
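For comparison, the key/value interpretation is available in present-day ECMAScript through an explicit helper rather than through for-in itself. The sketch below uses the standard Object.entries (ES2017) with for-of; obj and the accumulator names are invented for the example:

```javascript
var obj = { width: 3, height: 4 };

// Object.entries yields [key, value] pairs, which an array pattern in
// the loop head takes apart.
var seen = [];
for (var [idx, val] of Object.entries(obj)) {
  seen.push(idx + "=" + val);
}

// Omitting the first element keeps only the values.
var total = 0;
for (var [, v] of Object.entries(obj)) {
  total += v;
}
```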

The XML connection

Ed suggests this can be extended to XML destructuring for E4X:

   var id_value, tag_value;
   &<myelement id={id_value}><{tag_name}>{}</{}></myelement> = getXML()

Implicit in this is that you can use {} to ignore XML content according to the grammar, and that end tags match start tags. Even if we don’t do this now we might take care to reserve the &< syntax.

This should be specified in terms of E4X selection operators, ie, there would be a straightforward expansion from the surface syntax to a series of assignment statements involving E4X operators on the right-hand-sides.

The RegExp connection

Given Pythonic named groups as proposed in extend regexps, we could support destructuring regular expression match as well:

  var name, value;
  &/^\s*(?P<name>\w+)\s*=\s*(?P<value>.*)$/ = read_line();

Or combined so variable names don’t need to be restated for no good reason:

  var &/^\s*(?P<name>\w+)\s*=\s*(?P<value>.*)$/ = read_line();
  set_pref(name, value);

Of course, this could be done with capturing parentheses and array destructuring:

  var &[, name, value] = /^\s*(\w+)\s*=\s*(.*)$/.exec(read_line());

or with callable regular expressions (call invokes exec):

  var &[, name, value] = /^\s*(\w+)\s*=\s*(.*)$/(read_line());

and this seems about as readable. Python’s named groups extend reserved (?...) syntax with P< or P= at the front of the ..., which adds to the cybercrud or “line noise” problem. Either form could be optimized to avoid capturing the entire match or creating the match array. Comments?

Brendan Eich 2006/03/20 12:29

When I first saw the example with P< I laughed so hard I almost fell off my chair. It’s an obvious example of reusing constructor syntax for destructuring.

I think that if we do start supporting P< and P= then it would make sense to support regex literals in lvalue contexts.

On the other hand there’s really no good reason that the result array should not have named properties for named subexpressions, so

  var &{ name: n, value: v } = /^\s*(?P<name>\w+)\s*=\s*(?P<value>.*)$/(read_line())

also makes sense, and is very clear. Furthermore, your previous example can be rewritten as

  var &{ 1: name, 2: value } = /^\s*(\w+)\s*=\s*(.*)$/(read_line());

One would probably define the meaning of regex literal in an lvalue context in terms of one of these two.

Lars T Hansen 2006/03/22 03:24
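The "named properties on the result" idea is close to what ECMAScript 2018 later adopted, though with (?<name>...) instead of the Pythonic (?P<name>...) and with the names collected on a groups property of the match. A sketch in present-day syntax, where read_line is a hypothetical input source and the call goes through exec, since callable regular expressions are not part of the standard:

```javascript
function read_line() { return "  color = blue"; } // hypothetical input source

// Named groups are exposed on the match result's `groups` property.
var re = /^\s*(?<name>\w+)\s*=\s*(?<value>.*)$/;
var { groups: { name: n, value: v } } = re.exec(read_line());

// Numbered captures can be picked out with an array pattern...
var [, name1, value1] = /^\s*(\w+)\s*=\s*(.*)$/.exec(read_line());

// ...or with an object pattern keyed by capture index.
var { 1: name2, 2: value2 } = /^\s*(\w+)\s*=\s*(.*)$/.exec(read_line());
```

One caveat: exec returns null when the line does not match, and destructuring null throws, so real code would test the result first.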

successor-ML and row capture

The wiki for successor-ML has a page (http://successor-ml.org/index.php?title=Functional_record_extension_and_row_capture) describing very similar syntax to this destructuring proposal, with the added functionality of capturing the fields of the right hand side that are not extracted by the destructurer:

  { name: x, address: y, ...: z } = { name: "Lars", address: "Oslo", job: "Programmer", employer: "Opera" }

would let z be the structure { job: “Programmer”, employer: “Opera” }. I don’t know if it’s worth our while to do this – ML records are immutable, I think, so not like ECMAScript. In addition, for array patterns it becomes obscure, and prototype properties may be troublesome.

Lars T Hansen 2006/05/05 20:06

One could argue by analogy to the in operator and for-in loops that DontEnum-attributed prototype properties should not be matched by the ... pattern.

However we might define this, I can see it being quite useful for cases where you want to future-proof some kind of transcoder to avoid data loss.

Brendan Eich 2006/05/08 23:09

The DontEnum escape clause makes sense to me.

Successor-ML also reuses this syntax for construction to implement nondestructive record update. Suppose you have some record r. Then

  { ... = r, a = 10, b = 20 }

is a new record containing all the fields of r and adding new fields a and b with values 10 and 20.

I don’t know how much sense this makes for ECMAScript; for us, the ... operator would generally have to copy the fields from r since we have mutable records, so the nondestructive update is sort of lost, but it’s neat syntax for when you want it.

Lars T Hansen 2006/05/09 12:38
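Both ideas later appeared in ECMAScript 2018 as rest and spread properties, written ...z rather than "...: z" or "... = r". The sketch below shows row capture, the treatment of prototype properties, and the copying record update; all object contents are made up for the example:

```javascript
var person = { name: "Lars", address: "Oslo", job: "Programmer", employer: "Opera" };

// Row capture: z collects the properties not named in the pattern.
var { name: x, address: y, ...z } = person;

// Only own enumerable properties are collected; inherited ones are not.
var child = Object.create({ inherited: true });
child.own = 1;
var { ...copied } = child;

// Record update: a new object with r's fields, overriding a and adding b.
var r = { a: 0, c: 30 };
var updated = { ...r, a: 10, b: 20 };
```

As anticipated above, the update copies the fields of r into a fresh object, so r itself is left unchanged.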

Object property shorthand

We decided against object property shorthand in destructuring such as:

{ x, y, z } = { x: 1, y: 2, z: 3 }

This would be nice and concise, but possibly so terse as to be confusing. Part of the justification of destructuring is that its syntax exactly mimics that of structuring.

Dave Herman 2006/10/20 12:24
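For the record, later ECMAScript editions (ES2015) did adopt the shorthand, and did so for both patterns and object initialisers, so the mirroring between structuring and destructuring is kept. A sketch in present-day syntax:

```javascript
var x, y, z;

// Shorthand pattern: each name is both the property read and the
// variable assigned. Parentheses are needed in statement position.
({ x, y, z } = { x: 1, y: 2, z: 3 });

// The same shorthand works when building an object.
var point = { x, y, z };
```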

 
discussion/destructuring_assignment.txt · Last modified: 2009/07/30 22:24 by brendan
 