DO NOT EXPORT

Type conversions

There are a number of places where type conversions occur, e.g.:

  • interconvertible types like int and double
  • interconvertible types with to methods
  • structural types with convertible component types
  • using something of type * in a statically typed context

In each conversion, there is a static portion and a dynamic portion: the static type checker knows what types are interconvertible and allows or disallows the conversion accordingly. The dynamic portion is the actual runtime conversion.

For simple types like int and double, the runtime conversion is straight-forward. But there are two things that make conversion more subtle: functions and mutable slots.

Take the following example. We expect to be able to use a dynamically typed object with a (static) structural object type. But the mutability of the slots is problematic:

type Box = { val: TCPSocket }
 
var boxAny : * = { val: new TCPSocket } : *
var box : Box = boxAny          // conversion from * ~> Box
boxAny.val = new RubberChicken  // no problem, boxAny is untyped
box.val                         // ERROR: no longer a TCPSocket!

At some point, it has to be an error for o.x.val to be a RubberChicken, because the type system promised it would always be a TCPSocket. It’s generally not possible to catch this kind of error statically, because of type *. But there are many options for how and when to generate this error at runtime.

Solution 1: Updateable type constraints (DEPRECATED)

When a value is used in a context that needs to impose more type constraints on it, we could simply update its internal type constraints. This is simple and efficient, but means that when passing a value to multiple places, which could be in totally unrelated parts of the program, the type constraints of one context could affect the behavior of the other. This violates the program’s abstractions and will generate very confusing errors that are hard to debug.

DEPRECATED: violates abstractions

Solution 2: Automatic dynamic wrappers

Since an object’s type constraints might be violated by a mutation at any subsequent point in the program, like the one in the Box example, it’s not enough to check the constraints only at the time of conversion. One solution is to protect a typed property whose value is untyped with extra checks at all gets/sets. This can be thought of as wrapping the value in a “view” or “fat pointer” that allows mutations to still happen on the underlying object and claims to be === to the original object, but wraps its get/set operations with extra checks.

This approach can also be used for wrapping functions: to convert a function of type (function(*):*) to a function of type (function(int):String), you wrap it in a proxy whose [[Call]] behavior inserts extra before/after dynamic checks to preserve the type’s invariants. Using the “view” approach, this wrapper can actually still allow the original function object and wrapped function object to share property tables and be considered the “same object” according to ===.

The benefits of this solution are that it’s safe, fully general, and allows for the full range of conversions. The drawback is efficiency: it means every communication between typed and untyped code incurs a space and time cost for adding a wrapper. If this were to make crossing the boundary between typed and untyped code too expensive, it might discourage users from mixing the two styles.

On the other hand, if most heavy computation doesn’t happen across these boundaries, this efficiency cost might be negligible. The only way to know is to experiment, but we should think about this. If most tight inner loops are not communicating between typed and untyped code, it may not be such an issue.

This solution also implies that all operations on objects might have to operate on views, which has global implications for efficiency. It can probably be as small as a single branch and pointer indirection, but it’s widespread. (Clever compilers can prove with flow analysis when no mixing could occur and optimize away some of these checks, but in general this is probably unlikely.)

Solution 3: Explicit dynamic wrappers

On the other hand, if the cost of wrapping is too much to incur silently, we can force the user to request the wrapper by making automatic conversions between typed/untyped code illegal, but provide an explicit cast operator that adds wrappers as above. This way users don’t have to implement the wrappers themselves, but they do have to ask for it when they want it.

Solution 4: Write barrier type-check

A simple possibility is to perform typechecks when assigning an object to a slot. When an “untyped object” (an object with type *) is passed to a typed slot (a slot with type T != *), the object type is not compatible with the slot type so an exception is raised.

In this case, there are no safety issues, but some potential pain exists when upgrading libraries to “types mode”: legacy untyped user code cannot pass structured parameters, for example.

GROUP REACTION:

  • Everyone’s concerned with the example case of “adding types to the DOM
  • Read barriers probably better (case 8, with dynamic optimizations)

Solution 5: Explicit dynamic (deep) copy (DEPRECATED)

Instead of “views” which share state with the objects they wrap, we could make a cast that copies the object. This is likely to be much more expensive than the view approach, since it probably requires a deep copy in order to handle all conversions recursively throughout a compound structural type. The semantics of copying complicated ES objects might not be obvious, either. But it avoids the difficulty of implementing views, and it avoids the global invariant that all object operations might involve an indirection through views.

DEPRECATED: too deep (often copies entire world)

Solution 6. Explicit dynamic (shallow) copy (DEPRECATED)

Instead of a deep copy, we could provide only a certain set of shallow copying cast operators (e.g., one for function types, one for structural object types, etc.). Users who want to convert a compound type with multiple conversions have to implement the recursive deep copy themselves.

This is simpler than the deep copy option, and might not be terribly inefficient in practice. As with deep copy, it avoids the efficiency issues of making all operations in the runtime operate on views.

DEPRECATED: almost as inconvenient as no copy, copy semantics precludes mutation

Solution 7: Automatic dynamic copy (DEPRECATED)

Instead of explicit copy operators, it’s also possible to implement an automatic conversion via copying. This is essentially what Lars’s functional wrapper proposal is doing for functions, and the analogous behavior for object types would be to perform a deep copy. The more I think about it, the more fraught with issues deep copy is.. But anyway, this is a possibility.

DEPRECATED: terrifying

Solution 8: Read barrier type-check

This approach avoids the time and space overheads (and implementation complexity) of wrappers (see option 2 above), but requires dynamic checks to be sprinkled (to a greater or lesser extent) throughout the program.

For the example above, the conversion from * to Box does not introduce a wrapper or view; box is just set to point to the same record as boxAny. On the access to box.val (or any access to a record), since box has type Box = {val:TCPSocket} and the underlying record is not guaranteed to hold a TCPSocket, the compiler will insert a check that box.val returns a TCPSocket, and will issue a dynamic type error otherwise. Note that this error is detected at the same time as option 2, though with different machinery.

For other code like:

type Box = { val: TCPSocket }
function f(box:Box) {
    return box.val;
}

the compiler will also insert the same dynamic check.

Essentially, option 2 introduces run-time data structures (wrappers and views) to ensure that the necessary run-time checks are performed; this option 8 statically inserts those checks everywhere they might be needed.

Of course, a sufficiently-smart compiler could prove many of these checks are redundant, and the necessary analyses are fairly tractable. One limitation is that worst-case assumptions would need to be made about data coming from outside the current module, or unit of analysis.

INITIAL GROUP REACTION:

  • Jeff thinks this may be viable
  • Cormac: dynamically less inefficient (less runtime type information)
  • Brendan: compiler optimization burden too heavy?
  • Lars: probably too expensive; won’t have a smart compiler.

Informative notes on dynamic optimization

The group discussed this point at some length during the July meeting, and settled on two possible dynamic optimizations that can make read barrier type-checks cheaper. Cheap enough, we believe, to make them feasible even in dynamic implementations with “non-smart” compilers. While not necessarily a part of the language semantics, we felt it important to record these possible techniques.

In the following, we assume that there is an object OBJ that has a type T, and n-1 slots S0:U0 .. Sn:Un outside OBJ, that refer to OBJ. Then when attempting a property read Si.x, intended to extract field x from OBJ via some referencing slot Si:Ui, we have two ways to cache type-compatibility information to make this read efficient:

  1. OBJ can carry a single bit that is set when any slot Sk:Uk exists with Uk != T.
    • When the bit is set, Si.x must perform a typecheck between Ui and T.
    • When the bit is cleared, we know that every Sk:Uk has Uk == T, so Si.x can proceed without a typecheck.
    • This bit (on OBJ) may be modified any time a new Sj is made to refer to OBJ.
  1. Every slot Si:Ui that currently refers to OBJ:T can carry a single bit that indicates whether Ui != T.
    • When the bit is set, as above, Si.x must perform a typecheck between Ui and T.
    • When the bit is cleared, as above, no type comparison is required.
    • These bits (on every Si) must be set or cleared any time an Si is given a referent: there’s only one bit and it relates to the current referent.

Essentially the first case spends a bit less space (one bit per object), but is more likely to enter the slow slot-to-object type comparison. The second case spends a bit more space (one bit per slot) but is less likely to enter the slow type comparison.

Solution 9: Write barrier with deep checking

(Added 2007-07-20 at the Yahoo! (Sunnyvale) meeting, after long discussion about use cases, see minutes.)

Variables, parameters, and functions can be annotated with structural object types. When a value is stored into a variable or parameter so annotated, or when a value is returned from a parameter so annotated, then a deep check is made to ensure that the object conforms to the type in question: all fields must be present, and must contain values that are compatible with the specified field types.

In other words, this is a write barrier with a weak notion of safety. There is no checking on reads or writes to components of the value, only on writes to the annotated variable or parameter (and of course any annotated fields or properties in it, if the actual value is a typed object structure or a nominal type instance).

It is a useful solution mainly because it (a) speaks to all the use cases except strong safety and (b) is tractable (not a research problem).

INITIAL GROUP REACTION:

  • Lars proposed it
  • Brendan thinks it’s a viable compromise
  • Chris mentioned that this is what people have asked for in the Ruby world
  • Cormac objects to the very weak notion of safety and the conflict between the syntax and the semantics; by analogy with other annotations the former implies something much stronger than what is provided by the latter
  • Jeff doesn’t know how well it will do with function types and array types, there may be semantic/performance problems

Cormac notes that a safety bit a la Solution 8 can be used here: if an object matches a structural type with all statically typed fields then the bit can be set on the reference, and code can use this for speeding up operations, eg, x.y.z to go down a path: less checking.

AFTER OFF-LIST DISCUSSION:

Trying to sum up, Lars writes:

  • future-proofing demands that we not co-opt type annotation syntax for something not providing type-like guarantees (jeff, lth)
  • type annotation syntax should provide type-like guarantees now (jeff)
  • explicit wrapping is viable (everyone), with an operator “x wrap T” or “x like T” or “x to T”
  • unwrapping may be required (brendan)
  • important use cases require that annotated APIs can accept untyped objects from code that can’t change (lth, brendan)
  • pushing annotations out of the function header and into the body weakens the language in a sense (cormac, lth)

Cormac proposes that like can be used as an annotation, Jeff proposes that it can be used as an operator, in both cases it would check and wrap. (We need to resolve what the wrapper would do if the object changes under its feet, but that’s for later.)

Leaving functions aside for the moment: I would like to suggest a variant on this. The purpose of this variant is to retain the ability to have assertions without wrapping, and to have wrapping when desired, and to use strong future-proof structural annotations.

(1) Structural type annotations can be used to say something about the permanent structure of the object:

function f( x: T ) { ... }

means that the value passed to x must have the structure indicated by T and this structure must be permanent (type-safe). Untyped code can’t pass old objects through this kind of an interface.

The operator “is” performs this kind of test too.

(2) The annotation and operator “like” is a shape test that says nothing about permanent structure. It can be used in a header to provide an assertion on entry or return, and in the body to generate a boolean”, and in “switch type”:

function f( x: like T ) : like S {
    if (x like T)
       ...
    switch type (x) {
    case (x: like S) { ... }
    }
}

The type can be a structural object type or structural array type. Untyped code can pass old objects through this interface, and conformance will be checked.

(3) The annotation and operator “wrap” is exactly “like” plus construction of a type-safe wrapper if necessary. The wrapper is known to have the correct shape, and this shape is permanent (type-safe):

function f( x: wrap T ) : wrap S {
    v = x wrap T;
    switch type (x) {
    case (x: wrap S) { ... }
    }
}

Now only wrapped objects pay for the read barrier, and the “wrap-if-you-must” semantics provide for the optimization of Solution 8 in that there will be no wrapper if it’s not necessary; obviously the wrapper is more expensive than a bit but perhaps a good compiler can handle this if the object does not flow to other functions. (If the object does flow to other functions those other functions will get a wrapper that does not lose any safety in the process even if they receive untyped arguments.)

Wrapper semantics:

IMO, the wrapper should be constructed s.t. it performs a “like” test every time it pulls a field out of its underlying structure, and throws an error if the shape is not right, IOW, a deep check. I think this is what’s wanted. But again, this is a little secondary.

Lars T Hansen 2007/07/31 10:59

Of further note is that “wrap” makes sense for function types, so an interface like this:

function f( fn: function(int,string):boolean )

accepts functions statically typed like its argument, but

function f( fn: wrap function(int,string):boolean )

would accept some notion of compatible functions too, notably ...* → *, and wrap them in a closure that performs argument checking and return value checking.

There is also the notion of using “function” as a type meaning “some function”, so that a union type can capture “other stuff”,

function f( fn: (function(int,string):boolean,function) )

Lars T Hansen 2007/07/31 11:39

Finally, wrapping might be defined along the following lines (just a sketch from some earlier mail!):

Attempt to define what it means to wrap a structural type:

The phrase

x wrap { f1: t1, t1: t2, ... }

evaluates to a structure y s.t. “y.f1” returns “x.f1 wrap t1” and “y.f1 = v” stores v in x.f1 if v conforms to t1 and otherwise fails with a run-time error. Repeated accesses to an unchanged field of a wrapped structure returns the same value, so if x.f1 does not change between two accesses to y.f1 then the second access returns the same value as the first. Wrapping a structure that has an annotation making it conform to the target type never creates a new object.

The intent is that if x has manifest type t then x wrap t returns x, and if not it returns some object that provides a t-like view on x.

I don’t mind that if it’s explicit, probably.

If x is a function and ft is a function type t1*t2*...*tn → tr then

x wrap ft

is

let (v = x)
  function (p1:t1, ..., pn:tn) : tr { return v(p1, ..., pn) }

If x has the correct manifest type, x wrap ft ⇒ x.

The phrase

x wrap [t]

is an object that acts as a proxy for the array, returning v wrap t on read and accepting values of type t on write.

Lars T Hansen 2007/07/31 11:39

Discussion

Of all of these possibilities, I prefer solutions 2 (automatic wrappers) or 3 (explicit wrappers), because they allow all the conversions we want with a hopefully small overhead. If solution 2 is too “magic,” solution 3 allows somewhat greater user control. Solution 6 (shallow copy) might be okay, although I think copying is a little subtle in ES, with all the complications like prototypes and internal properties.


Graydon reminded me that at the April meeting at Microsoft, and in the followup proposal that Lars wrote at type system, we did indeed agree to use Solution 4 (”Disallow problematic conversions”). To ease the pain of migrating ES3 to ES4 code with structural types, I recall that we agreed to something else that Graydon reminded me of, which is fairly buried in my comments in type system:

  let obj : SomeStructuralType = new SomeStructuralType(untypedObj)

That is, we chose Solution 4 combined with an explicit way to construct a (possibly lazy) deep copy of an ES3 object that has the desired ES4 structural type.

Brendan Eich 2006/06/30 14:30

I think Solution 4 is the only that makes sense to me.

I’m skeptical that the pain of mixing typed and untyped code will be as great as suggested, and I’m skeptical that this will happen a lot. There are a couple of reasons.

The main application of mixed typed/untyped code is probably small untyped scripts interacting with typed libraries. These scripts will usually manipulate either instances of classes exported by the libraries or they will have access to named structural types with which to type their object literals, when necessary. Neither case seems to require conversion.

Migrating from untyped to typed code is a little harder, but from my point of view on the internet, there exists no typed ECMAScript code at this time, so fitting into existing (untyped) libraries is trivial. Typed client code can often be hidden behind an API. When a library becomes available in a typed form, client code is converted to typed form and upgraded.

Lars T Hansen 2006/07/19 09:20

It seems our language design choices are:

  • non-strict: types are enforced dynamically, and structured data can pass between typed and untyped code.
  • strict 1: types are enforced statically (and partially dynamically), and structured data can pass between typed and untyped code.
  • strict 2: types are enforced statically (and partially dynamically), and structured data cannot pass between typed and untyped code.

I fear that under option “strict 2”, in practice we will only have two kinds of programs - those that are fully typed, and those that are fully untyped, and no easy migration path between the two.

Cormac Flanagan 2006/07/21 16:22

Runtime types

Every ES runtime value contains an internal tag indicating some information about the value’s type. Dynamic checks involve checking that tag to make sure that the value’s tag is compatible with the expected type.

For primitive types and classes, the tag is straight-forward: it’s just the primitive type or class. For untyped functions and arrays, it’s Function or Array, respectively. But for structurally typed functions and objects, the type tag should contain all the static type information that was associated with the original value. Dynamically typed object literals have the structural type {}, because it’s known that they are objects but there are no other known constraints.

Examples:

// tag: function(int):String
function f(x : int) : String { ... }
 
// tag: { x: int, y: int }
{ x: 3, y: 4 } : { x: int, y: int }
 
// tag: { x: int }
{ x: 3, y: 4 } : { x: int }
 
// tag: {}
{ x: 3, y: 4 } : *

Dynamic checks

There are several different kinds of runtime checks possible for a value.

Tag subtype checking

Checking the value’s tag involves a runtime invocation of the static subtyping algorithm. Notice that this algorithm is not required to look at the value’s actual contents, only its tag. For example:

var o : * = { x: 3, y: 'foo' } : { x: int }
o isTagCompatible { x: int, y: int } // true

Notice also that for union types, the tag subtyping check involves an implicit typecase on the variants of the union:

var o : * = function(x) { return x }
o isTagCompatible (int, function(int):int) // true

Deep tag checking

Checking the actual contents of an object recursively is another possible operation, perhaps implementable in terms of the tag check:

var o : * = { x: 3, y: 'foo' }
o isDeepCompatible { x: int, y: int } // false
// same as:
(o isTagCompatible { x: int, y: int }
    && (!o.x || o.x isTagCompatible int)
    && (!o.y || o,y isTagCompatible int))

Notice that tag compatibility is a permanent guarantee, whereas deep tag compatibility can change. For example, the isDeepCompatible check on o will succeed after setting o.y = 4.

Implications

Which checks we use and where we use them will affect the definition of the runtime portion of the type system. We still need to specify where and when these checks are required.

Convertible types

In strict mode, we need to determine just what types are considered convertible. (In the formal type system, we’ve represented this as the “⇝” relation, i.e., “T1 ⇝ T2” means that type T1 is convertible to type T2.) There are some interesting quirks:

Object field invariants

This is just quick notes because I’m running out of time today (sorry!).

  • covariant properties can be subverted via mutation
  • same problem as Java’s covariant arrays
  • two possibilities:
    • dynamic guards (with all the attendant design options)
    • make structural object properties invariant

In the July meeting, we agreed on covariant subtyping for object fields, which will require a run-time subtype check on writes to object fields.

Type conversions and tail calls

Type conversions can interfere with proper tail calls. For example, using only very simple nominal types:

class Even {
    static var odd : Odd = new Odd;
 
    function test(n:int) : boolean {
        if (n == 0)
            return true;
        return odd.test(n-1);
    }
}
 
class Odd {
    static var even : Even = new Even;
 
    function test(n:int) : * {
        if (n == 0)
            return false;
        return even.test(n-1);
    }
}

Both of the mutually recursive calls to test methods should be tail calls, but when Even.test calls Odd.test it has to check its result to make sure it’s a boolean, so it has to store a type-check on the stack.

More generally, take any function function f(..) : T { .. return g(..) .. } where the expected type of g(..) is S.

If S and T are the same type, or T is a subtype of S, for example, only one check is needed on the stack, and the call to g can be a tail call. But in strict mode, S may only be convertible to T, and in standard mode, S and T might be completely unrelated. In this situation, a check for type T needs, in theory, to be performed after g(..) returns. These checks may destroy tail recursion.

Type check elimination

Here is a relatively simple approach to restore tail recursion, in the context of the above function f which tail-calls g. Just before performing the call to g(..), look at the stack to see if the current activation has a return-type-check

  • if it doesn’t, pop the current activation, store a T-type-check on the stack, and perform the call.
  • if the current activation has a U-type-check then pop the activation, pop that U-type-check, merge the types T and U into a combined type S (see below), push the S type check on the stack, and perform the call.

If T is *, none of the above is necessary, and the tail call can simply pop the activation, leave any existing type check on the stack as is, and perform the call.

Merging type checks

If U is * then we should push the more precise check T (and vice versa if T is *).

If U is a subtype of T, then we should push the more precise check U (and vice versa if T is a subtype of U).

However, two types may be compatible but neither is a subtype of the other. For example, function(int):* and function(*):int.

One solution would be to combine two compatible types into a single merged type-check. E.g. function(int):* and function(*):int would become function(int):int. This way there’s only one type-check saved on the stack and it can’t grow too big. It does require synthesizing new type objects, using some appropriate algorithm.

A simpler solution is to drop the second type check on the stack, and just keep the oldest one. This approach is type sound. But it violates abstractions by allowing f to cancel the return-type guarantee of g.

Implementation via cast-passing style

Merging casts on the stack as described above requires access to the cast on the top of the stack. If this cast is in the ML stack (in the reference implementation), such access is impossible. An alternative approach is cast-passing style, where each call to the evaluator passes in as an extra argument the cast on top of the stack (or null, if the top of the stack is not a cast).

eval(expr,topCast) {
   case expr of {
       apply(e1,e2) :
            let fun(formal,body,returnType) = eval(e1, null)
                arg = eval(e2, null)
            in eval( subst(arg,formal,body), merge(topCast,returnType) )
       ...
   }
}

Implementation via stack annotation

These checks work a lot like stack annotations in the stack_inspection proposal, and could even be implemented with the same mechanism. The annotate method stores a type check above the current activation and getCurrentAnnotation tells us whether there is a current type-check stored above the current activation.

If we had a special stack annotator called returnTypeChecks:

magic native var returnTypeChecks : ControlInspector<TypeObject>;

then we could store type checks on the stack as stack annotations, and implement return-type-checks by instrumenting function calls. For example, a tail call to f(x) would be rewritten as:

let (T = returnTypeChecks.getCurrentAnnotation(), U = typeOf(f).returnType)
  (U == (type *)
     // we never need a *-type-check
     ? f(x)
     : (T === null
          // create a U-type-check
          ? (returnTypeChecks.annotate(U), f(x))
          : (U.isSubtypeOf(T)
               // replace the T-type-check with U
               ? (returnTypeChecks.annotate(U), f(x))
               : (T.isSubtypeOf(U)
                    // keep the T-type-check
                    ? f(x)
                    // keep the T-type-check and create a nested U-type-check
                    : (function() {
                           returnTypeChecks.annotate(U);
                           return f(x)
                       })()))))

The evaluator would then need extra built-in behavior to check for the existence of a return-type-check annotation and perform the check whenever a function returns.

Merging sets of types

A different approach to merging types is akin to hash-consing: instead of keeping a single type-check on a stack frame, we keep a set of type-checks. Then instead of merging two casts, we merge a single cast into a set, but don’t re-add it if it’s already there. Then tail calls always replace their stack frame, but they may accumulate more than one cast.

Implementations

An implementation via cast-passing style would be the same, except the merge function would operate on a type check and a set of type checks.

For an implementation based on stack annotation, a simple implementation of this merging function might look like:

function merge(Ts, U) {
    let newTs = [];
    let found = false;
    for (let i = 0; i < Ts.length; i++) {
        let T = Ts[i];
        if (T == U)
            found = true;
        else {
            if (U.isSubtypeOf(T)) {
                found = true;
                newTs.push(U);
            }
            else if (T.isSubtypeOf(U))
                newTs.push(T);
            else
                newTs.push(T);
        }
    }
    if (!found)
        newTs.push(U);
    return newTs;
}

and then a tail call to f(x) would look like:

let (Ts = returnTypeChecks.getCurrentAnnotation(), U = typeOf(f).returnType)
  (U == (type *)
     // we never need a *-type-check
     ? f(x)
     : (Ts === null
          // create a U-type-check
          ? (returnTypeChecks.annotate([U]), f(x))
          // merge U into the existing set of checks
          : (returnTypeChecks.annotate(merge(Ts, U)), f(x))))

Other considerations

The type-check elimination rules should work for relationships between things like a nominal type and the dynamic object type {*} – I think subtyping makes that work for free. There may be a few kinds of types that can be eliminated even when subtyping doesn’t work out.

Dave Herman 2007/04/25 14:45

 
clarification/runtime_types_and_conversions.txt · Last modified: 2007/07/31 18:45 by lars
 
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki