Generic functions for ES4

2007-08-13 / lhansen@adobe.com / rev 3

Also see the discussion page for this proposal.

Motivation

Our current operator overloading proposal is very weak. We ought to have something with proper type dispatch. Instead of putting something ad-hoc into classes we should just provide for generic functions (multimethods).

Overview

Generic functions are function objects each with a set of attached methods. A call to a generic function matches the types of the actual arguments to the signatures of the attached methods and dispatches to the most appropriate method following deterministic rules.

In this proposal, generic functions are only definable at the top level of programs, packages, and functions. See the last section for other opportunities.

New methods can be attached to a generic function at any time. Methods cannot be redefined or removed.

At the time a generic function is called, the applicable methods (determined by the manifest types of the actual arguments) are selected and ordered by specificity. Methods that cannot be ordered cannot be invoked, and if the set of ordered methods is empty the function cannot be invoked at all and an error is signalled. The most specific member of the set of ordered methods is invoked; the method can ask that control be passed to the next most specific method, and so on.

Proposal for generic functions

Defining functions

A generic function is defined by a body-less function definition using the contextual keyword pair “generic function”:

  generic function f(x, y);

Optional and rest arguments are allowed. The default values for the optional arguments will become the default values for those arguments for all methods.

The non-rest arguments are called the fixed arguments. The arity of a generic function is number of fixed arguments.

Type annotations called constraints are allowed on the fixed arguments and on the function itself, to denote a return type. Constraints serve two purposes: they constraint the specializers and return types of the methods (see the next section), and they provide a type for the generic function itself:

  generic function f(x: Numeric, y: Object): MyClass;

Generic functions cannot be parameterized by types, eg, generic function f.<T>(...) is not allowed. (This can change, but some issues must be resolved first; see the discussion page.)

Defining methods

A method is defined on an already defined generic function by providing specializers for the fixed arguments and a body for the code. There must be exactly as many specializers as there are fixed arguments.

The specializers are types, and are expressed as type annotations on the method parameters. The body is provided as a regular function body:

  generic function f(x:int, y:boolean): string {
      if (y)
          return string(x);
      else
          return "37";
  }

The specializers must all be either nominal types (class or interface types) or union types of nominal types, plus null, undefined, and *. If a specializer is absent it defaults to *.

A method definition that uses a union type specializer actually defines a set of methods simultaneously, one method per type in the union.

Each specializer must be a subtype [sic] of the corresponding constraint in the generic function definition, if a constraint was provided.

The return type must be a subtype of the return type constraint in the generic function definition, if a constraint was provided.

If the generic function was defined as taking rest arguments then the method may be defined as taking rest arguments as well.

The method may not have any optional arguments; default values are provided for the trailing fixed arguments by the generic function.

Invocation

Generic functions are invoked using the normal function invocation syntax.

When the function is invoked, the manifest types of the actual arguments are compared to the specializer signatures of the methods, and a set of applicable methods is computed (see below).

The applicable methods are then ordered as described below, this yields a set of totally ordered methods and a set of unordered methods.

If the set of totally ordered methods is empty, an error is thrown. Otherwise, the first such method is invoked on the arguments. If the method returns normally or throws an exception then the generic function terminates in the same way.

Computing the applicable methods

A method m is applicable if, for every fixed parameter position i, the manifest type of fixed actual argument i is a subtype of the specializer for formal parameter i of m.

Sorting the applicable methods

The applicable methods can be sorted in order of specificity. This ordering prefers more specific methods over less specific methods (ie, if a specializer s is a subclass of another specializer t then the method using s is preferred) and breaks ties in a predictable way. This problem has been studied in the literature in the context of CLOS, Dylan, Cecil, and MultiJava.

For the time being, I’m proposing we use Dylan’s algorithm for Method Specificity computation (DRM pp 96-98).

For this algorithm we need to construct the class precedence list for classes and interfaces. We take the local precedence order of a class to be the list containing the base class followed by the implemented interfaces in the order they are present in the class definition, and the local precedence order of an interface to be the list of base interfaces in the order they are present in the interface definition. We then use Dylan’s algorithm for computing the class precedence list (DRM pp 54-56).

This yields a set of totally ordered methods and a set of unordered methods as desired.

(All of this is subject to revision as we study the problem.)

nextMethod

Inside the method body the function nextMethod is defined (it is passed as the first argument to the method). If nextMethod is called it invokes the next method in the list of ordered applicable methods on the argument values on which the present method was invoked. This call is a normal call and returns the result of the next method.

Null values, null types, and nullability

The type null is stripped from any type used as a specializer except when null is the lone specializer or it is mentioned explicitly in a union type that is syntactically visible as a union type in the specializer.

Thus if C is a nullable class, the following definitions do not specialize for a null value:

  generic function f(x: C) { ... }

  type t = (C!,null)
  generic function f(x: t) { ... }

But these definitions do specialize for a null value:

  generic function f(x: null) { ... }
  generic function f(x: (C,null)) { ... }

See the discussion page for some background and other options.

Types and type checking

Generic functions have a structural function type that is given by its declared constraints; for example, in the example above the type is function(Numeric,Boolean):MyClass. As a result, generic functions participate in compile-time and run-type type checking exactly like other functions.

If GF is a generic function, then:

  • typeof GF“function”
  • GF is Functiontrue
  • GF instanceof Functiontrue.

Any particular application of a generic function to a set of values may throw run-time type errors if (a) there are no applicable methods for the given arguments or (b) there is no unique most specific method for the given arguments.

Some work has been done on checking statically and modularly that these run-time errors will not occur [Clifton 2006]. These techniques depend on placing constraints on how generic functions are declared and imported and are probably not a good fit for ECMAScript’s generic functions in their full generality.

However, if we were to accept the (separate) proposal for final generic functions, then it should be possible for strict mode to check that generic functions so declared can’t throw these errors.

Error reporting

A method definition fails if the number of specializers does not equal the number of fixed arguments in the generic function. This error may be signalled at compile-time.

A generic function invocation fails at run-time if there are no methods in the totally ordered subset of the applicable methods.

toString

The toString method on generic functions returns an implementation-defined string consisting of text for a generic function definition followed by text for each of the methods defined on the function, in some arbitrary order, with all texts separated by a newline character.

Implementation notes

The realization of generic functions is left vague on purpose in this proposal. There is a proof-of-concept generic function system that is now part of the reference implementation that treats generic function and method definitions as syntactic sugar for constructors and method calls on a class called “GenericFunction”, which has an invoke() method to handle the call. This implementation uses some system hooks that will eventually be provided as standard meta-object hooks (according to the already accepted meta-object proposal).

In other words, apart from surface syntax, generic functions appear to be implementable in the language.

Due to some limitations in the current reference implementation, union types are not yet supported.

Open questions

  • Are union types still sorted? (I think not – that was for user-defined conversions, which are gone.) If so, does union order influence method ordering?
  • Consider Set.<int> and Set.<*>: should a method specialized on the latter apply to the former, like a method specialized on * applies to int? If so, what kinds of complexities are introduced?

Proposal: remove the current operator overloading mechanism

The operator overloading mechanism is too weak and hard to control. We should just remove it, whatever else we do.

Proposal for built-in generic functions

One of the strengths of generic functions is their ability to extend built-in facilities in a seamless way by means of adding cases to predefined hooks.

For example, in ECMAScript one might think of each of the built-in functions (eg Number) as a generic function because these functions perform conversion based on the input type; it might be useful for user code to extend the meaning of each such conversion by adding methods to the converter functions.

The definition of a full, coherent, and useful set of built-in generic functions for ECMAScript does not fit into the ES4 timeframe, but I would like not to take the language in a direction that prevents us from adding such functions in the future.

However, global generic functions shall be predefined for all the operator names in the language (full list to be provided), allowing for operator overloading on new types.

These built-in generic functions will be defined in such a way that an implementation can use fast paths that do not check if new methods have been defined on built-in generic functions; since fundamental types like int and string are final, and since generic functions do not allow for method replacement, there can be no more specific methods defined than those that are primitive to the system.

By way of example, here is a plausible definition for “+”

  // Since + takes one or two arguments we create a distinguished 
  // type for the one-argument case, letting a value of this type
  // be the default value for the second argument.

  private class Absent {}
  private const AbsentValue = new Absent;

  // Primitive number representations are handled here

  private type Numbers = (int,uint,double,decimal)

  // + takes two arguments but the second is optional
  generic function +(a, b=AbsentValue);

  // Unary +

  generic function +(a:*, b:Absent): Numbers
      ToNumeric(a);

  generic function +(a:Numbers, b:Absent):Numbers
      a;

  // Binary +

  // Anything and a string turns into string-append
  generic function +(a:string, b:*): string
      intrinsic::stringAppend(a,string(b));

  generic function +(a:*, b:string): string
      intrinsic::stringAppend(string(a),b);

  // Numbers are handled by primitive code
  generic function +(a:Numbers, b:Numbers): Numbers
      intrinsic::+(a,b);

  // Otherwise, convert both to primitive, then to number,
  // and add them.
  generic function +(a:*, b:*): Numbers
      intrinsic::+(ToNumeric(ToPrimitive(a)) + ToNumeric(ToPrimitive(b)));

Secondary proposals: final generic functions; static generic functions

The previous proposals do not depend on the following, but the following may not add much complexity and may have substantial utility.

Final generic functions

A top-level generic function may be designated final. The meaning of this is that methods may not be added to the function outside the package fragment in which the function is defined.

This is primarily a security measure: unrelated code won’t be able to extend a method to its own needs (or for nefarious purposes).

Secondarily, it aids strict-mode type checking and optimization.

Static generic functions

A generic function may be defined inside a class if it is designated static:

  class Cls {
    ...
    static generic function fun(...);
  }

These generic functions have the same semantics as top-level generic functions; only the scope is different.

Outside the class Cls containing generic function fun, methods may be added using the following syntax:

  static generic function Cls.fun(...) { ... }

Static generic functions can be designated final. The meaning is that methods may not be added outside the class in which they’re defined.

The special class static methods (the constructor; meta static invoke) can be generic functions.

Non-features, future directions, etc

  • Generic functions are not implicitly defined when a method is defined on a non-existing function. This keeps things simple and explicit.
  • There are no generic class-instance methods. These would not be hard add – the dispatch would be two-stage, with normal method lookup in the receiver followed by type dispatch – but there are various approaches we could take wrt overriding, so it seemed a bit much to handle it now.
  • Generic functions have only one “shape”, with a set number of fixed parameter. It would not be very hard to allow for multiple shapes, so that eg the hack with the “Absent” value in the case of generic + above would feel less unnatural.
  • The current proposal does not allow for removing methods or redefining them; but the intent is that it does not preclude these operations being added in the future. eg, non-removable methods on top-level generic functions could be designated “final”, every other method could be removable.
  • Structural object and array types cannot be used as specializers. These have prior history as “eql” specializers in CLOS; if we feel they are useful we can probably add them later.
  • The Function constructor can’t be used to create generic functions.
  • There is no way to add methods at run-time, in particular, “eval” can’t do it. The example implementation in the reference implementation does have run-time method-addition facilities, but those are not intended to be available to user code. (I don’t know how valuable this restriction is, but it does allow “eval” to use completely different evaluation strategies compared to an ahead-of-time compiler.)
  • There are no generic function expressions – they only make much sense when methods can be added at run-time.
  • It would be straightforward to allow nextMethod to take arguments so that the next method could be called on different arguments than the current method.
  • Calls to generic functions do not convert their arguments to primitive values if necessary (unlike normal calls). The point of generic functions is to discriminate among types, so conversions to primitive type does not seem very desirable. It may however desirable to perform conversion if there are no applicable methods for a call to a generic function. The discussion page outlines a plausible algorithm.

Background material

  • [Barrett 1996] Kim Barrett, Bob Cassels, Paul Haahr, David A Moon, Keith Playford, and P Tucker Withington: “A Monotonic Superclass Linearization for Dylan”, OOPSLA 1996.
  • [Clifton 2006] Curtis Clifton, Todd Millstein, Gary T Leavens, and Craig Chambers: “MultiJava: Design Rationale, Compiler Implementation, and Applications”. ACM TOPLAS 28(3), May 2006.
  • [DRM 1996] Andrew Shalit, “The Dylan Reference Manual”, Addison Wesley 1996.
  • [Dujardin 1998] Eric Dujardin, Eric Amiel, and Eric Simon, “Fast Algorithms for Compressed Multimethod Dispatch table generation”. ACM TOPLAS 20(1), January 1998.
  • [Kiczales 1991] Gregor Kiczales, Jim des Rivieres, and Daniel G Bobrow, “The Art of the Meta-Object Protocol”, MIT Press 1991.
  • [Smith 2003] Julian Smith, “Adding multimethods to C++”. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1529.html
  • [van Rossum 2005] Guido van Rossum, “Five-minute multimethods in Python”. http://www.artima.com/weblogs/viewpost.jsp?thread=101605
 
proposals/generic_functions.txt · Last modified: 2007/08/14 18:59 by lars
 
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki