Table of Contents

Parallel EcmaScript (River Trail) API

Introduction

This proposal describes a gentle extension to EcmaScript that will enable access to the parallelism found in all modern processors.

Goals

The goal of this proposal is to enable data-parallelism in web applications. Browser applications and in particular EcmaScript often need to leverage all available computing resources to provide the best possible user experience. Today web applications do not take full advantage of parallel client hardware due to the lack of appropriate programming models. This proposal puts the parallel computing power of client’s hardware into the hands of the web developer while staying within the safe and secure boundaries of the familiar EcmaScript programming paradigm. It gently extends EcmaScript with simple deterministic data-parallel constructs that enable runtime translation to a low-level hardware abstraction layer. By leveraging multiple CPU cores and vector instructions, River Trail achieves significant speedup over sequential EcmaScript.

Proposal

Design approach

Three Pillar Approach

The design of RiverTrail is based on three pillars:

  • A type called ParallelArray that holds data values
  • Several prototypical methods of ParallelArray that implement parallel constructs like map
  • The concept of an elemental function which is passed to the parallel constructs and typically returns a single data element.

We have chosen a set of parallel constructs that we feel is minimal and upon which other data parallel constructs can be built. For example sum would be implemented using the reduce construct while prefix sum would be implemented using the scan construct. We anticipate useful libraries and infrastructure being built upon these constructs. This approach enables a “do few things well” implementation strategy while ensuring the composability needed to build other abstractions.

ParallelArray Data Structure

We add ParallelArray, a new data type, to EcmaScript. ParallelArray is a read only array-like data structure that is created by a call to a constructor or is returned from a call to one of the ParallelArray prototype methods. In addition to normal constructors we also support comprehensions where a ParallelArray object can be created by specifying an iteration space and providing a function that maps indices to values. We model immutability within EcmaScript’s existing semantics by declaring ParallelArray objects to be frozen. By making ParallelArray immutable we can guard against race conditions. Another important difference compared to existing EcmaScript arrays is the absence of holes. Our proposal guarantees, by construction of the API, that all elements within the bounds of a ParallelArray are defined. Not that this may mean they carry the value undefined.

Elemental Functions

Similar in spirit to the use of kernel functions and callback functions our approach makes use of elemental functions written in EcmaScript. Any EcmaScript function can be used as long as it has an appropriate signature and is side effect free, i.e., the function does not mutate global state. Violations of the latter property will result in an exception.

The system will determine if the elemental function can be optimized to take advantage of data parallel hardware. If optimization is not possible the function is simply executed sequentially. Providing an elemental function that can be optimized is the responsibility of the developer though there are a few hints that will serve the programmer well. Recursion is an example of a typical EcmaScript construct that is likely to defeat optimization. Sequences of arithmetic operations on ParallelArrays that hold homogeneous data, e.g. ParallelArray objects that contain only floating point numbers, are good candidates for optimization.

Prototype Methods

ParallelArray comes with the following four data parallel methods that map ParallelArrays to ParallelArrays: map, scan, filter, and scatter. When combined with elemental functions each of these methods creates a freshly minted ParallelArray. ParallelArray also includes a sixth data parallel method, reduce, which maps a ParallelArray to a single value. We believe that with these five methods one can create a very robust and complete data parallel library that covers a large number of applications. Since ParallelArrays are immutable we do not provide destructive array methods such as push, pop, shift, or unshift.

Example

This simple example creates a three element ParallelArray myPA using new. It then uses the prototype method map and the elemental function function(element){return element+1;} to create a freshly minted ParallelArray myPlusPA with each element in myPA incremented by 1.

myPA = new ParallelArray([1, 2, 3]);				       // <1, 2, 3>
myPlusPA = myPA.map(function(element){return element+1;});	       // <2, 3, 4>

API

ParallelArray

The ParallelArray prototype is the central data structure around which River Trail programs are built. A ParallelArray data structure can be constructed using the following three constructor forms:

Synopsis

ParallelArray();

No arguments: return an empty ParallelArray

ParallelArray(anArray); 

Argument 0 is array-like and the only argument: Use the values in the array-like argument to populate the new ParallelArray. Array-like is defined as having a length attribute and having enumerable properties from 0 to length -1. If elements are missing, the undefined value is used. Typed arrays are considered array-like. Note that only the outer most array-like structure is transformed into a ParallelArray object. Nested elements are referenced as is.

ParallelArray(size, elementalFunction); 

Argument 1 is an instance of a function (the elemental function), argument 0 is the size of the resulting array: Return a ParallelArray of size size where each value is the result of calling the elemental function with the index where its result is placed. If size is a number, then the index passed to the elemental function will be a number, as well. To support the construction of multi-dimensional arrays, size furthermore can be a one dimensional array. In such case, the elemental function will take multiple index arguments up to the length of the array size.

Returns

A freshly minted ParallelArray.

Throws

  • TypeError when the constructor is invoked with two or more arguments but elementalFunction is not a function.
  • TypeError when the constructor is invoked with two or more arguments but the first argument cannot be interpreted as a number or a vector of numbers (not including +/-Infinity and NaN).
  • RangeError when the constructor is invoked with two or more arguments but the first argument is a number or vector of numbers outside the range [0,2^31].

map

Synopsis

myArray.map(depth, elementalFunction)

Arguments

  • depth (optional) the number of dimensions to iterate over. Defaults to 1.
  • elementalFunction described below

Elemental Function

function (element, index1, ..., indexN, source) 
  • element The element from the ParallelArray.
  • index1 ... indexN The indices in source where element is located as well as where the result will be placed. These are scalars and refer to dimensions 1 to depth.
  • source The ParallelArray holding the elements.

The result of the function will be placed in map’s result at index.

Returns

A freshly minted ParallelArray. Elements are the results of applying the elemental function to the elements in the original ParallelArray.

Throws

  • TypeError when elementalFunction is not a function.
  • RangeError when depth is larger than the number of dimensions in the source myArray.
  • Error when elementalFunction has side-effects.

Discussion

One functionally correct implementation of one-dimensional map that leverages the ParallelArray constructor might be

 
function map(_f) { 
    return new ParallelArray(this.shape[0], 
                             function (_i) { 
                                 return f(this[_i], _i, this);}
                            ); }

Example: an identity function

result = pa.map(function(val){return val;});

reduce

Synopsis

myArray.reduce(elementalFunction)

Arguments

elementalFunction described below

Elemental Function

function (a, b)
  • a, b Arguments to be reduced and returned

Returns

The final value, if the ParallelArray has only 1 element then that element is returned.

Throws

  • TypeError when elementalFunction is not a function.
  • RangeError if the source ParallelArray object is empty.
  • Error when elementalFunction has side-effects.

Discussion

Reduce is free to group calls to the elemental function and reorder the calls. For an elemental function that is associative and commutative the final result will be the same as reducing from left to right. Modular addition of integers is an example of an associative and commutative function and in this case the sum of a ParallelArray will always be the same regardless of the order that reduce calls the addition operator. On the other hand, averaging is an example of a non-associative function. The expression Average(Average(2, 3), 9) produces the value 5 2/3 while the expression Average(2, Average(3, 9)) produces the value 4. reduce is permitted to chose whichever call ordering it finds convenient.

reduce is only required to return a result consistent with some call ordering and is not required to chose the same call ordering on subsequent calls. Furthermore, reduce does not magically resolve problems related to overflow and the well documented fact that some floating point numbers are not represented exactly in EcmaScript and the underlying hardware so floating point addition and multiplication are not truly associative.

reduce also requires the elemental function be commutative since reduce can induce reordering of the arguments passed to the elemental functions.

Typically the programmer will only call reduce with associative and commutative functions but there is nothing preventing them doing otherwise. Calling reduce with a non-associative and/or non-commutative function will lead to a result that is guaranteed only to be consistent with some ordering of applying the elemental function on some ordering of the arguments.

scan

Synopsis

myArray.scan(elementalFunction)

Arguments

elementalFunction described below

Elemental Function

function (a, b) 
  • a, b - arguments to be reduced and result returned

Returns

A freshly minted ParallelArray whose i-th element is the result of using the elemental function to reduce the elements between 0 and i, inclusively, in the original ParallelArray.

Throws

  • TypeError when elementalFunction is not a function.
  • Error when elementalFunction has side-effects.

Example: an identity function

pa.scan(function(a, b){return b;})

Discussion

The construct implements what is known as an inclusive scan which means that the value of the i-th result is the same as what would be produced by [0..i].reduce(elementalFunction). Notice that the first element of the result is the same as the first element in the original ParallelArray. An exclusive scan can be implemented by shifting right by one the results of an inclusive scan, dropping the rightmost value, and inserting the identity at location 0. Similar to reduce, scan can reorder the calls to the elemental functions. Ignoring overflow and floating point anomalies, this cannot be detected if the elemental function is associative and commutative, in which case using an elemental function such as addition to create a partial sum will produce the same result regardless of the order in which the elemental function is called. However using a non-associative or non-commutative function can produce different results due to the ordering that scan calls the elemental function. While scan will produce a result consistent with a legal ordering the ordering and the result may differ for each call to scan.

Typically the programmer will only call scan with associative and commutative functions but there is nothing preventing them doing otherwise. Calling scan with a non-associative and/or non-commutative function will lead to a result that is guaranteed only to be consistent with some ordering of applying the elemental function.

scatter

Synopsis

myArray.scatter(indices, defaultValue, conflictFunction, length)

Arguments

  • indices array of indices in the resulting array
  • defaultValue optional argument indicating the value of elements not set by scatter. When not present, the default value is undefined
  • conflictFunction optional function to resolve conflicts, details below.
  • length optional argument indicating the length of the resulting array. If absent, the length is the same as the length of the original ParallelArray.

Returns

A freshly minted ParallelArray A where each element A[i] is defined as

  1. A[indices[i]] = myArray[i], when indices[i] appears only once in indices
  2. A[indices[i]] = conflictFunction(valA, valB) when multiple elements are scattered to the same location. See below
  3. defaultValue, when i is not present in indices array

Example: an identity function

result = pa.scatter(indices); where indices is a ParallelArray in which element === index

Handling conflicts with the Conflict Function

A conflict occurs when multiple elements are scattered to the same location. It results in a call to conflictFunction, which is an optional third argument to scatter. If conflictFunction is undefined, scatter throws an exception when a conflict occurs. Note that the order in which conflicts are resolved is left unspecified. Therefore, to ensure deterministic behavior, the conflict functions needs to be deterministic and associative.

Synopsis

conflictFunction(valA, valB)

Arguments

  • valA, valB the two values that conflict

Returns

Value to place in result[indices[index]]

Throws

  • RangeError when the length of indices does not match the length argument or, if length is not given, the length of the source ParallelArray.
  • TypeError when conflictFunction is neither undefined nor a function.
  • RangeError when a conflict occurs but no conflict function has been supplied by the programmer.
  • TypeError when indices contains a value that cannot be interpreted as a number, e.g., +-Infinity and NaN.
  • RangeError when indices contains an index smaller 0 or greater than the result array’s length.

Example: Resolve conflict with the larger value

function chooseMax(valA, valB){
      return (valA>valB)?valA:valB;
 }
pa = new ParallelArray([0,1,2,3,4,5]);
result = pa.scatter([0,3,1,4,2,5]);  		    // <0,2,4,1,3,5>
result2 = pa.scatter([0,0,1,1,2,2], chooseMax);     // <1,3,5,undefined,undefined, undefined>
result3 = pa.scatter([0,0,1,1,2,2], chooseMax, 3);  // <1,3,4>

filter

Synopsis

myArray.filter(elementalFunction)

Arguments

elementalFunction described below.

Elemental Function

function (element, index, source) 
  • element The element from the ParallelArray.
  • index The index in source where element is located.
  • source The ParallelArray holding the elements.

If the result of the function is truthy then the corresponding element will be placed in filter‘s result. Elements in the result will be in the same order as in the source.

Returns

A freshly minted ParallelArray holding the source elements located at myArray[i] for which elementalFunction returns a truthy value. The order of the elements in the returned ParallelArray is the same as the order of the elements in the source ParallelArray.

Throws

  • TypeError when elementalFunction is not a function.
  • Error when elementalFunction has side-effects.

Examples

identity function
var pa = new ParallelArray([1,2,3,4,5,6,7]); 
var result = pa.filter(function(ignore){return true;});
filter out every other element
var pa = new ParallelArray([1,2,3,4,5,6,7]); 
var result = pa.filter(function(element, index) { return index%2?false:true;} );

flatten

Synopsis

myArray.flatten()

Arguments

none

throws

  • RangeError when myArray is one dimensional.

Returns

A freshly minted ParallelArray whose outermost two dimensions have been collapsed into one.

Example

pa = new ParallelArray([[1,2][3,4]])   // <<1,2>,<3,4>>
pa.flatten()                           // <1,2,3,4>
pa3D = new ParallelArray(([[[1,2][3,4]],[[11,12][13,14]],[[11,22][23,24]]]);
		  // <<<1,2>,<3,4>>, <<11,12>,<13,14>>, <<11,22>,<23,24>>>
pa2D = pa3D.flatten(); // <<1,2>,<3,4>,<11,12>,<13,14>,<11,22>,<23,24>>
pa1D = pa2D.flatten(); // <1,2,3,4,11,12,13,14,11,22,23,24>

partition

Synopsis

myArray.partition(size)

Arguments

size the size of each element of the newly created dimension; the outermost dimension of myArray needs to be divisible by size.

Returns

A freshly minted ParallelArray where the outermost dimension has been partitioned into elements of size size.

Example

pa = new ParallelArray([1,2,3,4])  // <1,2,3,4>
pa.partition(2)                    // <<1,2>,<3,4>>

Discussion

While one could implement both flatten and partition using the other constructs we call them out here to make it easy for the compiler to recognize flatten or partition and make optimizations easier.

Throws

  • RangeError, when outermost dimension is not divisible by size.

Synopsis

myArray[index];

Arguments

index a value representing a valid index in the outermost dimension of myArray.

Returns

The value found at index or undefined if no such value exists. The lookup operation only traverses the prototype chain if index is not an integer value. Otherwise, [] returns undefined for integer indices that are out of bounds. If myArray is a multi-dimensional ParallelArray object, [] returns a ParallelArray object that represents a slice of the source ParallelArray object at index index. All ParallelArray methods can be applied to such slice. In particular, [] can be used to further slice the ParallelArray or ultimately select single elements.

Example

pa = new ParallelArray([[0,1,2,3,4], [10,11,12,13,14], [20,21,22,23,24]])

pa.[1][1];                   // 11
pa[1];                       // <10,11,12,13,14>

Discussion

Since ParallelArrays are immutable using [] as part of the left hand side of an assignment is not allowed and results in a throw.

get

Synopsis

myArray.get(indices);

Arguments

indices: an array of number values that represent valid indices into myArray. The first index references the outer most dimension, the second index references the next dimension and so forth. The same lookup rules as for [] apply.

Returns

The value found at the indices or undefined if no such value exists.

Throws

  • TypeError if indices is not an array like object.
  • RangeError if the length of indices is larger than the number of dimensions in the source array.

Example

pa = new ParallelArray([0,1,2,3,4], [10,11,12,13,14], [20,21,22,23,24])

pa.get([1,1]);                    // 11 same as pa[1][1].
pa.get([1]);                      // <10,11,12,13,14>, same as pa[1].

length

Synopsis

myArray.length

Returns

The top-level (first dimension) length of the ParallelArray. Similar to typed arrays, the length property is a non-configurable getter on the prototype and as such not an own property of the ParallelArray object itself.

Example

pa = new ParallelArray([1,2,3,4])   	// <1,2,3,4>
pa.length                        	// 4

shape

Synopsis

myArray.shape

Returns

An Array containing the length of each dimension of the ParallelArray starting with the outermost dimension. Like the length property, the shape property is a non-configurable getter on the prototype and as such not an own property of the ParallelArray object itself. Also, each invocation of shape returns a new copy of the ParallelArray object’s shape vector.

Discussion

pa.shape.length gives the dimensionality of the parallel array.

Example

pa = ParallelArray([[1,2,3],[4,5,6]])   	// <<1,2,3>,<4,5,6>>
pa.shape                        		// [2, 3]
pa.shape === pa.shape                           // false

toString

Synopsis

myArray.toString()

Arguments

none

Returns

A string representation of the ParallelArray object. We use < and > as delimiters, except for empty arrays, for which the empty String is returned.

Support for Binary Data

Rationale

Some interfaces, in particular WebGL and canvas, require the use of specific binary data formats like single precision float or 8 bit Integer values. In line with JavaScript’s native Array type, ParallelArray objects store their elements in JavaScript’s own representation, e.g., 64 bit floating point values for numbers. Explicitly converting ParallelArray objects after creation imposes significant overheads, which, in the worst case, may annihilate gains through parallel execution. To reduce conversion overheads and simplify interfacing with interfaces like WebGL, we propose a means to influence the underlying data store of a ParallelArray object and thus creating the data in the required format.

Our proposal is a true superset to the API described above, i.e., we only add new optional parameters and an extra method and property. We use the storage type description language developed as part of the binary_data proposal. However, the exact details of such language is not of importance, as long as the specification language

  • is sufficient to describe element types and

corresponding array container types as required by the above mentioned

  interfaces
* supports fixed size data types, in particular arrays
* provides a means to describe ordinary ECMAScript values

API Changes

ParallelArray

Synopsis

ParallelArray(size, elementalFunction, typeSpec);

Argument 2 is an optional description of the underlying data store’s type: The ParallelArray object is created as specified above. However, before writing the elements to the created ParallelArray object, they are coerced to the type described by typeSpec. The coercion rules defined in binary_data apply.

Returns

A freshly minted ParallelArray objects whose elements have been coerced to typeSpec.

Throws

May throw a conversion exception of the semantics of binary data require so.

Discussion

If typeSpec is omitted, its value defaults to Any (the type descriptor used for ECMAScript values, see binary_data_semantics),

The created ParallelArray object uses a data storage format that complies with the type description ArrayType(typeSpec, size) if size is a scalar value or ArrayType(...(ArrayType(typeSpec, size_N), ...), size_0) if size is a vector of the form [size_0, ..., size_1].

map

Synopsis

myArray.map(depth, elementalFunction, typeSpec)

Arguments

  • depth (optional) the number of dimensions to iterate over (as before)
  • elementalFunction function to compute elements (as before)
  • typeSpec specification of the underlying storage format

Returns

A freshly minted ParallelArray. Elements are the results of applying the elemental function after conversion using the type specification given by typeSpec.

Throws

Additionally to the exceptions thrown before, map may also throw a conversion exception if the binary data semantics require so.

Discussion

If typeSpec is omitted, its value defaults to Any (the type descriptor used for ECMAScript values, see binary_data_semantics),

The created ParallelArray object uses a data storage format that complies with the type description ArrayType(typeSpec, shape_0) if depth is omitted or ArrayType(...(ArrayType(typeSpec, shape_N-1), ...), shape_0) if depth has the value N. The vector [shape_0, ..., shape_N-1] is a prefix of the shape of the source array myArray.

scan

Synopsis

myArray.scan(elementalFunction, typeSpec)

Arguments

  • elementalFunction unchanged
  • typeSpec specification of the underlying storage format

Returns

A freshly minted ParallelArray whose i-th element is the result of using the elemental function to reduce the elements between 0 and i, inclusively, in the original ParallelArray. Before writing to the underlying storage, each element of the resulting ParallelArray is coerced according to the type specification given by typeSpec.

Throws

Additionally to the exceptions thrown before, scatter may also throw a conversion exception if the binary data semantics require so.

Discussion

It is important to note here that data is only coerced once when written to the final result object. In-flight data, in particular arguments to the elemental function, are passed unconverted.

If typeSpec is omitted, its value defaults to Any (the type descriptor used for ECMAScript values, see binary_data_semantics),

The created ParallelArray object uses a data storage format that complies with the type description ArrayType(typeSpec, length) where length is the value of the length property of the source array myArray.

scatter

Synopsis

myArray.scatter(indices, defaultValue, conflictFunction, length, typeSpec)

Arguments

  • indices array of indices in the resulting array
  • defaultValue optional argument indicating the value of elements not set by scatter (as before)
  • conflictFunction optional function to resolve conflicts (as before, see discussion)
  • length optional argument indicating the length of the resulting array (as before)
  • typeSpec specification of the underlying storage format

Returns

A freshly minted ParallelArray A where each element A[i] is defined as

  1. A[indices[i]] = typeSpec(myArray[i]), when indices[i] appears only once in indices
  2. A[indices[i]] = conflictFunction(typeSpec(valA), typeSpec(valB)) when multiple elements are scattered to the same location. See below
  3. typeSpec(defaultValue), when i is not present in indices array

where typeSpec(val) denotes the conversion of val according to the type specification provided by typeSpec.

Throws

Additionally to the exceptions thrown before, scan may also throw a conversion exception if the binary data semantics require so.

Discussion

It is important to note here that for scatter, the arguments to the conflict resolution function are the already coerced values. The rationale is two-fold:

  • Conceptually, the conflict resolution function resolves conflicting

stores to the resulting ParallelArray object. As only coerced values are

  stored, these are also the parties in the conflict.
* //Practically//, this semantics enables a simpler and more efficient
  implementation, where results may be written to the final result buffer
  even though a conflict may still arise.  

If typeSpec is omitted, its value defaults to Any (the type descriptor used for ECMAScript values, see binary_data_semantics),

The created ParallelArray object uses a data storage format that complies with the type description ArrayType(typeSpec, length) where length is the value of the length argument or, if that is omitted, the value of the length property of the source array myArray.

Omitting a default value while using a type specification may lead to coercion errors if the default unknown cannot be coerced to the specified type.

filter

Synopsis

myArray.filter(elementalFunction, typeSpec)

Arguments

  • elementalFunction decision function (as before)
  • typeSpec specification of the underlying storage format

Returns

A freshly minted ParallelArray holding the source elements located at myArray[i], for which elementalFunction returns a truthy value, coerced to the storage format described by typeSpec. The order of the elements in the returned ParallelArray is the same as the order of the elements in the source ParallelArray.

Throws

Additionally to the exceptions thrown before, filter may also throw a conversion exception if the binary data semantics require so.

Discussion

If typeSpec is omitted, its value defaults to Any (the type descriptor used for ECMAScript values, see binary_data_semantics),

The created ParallelArray object uses a data storage format that complies with the type description ArrayType(typeSpec, length) where length is the value of the length property of the result array.

flatten

Synopsis

myArray.flatten()

Arguments

none

Returns

A freshly minted ParallelArray whose outermost two dimensions have been collapsed into one and where the outermost two ArrayType constructors of the descriptor for the underlying storage format have been combined.

partition

Synopsis

myArray.partition(size)

Arguments

size the size of each element of the newly created dimension; the outermost dimension of myArray needs to be divisible by size.

Returns

A freshly minted ParallelArray where the outermost dimension has been partitioned into elements of size size. The type descriptor of the underlying storage layout is updated accordingly.

API additions

type

Synopsis

myArray.type

Returns

The type descriptor of the backing store of the entire ParallelArray.

Example

var myArray = new ParallelArray([10,4], function (i,j) { return [i,j]; });
myArray.type // yields ArrayType(ArrayType, Any, 4), 10)
var myArray = new ParallelArray([10,4], function (i,j) { return [i,j]; }, ArrayType(uint8, 2));
myArray.type // yields ArrayType(ArrayType, ArrayType(uint8, 2), 4), 10)

asBinary

Synopsis

myArray.asBinary(typeSpec)

Arguments

  • typeSpec type description of the requested storage type for the resulting object including the array structure of the ParallelArray object itself.

If not present, the type as returned by the type attribute is used.

Returns

Returns an immutable view of the data held by the source ParallelArray object coerced to the type specification provided by typeSpec. If no type specification is specified, the internal representation as returned by the \\type
property is used. The method should complete in constant time when the internal representation is used, either explicitly or implicitly.

 
strawman/data_parallelism.txt · Last modified: 2013/02/26 21:25 by masterofhats
 
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki