Binary data: discussion

This page contains discussion of the binary data spec, including possible extensions, design rationale, and references.

Extensions

Large integers

Some operations produce or require integers larger than the ECMAScript number type can represent exactly. The following two types are simple object types that encapsulate 64-bit integers, unsigned and signed respectively.
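As a brief illustration of the limitation, the following sketch shows where ECMAScript numbers (IEEE 754 doubles) stop distinguishing adjacent integers. The variable names are purely illustrative.

```javascript
// Doubles represent every integer exactly only up to 2^53 - 1.
const maxSafe = Number.MAX_SAFE_INTEGER;      // 9007199254740991

// Above that limit, distinct integers collapse onto the same double.
const collides = 2 ** 53 + 1 === 2 ** 53;     // true

// The largest unsigned 64-bit value, 2^64 - 1, is not representable:
// the subtraction rounds back up to 2^64.
const roundsUp = 2 ** 64 - 1 === 2 ** 64;     // true
```

Any 64-bit value read from binary data therefore needs a representation other than a plain number if it is to survive a round trip.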

Unsigned 64-bit integers

new UInt64(n : String | Number | Int64 | UInt64 = 0) -> UInt64

Semantics
    If n is a string
        Let V ?= ParseInt(n)
        If V not in [0, 2^64) Throw TypeError
        Let W = a new UInt64 object with W.[[Value]] = V.
        Return W
    If n is a number and n in [0, 2^64)
        Let W = a new UInt64 object with W.[[Value]] = n
        Return W
    If n is an Int64 object and n.[[Value]] in [0, 2^64)
        Let W = a new UInt64 object with W.[[Value]] = n.[[Value]]
        Return W
    If n is a UInt64 object
        Let W = a new UInt64 object with W.[[Value]] = n.[[Value]]
        Return W
    Throw TypeError
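The constructor steps above can be sketched in present-day ECMAScript. This is only an illustration under stated assumptions: a BigInt field named value stands in for the internal [[Value]] slot, BigInt() stands in for ParseInt, numbers are required to be integral, and the class names UInt64Sketch and Int64Sketch are hypothetical.

```javascript
// Hypothetical sketch: a BigInt field stands in for the internal [[Value]] slot.
const UINT64_LIMIT = 2n ** 64n;

// Minimal placeholder so the Int64 branch can be exercised; not a full Int64.
class Int64Sketch {
  constructor(value) {
    this.value = BigInt(value);
  }
}

class UInt64Sketch {
  constructor(n = 0) {
    let v;
    if (typeof n === "string") {
      // Stand-in for ParseInt (assumption): BigInt() throws a SyntaxError on
      // malformed input, and that error simply propagates to the caller.
      v = BigInt(n);
    } else if (typeof n === "number") {
      // Assumption: only integral numbers are accepted.
      if (!Number.isInteger(n)) throw new TypeError("not an integer: " + n);
      v = BigInt(n);
    } else if (n instanceof UInt64Sketch || n instanceof Int64Sketch) {
      v = n.value;
    } else {
      throw new TypeError("unsupported argument type");
    }
    // One range check covers the string, number, and Int64 cases.
    if (v < 0n || v >= UINT64_LIMIT) {
      throw new TypeError("value outside [0, 2^64)");
    }
    this.value = v;
  }
}
```

For example, new UInt64Sketch("18446744073709551615") succeeds, while new UInt64Sketch(-1) and new UInt64Sketch(new Int64Sketch(-1)) both throw a TypeError.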

UInt64.lo(n : UInt64) -> UInt32

Returns the low-order 32-bit value of n.[[Value]].

UInt64.hi(n : UInt64) -> UInt32

Returns the high-order 32-bit value of n.[[Value]].

UInt64.join(hi : Number | Int64 | UInt64, lo : Number | Int64 | UInt64) -> UInt64

Returns a new UInt64 whose [[Value]] is computed by joining the numeric value of hi as the high-order 32 bits and the numeric value of lo as the low-order 32 bits.
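A rough sketch of how lo, hi, and join relate to one another, again using BigInt as a stand-in for [[Value]]. The function names are hypothetical, and for brevity join accepts only numbers here, coercing each to an unsigned 32-bit value.

```javascript
// Hypothetical helpers operating directly on a BigInt stand-in for [[Value]].
const MASK_32 = 0xffffffffn;

// Low-order 32 bits as an unsigned number.
function uint64Lo(value) {
  return Number(value & MASK_32);
}

// High-order 32 bits as an unsigned number.
function uint64Hi(value) {
  return Number((value >> 32n) & MASK_32);
}

// Combine two 32-bit halves; ">>> 0" coerces each argument to uint32.
function uint64Join(hi, lo) {
  return (BigInt(hi >>> 0) << 32n) | BigInt(lo >>> 0);
}

const joined = uint64Join(1, 2);   // 4294967298n, i.e. 2^32 + 2
```

Under these definitions, uint64Join(uint64Hi(v), uint64Lo(v)) returns v for any v in [0, 2^64).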

UInt64.compare(a : UInt64, b : UInt64) -> -1 | 0 | 1

If a and b are both UInt64 objects, returns -1 if a.[[Value]] < b.[[Value]], 0 if a.[[Value]] = b.[[Value]], and 1 if a.[[Value]] > b.[[Value]]. Otherwise throws a TypeError.

UInt64.prototype.toString(radix : 2 | 10 | 16 = 10) -> String

Returns a string representation of this.[[Value]] in base radix, consisting of one or more lowercase digits of base radix.

Signed 64-bit integers

new Int64(n : String | Number | Int64 | UInt64 = 0) -> Int64

Semantics
    If n is a string
        Let V ?= ParseInt(n)
        If V not in [-2^63, 2^63) Throw TypeError
        Let W = a new Int64 object with W.[[Value]] = V.
        Return W
    If n is an integer and n in [-2^63, 2^63)
        Let W = a new Int64 object with W.[[Value]] = n
        Return W
    If n is a UInt64 object and n.[[Value]] in [-2^63, 2^63)
        Let W = a new Int64 object with W.[[Value]] = n.[[Value]]
        Return W
    If n is an Int64 object
        Let W = a new Int64 object with W.[[Value]] = n.[[Value]]
        Return W
    Throw TypeError

Int64.lo(n : Int64) -> UInt32

Returns the low-order 32-bit value of n.[[Value]].

Int64.hi(n : Int64) -> Int32

Returns the high-order 32-bit value of n.[[Value]].

Int64.join(hi : Number | Int64 | UInt64, lo : Number | Int64 | UInt64) -> Int64

Returns a new Int64 whose [[Value]] is computed by joining the numeric value of hi as the high-order 32 bits and the numeric value of lo as the low-order 32 bits.
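The signed case differs in that the high-order half carries the sign (Int64.hi returns an Int32) while the low-order half stays unsigned. A minimal sketch, assuming a BigInt stand-in for [[Value]], number-only arguments to join, and hypothetical function names:

```javascript
// Hypothetical helpers operating directly on a BigInt stand-in for [[Value]].

// High-order 32 bits, interpreted as a signed 32-bit integer.
function int64Hi(value) {
  return Number(BigInt.asIntN(32, value >> 32n));
}

// Low-order 32 bits, interpreted as an unsigned 32-bit integer.
function int64Lo(value) {
  return Number(BigInt.asUintN(32, value));
}

// "| 0" coerces hi to int32, ">>> 0" coerces lo to uint32; the result is
// wrapped to the signed 64-bit range.
function int64Join(hi, lo) {
  return BigInt.asIntN(64, (BigInt(hi | 0) << 32n) | BigInt(lo >>> 0));
}

const minusOne = int64Join(-1, 0xffffffff);   // -1n
const minValue = int64Join(-2147483648, 0);   // -(2n ** 63n)
```

Here int64Hi(-1n) is -1 and int64Lo(-1n) is 4294967295, matching the Int32 and UInt32 return types given in the signatures above.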

Int64.compare(a : Int64, b : Int64) -> -1 | 0 | 1

If a and b are both Int64 objects, returns -1 if a.[[Value]] < b.[[Value]], 0 if a.[[Value]] = b.[[Value]], and 1 if a.[[Value]] > b.[[Value]]. Otherwise throws a TypeError.

Int64.prototype.toString(radix : 2 | 10 | 16 = 10) -> String

Returns a string representation of this.[[Value]] in base radix, consisting of a possible leading minus sign followed by one or more lowercase digits of base radix.

Typed Arrays alignment

Typed Arrays exist in several browsers and support a number of the core scenarios covered also by Binary Data. The Binary Data proposal can be modified to be a compatible augmentation of Typed Arrays, so that Binary Data can be used with existing Web APIs which expose ArrayBuffer objects.

Array Objects would be extended to also have a buffer property returning an underlying ArrayBuffer object, a byteLength property returning the length in bytes, and a byteOffset property returning the offset within the buffer where the array begins. Array Type constructors would additionally accept an ArrayBuffer to create an array over the existing ArrayBuffer.
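For reference, this is how the three properties and the buffer-accepting constructor behave in the existing Typed Array API; the extended Array Objects described above would presumably mirror this behaviour. The variable names are illustrative.

```javascript
// A 16-byte buffer viewed in two ways through the existing Typed Array API.
const buffer = new ArrayBuffer(16);
const whole = new Uint32Array(buffer);        // four 32-bit elements
const tail = new Uint32Array(buffer, 8, 2);   // two elements starting at byte 8

const sameBuffer = tail.buffer === buffer;    // true
const offset = tail.byteOffset;               // 8
const lengthInBytes = tail.byteLength;        // 8

// Both views share the same storage, so a write through one is visible
// through the other.
tail[0] = 42;
const seenThroughWhole = whole[2];            // 42
```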

To be fully source compatible with existing uses of typed arrays, the set of concrete array objects defined in Typed Arrays can additionally be added to the proposal.

See the Typed Array API specification for details.

Rationale

Aliasing

Block objects encapsulate references to blocks of mutable data in the program store. These references can be shared and aliased. In other words, block objects provide a “reference semantics” for binary data.

The member accessors of array and struct blocks allow the creation of new block objects with references to shared data. As a result, it is possible for multiple objects to refer to the same store location. Thus, two block objects that the strict-equality (===) operator distinguishes may nevertheless hold the same reference.
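The same situation can be reproduced today with typed arrays, which are used here only as a stand-in for block objects: subarray plays the role of a member accessor that returns a new object over shared data.

```javascript
// Typed arrays as a stand-in for block objects (an analogy, not the spec's API).
const store = new Uint8Array([1, 2, 3, 4]);

// Two separate accessor calls yield two distinct objects...
const viewA = store.subarray(1, 3);
const viewB = store.subarray(1, 3);
const distinctObjects = viewA !== viewB;   // true

// ...that nevertheless alias the same store locations.
viewA[0] = 99;
const seenThroughB = viewB[0];             // 99
const seenThroughStore = store[1];         // 99
```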

An alternative would be to require memoization of block objects, so that any reference is the root reference of at most one block object. However, this could be difficult to implement (since a reference may always be a nested part of a larger block in the heap), and it does not eliminate aliasing (since, again, struct and array accessors allow sharing references to nested sub-blocks).

Numeric blocks

The numeric types are meant to simulate datatypes with non-reference semantics from languages like C. There are several potential approaches to providing this functionality:

  1. provide a reference semantics for numeric types via block objects
  2. introduce new non-reference primitive datatypes (aka value types)
  3. memoize block objects to simulate value semantics with objects

This spec takes a simpler approach: numeric blocks are never directly accessible as ECMAScript values.

Consider Python’s struct library, or js-ctypes. In both, it is possible to construct a first-class numeric block, which is given a separate object identity and heap-allocated cell containing a number. It is not clear that this provides much utility, since the same can easily be achieved with a one-element array type or one-field struct. Moreover, it leads to confusing properties such as:

(new uint32(42)) == (new uint32(42)) // false

We could attempt to patch around this problem by extending the semantics of == (but not ===, since that must always distinguish two distinct objects), but it seems more consistent simply to avoid simulating new primitive datatypes with reference-typed objects.
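ECMAScript's built-in Number wrapper objects already exhibit the same confusion, and serve here as an analogy for the uint32 example (uint32 itself is not defined in this sketch):

```javascript
// Two wrapper objects holding the same number are still two distinct objects.
const a = new Number(42);
const b = new Number(42);

const looseEqual = a == b;                     // false: object identity is compared
const strictEqual = a === b;                   // false
const unwrappedEqual = a.valueOf() === b.valueOf();   // true: the numbers match
```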

Large integers

The Int64 and UInt64 types are the simplest realization of 64-bit integers possible, but they are not ideal. It would likely be better to add language support for bignums. In the interest of keeping this spec orthogonal, we have used Int64 and UInt64.

Struct-type constructor API

In the js-ctypes API, struct types take as their argument an array of field descriptors, each expected to have exactly one own property:

new StructType([{ x: uint32 }, { y: uint32 }])

An even more convenient form would be to allow the use of a single object literal, using the (admittedly subtle!) enumeration order of the properties to determine the order of the fields in the struct layout:

new StructType({ x: uint32, y: uint32 })

Because struct types are meant to be compatible with actual I/O, where the order of struct fields is significant, it must be easy to guarantee the order of fields. For this reason, the explicit order of arrays makes it easier to reason about the order of fields.
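To see how subtle enumeration order can be, consider field names that look like array indices. In engines that follow the property ordering standardized in later editions of ECMAScript, integer-like keys are enumerated first, in ascending numeric order, regardless of where they were written. The field names and type strings below are purely illustrative.

```javascript
// Written order: y, 10, x, 2
const fields = { y: "uint32", 10: "uint8", x: "uint32", 2: "uint8" };

// Integer-like keys come first (ascending), then the rest in insertion order.
const enumerated = Object.keys(fields);   // ["2", "10", "y", "x"]
```

A struct layout derived from this object literal would therefore not match the order in which the fields appear in the source text.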

It might, however, be good to provide a hybrid interface to the StructType constructor to allow both the convenient API and the more explicitly-ordered API. It is not entirely clear how to distinguish the two kinds of input, except possibly by offering two different APIs, e.g.:

new StructType({ x: uint32, y: uint32 })
StructType.create([ [ "x", uint32 ], [ "y", uint32 ] ])
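Either entry point could reduce its input to the same ordered field list before computing a layout. The following is only a sketch of that idea: normalizeFields is a hypothetical helper, type descriptors are represented by plain strings, and using Array.isArray to tell the two forms apart is just one possible choice.

```javascript
// Hypothetical helper: turn either input form into an ordered field list.
function normalizeFields(descriptor) {
  if (Array.isArray(descriptor)) {
    // Explicitly ordered form: [ [name, type], ... ]
    return descriptor.map(([name, type]) => ({ name: String(name), type }));
  }
  if (descriptor !== null && typeof descriptor === "object") {
    // Convenient form: relies on property enumeration order.
    return Object.keys(descriptor).map((name) => ({ name, type: descriptor[name] }));
  }
  throw new TypeError("expected an object literal or an array of [name, type] pairs");
}

const fromObject = normalizeFields({ x: "uint32", y: "uint32" });
const fromPairs = normalizeFields([["x", "uint32"], ["y", "uint32"]]);
// Both yield [{ name: "x", type: "uint32" }, { name: "y", type: "uint32" }]
```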

Deviations from js-ctypes

  • ConvertToJS isn’t an appropriate name; renamed to Reify
  • renamed CData to Block and CType to BlockType
  • no numeric CData, to avoid treating “value types” as reference types
  • numeric CTypes are only cast functions, not block-constructors
  • ImplicitConvert is just called Convert
  • ExplicitConvert is called Cast and only works on number block types
  • compound CTypes are only block-constructors, not cast functions

To do

  • API to produce struct descriptor
  • alternative construction forms for struct-types (object-literal convenience vs. array of arrays) and structs (positional vs object?)
  • move update into BlockType.prototype?
  • chars and string conversion
  • note non-configurable and non-writable attributes throughout

References

 
harmony/binary_data_discussion.txt · Last modified: 2011/05/18 06:32 by lukeh
 