Binary data

Goals

Provide portable, memory-safe, efficient, and structured access to compact (i.e., contiguously allocated) binary data, as well as an interface to external binary I/O facilities such as XMLHttpRequest, the HTML5 File API, and WebGL.

Desiderata:

  • expressive and convenient way to create structured binary data
  • no new primitive (i.e., non-object) ECMAScript values
  • admit architecture-native internal representation while preserving portability:
    • hide struct layout/padding
    • hide endianness
    • prevent multiple interpretations of the same binary data structure at different types
  • convenient conversion to native ECMAScript values
  • reference semantics without changing the ECMAScript evaluation model
  • familiar behavior by analogy to C

The design of this library allows implementations to represent allocated binary data in architecture-specific formats – in particular, using the architecture’s native padding/alignment and endianness – without exposing these details to ECMAScript. This allows for efficient implementation while avoiding cross-platform portability hazards.
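To make the portability hazard concrete: in today's JavaScript, a program can alias the same bytes at two types and observe the platform's byte order. The sketch below uses only standard typed arrays and DataView (not the API proposed here) to show both the hazard and a fixed-order alternative. The helper names are illustrative; an implementation of this spec could use either representation internally precisely because programs never see the raw bytes.

```javascript
// Detect the host byte order by aliasing one buffer at two types.
// This is exactly the kind of observation the proposed design forbids.
function hostIsLittleEndian() {
  const word = new Uint32Array([0x01020304]);
  const bytes = new Uint8Array(word.buffer);
  return bytes[0] === 0x04;
}

// A portable alternative: always serialize with an explicit byte order,
// so the resulting bytes are identical on every platform.
function encodeUint32LE(value) {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint32(0, value, /* littleEndian */ true);
  return Array.from(new Uint8Array(view.buffer));
}

console.log("host is little-endian:", hostIsLittleEndian()); // varies by platform
console.log(encodeUint32LE(0x01020304)); // bytes 4, 3, 2, 1 on every platform
```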

Examples

const Point2D = new StructType({ x: uint32, y: uint32 });
const Color = new StructType({ r: uint8, g: uint8, b: uint8 });
const Pixel = new StructType({ point: Point2D, color: Color });
 
const Triangle = new ArrayType(Pixel, 3);
 
let t = new Triangle([{ point: { x:  0, y: 0 }, color: { r: 255, g: 255, b: 255 } },
                      { point: { x:  5, y: 5 }, color: { r: 128, g: 0,   b: 0   } },
                      { point: { x: 10, y: 0 }, color: { r: 0,   g: 0,   b: 128 } }]);
...

TODO: more examples

Blocks: compact binary data

This spec introduces an internal datatype called blocks, which intuitively represent contiguously-allocated binary data. Blocks are not themselves ECMAScript values; they live in the program store (i.e., the heap). Blocks can be:

  • numbers of various common fixed-size machine types
  • arrays of fixed length
  • structs of fixed size, with ordered fields

Block types

Every block is associated with a fixed block type, which describes the permanent shape, size, and interpretation of the block, somewhat like a runtime type tag. All references to a given block in the program store are associated with the same block type. Consequently, implementations can allocate blocks as untagged memory buffers (e.g., raw C data structures) without violating memory safety.

Block type objects have a bytes property, which reports the logical size of blocks of that type, in bytes. Note that bytes does not expose the actual in-memory size of a block of that type, only the combined logical size of its components. This avoids exposing architecture- and implementation-specific details such as struct padding.
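The gap between logical and actual size comes from padding. As a rough illustration, the sketch below uses hypothetical type descriptors (plain objects, not the proposed StructType API) and computes two numbers for the Pixel type from the examples: the logical size, which is what bytes would report, and one plausible C-style padded size, assuming each field is aligned to its own natural alignment. Real ABIs differ, which is why bytes does not expose the padded figure.

```javascript
// Hypothetical descriptors: numeric types carry a size (also used as their
// natural alignment); struct types carry an ordered list of [name, type].
const uint8 = { kind: "numeric", size: 1 };
const uint32 = { kind: "numeric", size: 4 };
const struct = (fields) => ({ kind: "struct", fields: Object.entries(fields) });

// Logical size: the sum of the component sizes, with no padding.
function logicalBytes(type) {
  if (type.kind === "numeric") return type.size;
  return type.fields.reduce((sum, [, t]) => sum + logicalBytes(t), 0);
}

// Alignment of a type under the assumed C-like rules.
function alignment(type) {
  if (type.kind === "numeric") return type.size;
  return Math.max(...type.fields.map(([, t]) => alignment(t)));
}

// One plausible padded size: align each field, then round the total
// up to the struct's own alignment (as many C compilers do).
function paddedBytes(type) {
  if (type.kind === "numeric") return type.size;
  const roundUp = (n, a) => Math.ceil(n / a) * a;
  let offset = 0;
  for (const [, t] of type.fields) {
    offset = roundUp(offset, alignment(t)) + paddedBytes(t);
  }
  return roundUp(offset, alignment(type));
}

const Point2D = struct({ x: uint32, y: uint32 });
const Color = struct({ r: uint8, g: uint8, b: uint8 });
const Pixel = struct({ point: Point2D, color: Color });

console.log(logicalBytes(Pixel)); // 11: what bytes would report
console.log(paddedBytes(Pixel));  // 12 under the assumed alignment rules
```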

Block types also mediate conversion between ECMAScript values and raw block data. This is specified via two internal methods:

  • [[Convert]] converts an ECMAScript value to a block
  • [[Reify]] converts a block to an ECMAScript value
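A block type can be pictured as a pair of functions between ECMAScript values and raw bytes. The sketch below models that pairing for a single numeric type in plain JavaScript. The convert/reify method names and the DataView-based storage are assumptions made for illustration; the spec's [[Convert]] and [[Reify]] are internal methods, not properties a program can call.

```javascript
// Hypothetical model of a numeric block type: [[Convert]] writes an
// ECMAScript value into raw storage, [[Reify]] reads it back out.
const uint32 = {
  bytes: 4,
  convert(view, offset, value) {
    if (!Number.isInteger(value) || value < 0 || value > 0xffffffff) {
      throw new TypeError(`cannot convert ${String(value)} to uint32`);
    }
    // The byte order is an internal choice; programs never observe it.
    view.setUint32(offset, value, true);
  },
  reify(view, offset) {
    return view.getUint32(offset, true);
  },
};

// A "block" here is just a region of an ArrayBuffer paired with its type.
const storage = new DataView(new ArrayBuffer(uint32.bytes));
uint32.convert(storage, 0, 17);
console.log(uint32.reify(storage, 0)); // 17
```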

In the semantics, types are compared via an internal [[IsSame]] method. Types are compared similarly to their corresponding C types: numeric and array types are compared structurally, whereas struct types are generative and compared “nominally.” (More on this below.)
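The comparison rule can be sketched over hypothetical descriptor objects. These plain objects stand in for the internal type representation, and isSame mirrors [[IsSame]] but is not an API the spec exposes. Because the predefined numeric types are singletons, identity is enough for them; array types compare structurally; struct types compare only by identity, since every struct type construction creates a new type.

```javascript
// Predefined numeric types are singletons, so identity suffices for them.
const uint8 = { kind: "numeric", name: "uint8" };
const int32 = { kind: "numeric", name: "int32" };

const arrayType = (element, length) => ({ kind: "array", element, length });
// Each call creates a fresh (generative) struct type.
const structType = (fields) => ({ kind: "struct", fields });

function isSame(a, b) {
  if (a === b) return true;
  // Arrays: structural comparison of element type and length.
  if (a.kind === "array" && b.kind === "array") {
    return a.length === b.length && isSame(a.element, b.element);
  }
  // Numeric types are singletons and structs are nominal, so any
  // non-identical pair of them is different.
  return false;
}

const RGB1 = structType({ r: uint8, g: uint8, b: uint8 });
const RGB2 = structType({ r: uint8, g: uint8, b: uint8 });

console.log(isSame(arrayType(uint8, 3), arrayType(uint8, 3))); // true
console.log(isSame(RGB1, RGB2));                                // false
console.log(isSame(arrayType(RGB1, 2), arrayType(RGB2, 2)));    // false
```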

Block objects

The spec introduces a new object type called block objects, which encapsulate references to block data as ECMAScript values. Reads and writes to the block data underlying the object are marshalled through the conversions specified by the block types.

Numeric data

Numeric data can be stored in blocks with any of the pre-defined block types:

var uint8, uint16, uint32 : BlockType
var int8, int16, int32 : BlockType
var float32, float64 : BlockType

Each of these types defines [[Reify]] and [[Convert]] internal methods that convert to and from (respectively) ECMAScript values in a straightforward manner. For example, the ECMAScript value 17 converts to and from the uint32 value 17, whereas converting the ECMAScript value 300 to a uint8 fails with a TypeError. See binary data semantics for details.
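The strict conversion can be approximated with a range check. The precise rules are on the semantics page; the sketch below assumes that only integral numbers within the type's range are accepted and that everything else (out-of-range numbers, fractions, non-numbers) raises a TypeError. The strictConvert name and the range table are illustrative.

```javascript
// Inclusive ranges of the integer block types.
const RANGES = {
  uint8: [0, 0xff], uint16: [0, 0xffff], uint32: [0, 0xffffffff],
  int8: [-0x80, 0x7f], int16: [-0x8000, 0x7fff], int32: [-0x80000000, 0x7fffffff],
};

// Hypothetical strict [[Convert]] for integer types: accept or throw,
// never silently wrap. Number.isInteger also rejects non-numbers.
function strictConvert(typeName, value) {
  const [min, max] = RANGES[typeName];
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new TypeError(`${String(value)} cannot be converted to ${typeName}`);
  }
  return value;
}

console.log(strictConvert("uint32", 17)); // 17
try {
  strictConvert("uint8", 300);
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```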

The numeric types can also be called as functions on ECMAScript values. This acts like a C cast: it uses a more permissive conversion algorithm, based on the C casting rules.
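For the integer types, a permissive cast behaves much like C's conversion to an unsigned type: the value is truncated toward zero and reduced modulo 2^n. JavaScript's typed arrays already apply this kind of wrapping on store, so they make a convenient stand-in. Whether the spec's cast matches them in every corner case (NaN, infinities, and signed overflow, which C leaves implementation-defined or undefined) is an assumption here, and castTo is a hypothetical helper.

```javascript
// Store a value into a one-element typed array and read it back:
// the typed array performs a truncating, wrapping conversion.
function castTo(TypedArray, value) {
  const cell = new TypedArray(1);
  cell[0] = value;
  return cell[0];
}

console.log(castTo(Uint8Array, 300)); // 44   (300 mod 256)
console.log(castTo(Uint8Array, -1));  // 255  (like (unsigned char)-1 in C)
console.log(castTo(Uint8Array, 3.7)); // 3    (truncated toward zero)
console.log(castTo(Int8Array, 200));  // -56  (two's-complement wrap)
```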

The numeric types cannot be used as constructors to instantiate block objects; using a numeric type with new throws an exception. (Objects have reference semantics, whereas numeric types should have value semantics.)
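In plain JavaScript, one way to get a function that can be called but not constructed is to check new.target. In the sketch below, uint8 is a hypothetical stand-in for the predefined type object, reusing typed-array wrapping for the cast.

```javascript
// Callable as a cast, but rejects `new`, since a numeric value
// should not be wrapped in an object with reference semantics.
function uint8(value) {
  if (new.target !== undefined) {
    throw new TypeError("uint8 is not a constructor");
  }
  return new Uint8Array([value])[0]; // wrapping, C-style cast
}

console.log(uint8(300)); // 44
try {
  new uint8(1);
} catch (e) {
  console.log(e.message); // uint8 is not a constructor
}
```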

See binary data discussion for a discussion of the 64-bit integer types uint64 and int64.

Arrays

Array block types describe fixed-length sequences of block data of a single (homogeneous) block type. Given a block type object elementType and a non-negative integer length, it is possible to define a new array block-type object t using the ArrayType constructor:

t = new ArrayType(elementType, length)

The [[Convert]] operation converts an array-like ECMAScript value to block data by recursively converting its elements in order.

The [[Reify]] operation creates an array block object.

Given an array block-type object such as t, it is possible to construct new array blocks:

a = new t()
a = new t(val)

Elements of the array can be read and written by index (e.g., a[0]).
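The pieces above (a fixed element type, a fixed length, conversion from an array-like value, and indexed access) can be modelled in plain JavaScript with an ArrayBuffer for storage and a Proxy for index access. This is a sketch under simplifying assumptions: makeArrayType is a hypothetical stand-in for ArrayType, the only element type shown is a strict uint8, and new t() is assumed to zero-fill.

```javascript
// Hypothetical element type with strict conversion (see "Numeric data").
const uint8 = {
  bytes: 1,
  convert(view, offset, value) {
    if (!Number.isInteger(value) || value < 0 || value > 0xff) {
      throw new TypeError(`cannot convert ${String(value)} to uint8`);
    }
    view.setUint8(offset, value);
  },
  reify(view, offset) {
    return view.getUint8(offset);
  },
};

// Returns the array index named by a property key, or -1.
const toIndex = (key) =>
  typeof key === "string" && /^(0|[1-9][0-9]*)$/.test(key) ? Number(key) : -1;

// Hypothetical stand-in for `new ArrayType(elementType, length)`.
function makeArrayType(elementType, length) {
  const stride = elementType.bytes;
  return function ArrayBlock(init) {
    const view = new DataView(new ArrayBuffer(stride * length)); // zero-filled
    if (init !== undefined) {
      // [[Convert]]: convert an array-like value element by element, in order.
      for (let i = 0; i < length; i++) {
        elementType.convert(view, i * stride, init[i]);
      }
    }
    return new Proxy({ length }, {
      get(target, key) {
        const i = toIndex(key);
        // [[Reify]] each element on read; other keys (e.g. length) fall through.
        return i >= 0 && i < length
          ? elementType.reify(view, i * stride)
          : Reflect.get(target, key);
      },
      set(target, key, value) {
        const i = toIndex(key);
        if (i < 0 || i >= length) {
          throw new RangeError(`no element ${String(key)} in a fixed-length array`);
        }
        elementType.convert(view, i * stride, value);
        return true;
      },
    });
  };
}

const Triple = makeArrayType(uint8, 3);
const a = new Triple([1, 2, 3]);
a[1] = 200;
console.log(a[0], a[1], a[2], a.length); // 1 200 3 3
```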

Structs

Struct block types describe fixed-length sequences of block data of heterogeneous block types. Given an ECMAScript object fields, it is possible to define a new struct type object t using the StructType constructor:

t = new StructType(fields)

The implementation enumerates the own properties of fields (in the standard enumeration order) to create the internal struct type descriptor.

The [[Convert]] operation converts an ECMAScript object to block data by reading each of the properties described by the struct type and converting their values.

The [[Reify]] operation creates a struct block object.

Given a struct block-type object such as t, it is possible to construct new struct blocks:

s = new t()

Each of the fields of the struct can be accessed or updated by name.
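A struct type can be sketched the same way: enumerate the field descriptor's own properties in order, assign each field an offset, and define a getter/setter pair per field that goes through the field type's conversions. makeStructType is a hypothetical stand-in for StructType. The offsets here are packed (no padding), whereas a real implementation may choose any layout, and accepting an initial value in the constructor is an added convenience that the text above does not specify.

```javascript
// Hypothetical strict numeric field type (see "Numeric data").
const uint8 = {
  bytes: 1,
  convert(view, offset, value) {
    if (!Number.isInteger(value) || value < 0 || value > 0xff) {
      throw new TypeError(`cannot convert ${String(value)} to uint8`);
    }
    view.setUint8(offset, value);
  },
  reify: (view, offset) => view.getUint8(offset),
};

// Hypothetical stand-in for `new StructType(fields)`.
function makeStructType(fields) {
  // Own enumerable properties, in standard enumeration order.
  const layout = [];
  let bytes = 0;
  for (const [name, type] of Object.entries(fields)) {
    layout.push({ name, type, offset: bytes });
    bytes += type.bytes;
  }

  function StructBlock(init) {
    const view = new DataView(new ArrayBuffer(bytes)); // zero-filled
    for (const { name, type, offset } of layout) {
      // Each field is read and written through its type's conversions.
      Object.defineProperty(this, name, {
        get: () => type.reify(view, offset),
        set: (value) => type.convert(view, offset, value),
        enumerable: true,
      });
      // [[Convert]]: read each described property of the initial object.
      if (init !== undefined) type.convert(view, offset, init[name]);
    }
    Object.seal(this); // the set of fields is fixed
  }
  StructBlock.bytes = bytes;
  return StructBlock;
}

const Color = makeStructType({ r: uint8, g: uint8, b: uint8 });
const c = new Color({ r: 255, g: 0, b: 128 });
c.g = 7;
console.log(c.r, c.g, c.b, Color.bytes); // 255 7 128 3
```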

Pointer types

The API also allows storing pointers, but for security reasons any types that contain pointers must be considered opaque: they cannot expose their underlying buffer, be overlaid with other types, or be overlaid with existing buffers. The two pointer types are ObjectPointer and StringPointer.

let S = new StructType({
    i: int32,
    j: int32,
    o: ObjectPointer,
    s: StringPointer
});
let x = new S();
x.o // null
x.s // null
x.o = document;
x.o === document // true
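One way an implementation might keep pointer-bearing types opaque is to never place references in the byte buffer at all: numeric fields live in raw storage, while pointer fields live in a separate slot array that no API exposes. The sketch below hard-codes the struct S from the example in plain JavaScript. The side-table layout and the OpaqueS name are assumptions, not something the spec prescribes, and a plain object stands in for document.

```javascript
// Hypothetical opaque struct with the same fields as S above.
// The class offers no way to reach #raw or #slots, so the buffer can
// never be exposed, overlaid with other types, or reinterpreted.
class OpaqueS {
  #raw = new DataView(new ArrayBuffer(8)); // i and j: two int32 fields
  #slots = [null, null];                   // o and s: references, kept off-buffer

  get i() { return this.#raw.getInt32(0); }
  set i(v) { this.#raw.setInt32(0, v); }
  get j() { return this.#raw.getInt32(4); }
  set j(v) { this.#raw.setInt32(4, v); }

  get o() { return this.#slots[0]; }
  set o(v) {
    if (v !== null && typeof v !== "object" && typeof v !== "function") {
      throw new TypeError("ObjectPointer expects an object or null");
    }
    this.#slots[0] = v;
  }
  get s() { return this.#slots[1]; }
  set s(v) {
    if (v !== null && typeof v !== "string") {
      throw new TypeError("StringPointer expects a string or null");
    }
    this.#slots[1] = v;
  }
}

const x = new OpaqueS();
const doc = {}; // stands in for `document`
console.log(x.o, x.s); // null null
x.o = doc;
console.log(x.o === doc); // true
```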

Struct views

Struct and array blocks are encapsulated by objects. For some high-performance applications, it may be important to avoid the extra allocation of objects to access components of potentially very large block data structures.

For this reason, the spec also exposes a somewhat lower-level operation on struct and array objects, called a struct view. A struct view allows a program to reuse a struct object by updating its view to point to a different block of the same block type. For example, in an array a of structs of type S, a struct view p of type S can be updated to point to each successive element of a:

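let S = new StructType(...);    // fields elided (the loop below reads x and y)
let A = new ArrayType(S);       // no length given; new A(...) below supplies it

let a = new A(1000000);         // an array of 1,000,000 S structs
let p = new StructView(S);      // one reusable view of type S

for (let i = 0; i < a.length; i++) {
    p.setView(a, i);            // re-point p at a[i]; no new struct object
    console.log(p.x + ", " + p.y);
}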
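The point of a view is that one object is re-pointed rather than many objects allocated. In plain JavaScript the same idea can be sketched as a small class holding a DataView and an offset. PointView and its setView(buffer, index) signature are hypothetical, and the element layout (two packed int32 fields x and y, eight bytes per element) is assumed because the example elides the fields of S.

```javascript
const STRIDE = 8; // assumed layout: x at offset 0, y at offset 4 (int32 each)

// Hypothetical reusable view over one element of a packed array of points.
class PointView {
  #view = null;
  #base = 0;
  // Re-point this view at element `index`; a new DataView is created
  // only when the underlying buffer changes.
  setView(buffer, index) {
    if (this.#view === null || this.#view.buffer !== buffer) {
      this.#view = new DataView(buffer);
    }
    this.#base = index * STRIDE;
  }
  get x() { return this.#view.getInt32(this.#base, true); }
  get y() { return this.#view.getInt32(this.#base + 4, true); }
}

// Fill a packed array of 1000 points with x = i, y = 2 * i.
const COUNT = 1000;
const buffer = new ArrayBuffer(COUNT * STRIDE);
const init = new DataView(buffer);
for (let i = 0; i < COUNT; i++) {
  init.setInt32(i * STRIDE, i, true);
  init.setInt32(i * STRIDE + 4, 2 * i, true);
}

// One view object serves every element.
const p = new PointView();
let sum = 0;
for (let i = 0; i < COUNT; i++) {
  p.setView(buffer, i);
  sum += p.x + p.y;
}
console.log(sum); // 1498500
```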

The setView method can take a path of several indices and field names to refer to deeply nested sub-structures:

p.setView(a, i, "foo", "bar");

This avoids allocating intermediate struct objects, and spares the program from pre-allocating reference objects to serve as “temporary pointers.”
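Resolving a path such as (a, i, "foo", "bar") amounts to summing offsets: each array index contributes the index times the element size, and each field name contributes that field's offset within its struct. The sketch below does that arithmetic over hypothetical packed descriptors; resolvePath and the descriptor shapes are illustrative, and a real implementation would use its own, possibly padded, layout.

```javascript
// Hypothetical descriptors with packed (padding-free) sizes.
const int32 = { kind: "numeric", bytes: 4 };
const uint8 = { kind: "numeric", bytes: 1 };

function struct(fields) {
  let offset = 0;
  const offsets = {};
  for (const [name, type] of Object.entries(fields)) {
    offsets[name] = { type, offset };
    offset += type.bytes;
  }
  return { kind: "struct", offsets, bytes: offset };
}
const array = (element, length) =>
  ({ kind: "array", element, length, bytes: element.bytes * length });

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Walk a path of indices and field names; return the byte offset and the
// type found at the end, as setView might do internally.
function resolvePath(type, ...path) {
  let offset = 0;
  for (const step of path) {
    if (type.kind === "array" && Number.isInteger(step) && step >= 0 && step < type.length) {
      offset += step * type.element.bytes;
      type = type.element;
    } else if (type.kind === "struct" && typeof step === "string" && hasOwn(type.offsets, step)) {
      offset += type.offsets[step].offset;
      type = type.offsets[step].type;
    } else {
      throw new TypeError(`invalid path step ${String(step)}`);
    }
  }
  return { offset, type };
}

// S occupies 9 bytes: id (4), then foo = { baz (1), bar (4) }.
const S = struct({ id: int32, foo: struct({ baz: uint8, bar: int32 }) });
const A = array(S, 1000);

const { offset, type } = resolvePath(A, 2, "foo", "bar");
console.log(offset, type === int32); // 23 true
```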

Typed cursors

Array types have a convenience method for constructing efficient cursors using the iterator protocol:

let S = new StructType({
    foo: new StructType({
        bar: int32,
        /* ... */
    }),
    /* ... */
});
let A = new ArrayType(S);
 
let a = new A(1000000);
let total = 0;
 
for (let bar of a.cursor("foo", "bar")) {
    total += bar;
}
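A cursor such as a.cursor("foo", "bar") can be modelled as a generator that walks the buffer with a fixed stride and reads one field per element, without creating a struct object per element. fieldCursor is a hypothetical helper, and the layout is an assumption: 8-byte elements with foo.bar as an int32 at offset 0, the remaining four bytes standing in for the fields elided in the example.

```javascript
// Assumed packed layout for S: foo.bar is an int32 at offset 0, and the
// other 4 bytes of each 8-byte element stand in for the elided fields.
const STRIDE = 8;
const BAR_OFFSET = 0;

// Hypothetical cursor: yields the int32 at `fieldOffset` in each element.
function* fieldCursor(buffer, count, stride, fieldOffset) {
  const view = new DataView(buffer);
  for (let i = 0; i < count; i++) {
    yield view.getInt32(i * stride + fieldOffset, true);
  }
}

// Fill one million elements with bar = i % 10.
const COUNT = 1000000;
const buffer = new ArrayBuffer(COUNT * STRIDE);
const view = new DataView(buffer);
for (let i = 0; i < COUNT; i++) view.setInt32(i * STRIDE + BAR_OFFSET, i % 10, true);

let total = 0;
for (const bar of fieldCursor(buffer, COUNT, STRIDE, BAR_OFFSET)) {
  total += bar;
}
console.log(total); // 4500000
```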

Custom cursors can be efficiently implemented using struct views. This works even on non-array data such as trees.
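To illustrate the tree case, the sketch below stores a binary search tree in a flat buffer, each node a hypothetical packed { value: int32, left: int32, right: int32 } record with -1 meaning "no child", and walks it in order with a generator that re-points and yields a single view object instead of allocating one per node. NodeView and inOrder are illustrative names; because the same view is reused, a consumer must read the fields before advancing the cursor.

```javascript
const NODE_BYTES = 12; // value, left, right: three int32 fields (assumed layout)

// Hypothetical reusable view over one tree node.
class NodeView {
  constructor(buffer) { this.view = new DataView(buffer); this.base = 0; }
  setView(index) { this.base = index * NODE_BYTES; return this; }
  get value() { return this.view.getInt32(this.base, true); }
  get left() { return this.view.getInt32(this.base + 4, true); }
  get right() { return this.view.getInt32(this.base + 8, true); }
}

// In-order traversal with an explicit stack of node indices; the same
// NodeView object is yielded for every node.
function* inOrder(buffer, root) {
  const node = new NodeView(buffer);
  const stack = [];
  let current = root;
  while (current !== -1 || stack.length > 0) {
    while (current !== -1) {
      stack.push(current);
      current = node.setView(current).left;
    }
    current = stack.pop();
    yield node.setView(current);
    current = node.setView(current).right;
  }
}

// A small tree: 4 at the root; 2 (children 1 and 3) on the left; 6 on the right.
const nodes = [
  [4, 1, 2],
  [2, 3, 4],
  [6, -1, -1],
  [1, -1, -1],
  [3, -1, -1],
];
const buffer = new ArrayBuffer(nodes.length * NODE_BYTES);
const init = new DataView(buffer);
nodes.forEach(([value, left, right], i) => {
  init.setInt32(i * NODE_BYTES, value, true);
  init.setInt32(i * NODE_BYTES + 4, left, true);
  init.setInt32(i * NODE_BYTES + 8, right, true);
});

const seen = [];
for (const node of inOrder(buffer, 0)) seen.push(node.value);
console.log(seen.join(" ")); // 1 2 3 4 6
```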

 
harmony/binary_data.txt · Last modified: 2012/12/07 20:15 by dherman
 