Thoughts on Specifying Garbage Collection Semantics

In the absence of weak references, garbage collection decisions are not observable and so need not have a precise semantics. But programmers code to expectations of how garbage collectors work, and expect implementers to live up to these expectations. Further, some proposals such as weak references, weak maps, or proper tail calls are garbage collection aware. Their purpose can only be understood in terms of their interaction with the garbage collector. Without stating our baseline GC expectations, it becomes impossible to specify these other proposals. As usual, we divide specifications into safety (or integrity or consistency) concerns, liveness (or availability or progress) concerns, and confidentiality concerns.

Safety, Integrity, Consistency

The safety constraint is that the garbage collector MUST NOT collect any storage that then becomes needed to continue correct execution of the program. Should weak references be added, then garbage collection decisions become observable. Based on a suggestion from Cameron McCormack, we state the safety constraint as follows: So long as operational semantics of the remainder of the program execution includes the possibility that a reference to an object X may be dereferenced, then X MUST NOT be collected. Thus, the garbage collector is allowed to ignore any references that remain present in the semantic state, but which it can ascertain will never be dereferenced in any possible execution. This sets an upper bound on what state MAY be collected. Put another way, if the garbage collector ever reports that X has been collected, such as by nullifying a weak reference to X, if operational semantics of remaining execution requires the traversal of a strong (non-weak) reference to X, then the previous report demonstrates a safety violation.

Liveness, Availability, Progress

A garbage collector that never collects is safe but not very useful. Programmers need some expectation of what storage implementers will consider garbage, so that they can write long running programs that occupy limited memory. Where safety sets an upper bound on what may be collected, these expectations form approximately a lower bound. However, there is no spec of how much actual memory any individual object occupies, nor is there likely to be. Thus we never have any guarantee when any program may exhaust its actual raw memory allotment, so all lower bound expectations are not precisely observable. Therefore, all lower bounds on collection are stated here as SHOULDs, not MUSTs. Of course, if our stated lower bounds ever exceed our upper bounds, that is an error in this specification.

Confidentiality

We place no further constraints on when collections happen, how long they take to execute, or how data dependent these timings may be. Therefore, to code that can sense the duration of, for example, an allocation which may trigger a collection, we place no bounds on the bits that may leak over the resulting non-overt channel (whether a side channel or a covert channel). For code that can overtly sense what has been collected, such as by using weak references, such observation opens an overt information leakage channel, so the ability to create weak references must only be made available to code allowed to sense such information. Further, the difference between the lower and upper bounds above implies another non-overt channel, even for code that cannot sense the duration of time.

Baseline Semantics

Our starting point must be the expected garbage collection semantics of normal programs in the absence of special garbage collection aware primitives. An operational semantics defines semantic state and operations that derive a next state from a previous state. We identify within this state the root references as those which an enabled operation may directly dereference. Root references are necessarily reachable. We then define strong reachability from references to objects, and from objects to references. We can then state the lower bound expectation as All objects which are not transitively strongly reachable from roots SHOULD eventually be collected, if needed to prevent the program execution from failing due to memory exhaustion. Our safety requirements allow some reachable objects to be collected as well, so long as the garbage collector can ascertain that they will never be reached. Since a reference which is presently not reachable from roots can never become reachable, and since an unreachable reference can never be dereferenced, our lower bound is within out upper bound.

Of all semantic state, that not collected is retained. The upper (safety) bound on what MAY be collected implies a lower bound on what MUST be retained. The lower (liveness) bound on what SHOULD be collected implies an upper bound on what is expected to be retained. The asymptotic space complexity of an algorithm is to be calculated in terms of this expected upper bound; but requires an additional theory about the space complexity bounds on the individual objects occupying this storage.

We leave it to the various GC-aware proposals, such as weak references, weak maps, or proper tail calls, to specify how they would elaborate on this starting point.

References

 
strawman/gc_semantics.txt · Last modified: 2011/11/12 23:17 by markm
 
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki