Monday, September 28, 2009

Bounds Checking for C

Approaches to bounds checking

One response to this analysis is to discard C, since this lack of efficient checkability is responsible for many software failures.

A second approach is to extend the language to make checking easier. There are various proposals for doing this, and it is an opportunity to add other features such as assertion checking.

A third more-or-less workable scheme is to modify the representation of pointers to include three items: the pointer itself, and the lower and upper bounds of the object to which it is supposed to point. Experience with this has shown the benefits of bounds checking (e.g. see the bcc and rtcc compilers cited below), but there are difficulties:

  • Although some optimisation is possible, execution time of the resulting code increases by a large factor (ten or more, apparently).

    Even if the checking code can be optimised away, there remains the cost of passing triples for every pointer - which essentially prevents their being allocated to registers.

  • Because the representation of pointers has been changed, checked code is incompatible with normal code. This means that special versions of all libraries and system calls must be provided, and all the constituent modules of a program must be run with checking on. This adds to the performance problem.

    Some automatic support for interfacing checked code with normal code can be given, but this only works for straightforward cases. GUI code with call-backs, for example, is tricky.

  • Code which interfaces to hardware (e.g. a DMA controller) requires special attention since the hardware must be presented with standard addresses.

How we solved the problem

Our technique provides full checking without changing the representation of pointers. We therefore avoid most of the problems noted above. Some efficiency problems remain, but bounds checking need not be used in all of the files which make up a program, so trusted, performance-critical code can run at full speed.

The key idea is this:

  • Every pointer expression derives a new pointer from a unique original pointer.

    For example, in "p+2*k+1" we derive a new pointer from "p".

    By contrast, in "p+q" or "p-q", we derive an integer from two pointers. The integer is nonsense as a pointer.

    We call this unique original pointer the expression's "base" pointer.

  • Every pointer value is valid for just one allocated storage region.

    An allocated storage region may be a global, static, automatic or heap-allocated variable, structure or array.

  • We can check whether a pointer arithmetic expression is valid by finding its base pointer's storage region, then checking that the expression's result points into the same storage region.

  • If the base pointer appears not to refer to any valid region, then it must refer to a region originating in unchecked code. In this case we cannot check the result of the expression.

  • If the base pointer's storage region is an array, say A[100], then (according to the ANSI standard) it is valid to calculate the address of the element after the last one valid (in this example, the address of A[100]).

    This is so that a pointer can be incremented and then tested for the loop exit condition.

    To prevent false alarms, we pad the storage layout of arrays to that A[100] is a valid pointer (we still check it when it is used).

Implementation

We made some small modifications to the C front-end of gcc, the Gnu C compiler, to add code to check pointer arithmetic and use, and to maintain a table of known allocated storage regions.

We went to some trouble to ensure that gcc's optimiser could handle the added code, and employed modest inlining for efficiency.

The table of known allocated storage regions has to handle insertions, deletions and range lookups extremely fast, but since programs display a high degree of locality the access pattern is highly skewed. For these reasons a splay tree was used, in which objects are migrated to the root when accessed.

Performance

  • nfib (dumb doubly-recursive Fibonacci): no slowdown.
    • Execution time: same.
    • Compile-time: slowdown of 3 (very small)
    • Executable size: much larger due to inclusion of library.
  • Matrix multiply (ikj, using array subscripting):
    • Execution time: slowdown of around 30 compared to unoptimised.
    • Compile-time: slowdown of around 2.
    • Executable size: roughly the same.
Source: www.doc.ic.ac.uk

No comments:

Post a Comment