This paper addresses the problem of high cache misses when using
pointer-based structures. These structures tend to be difficult to handle
with static techniques such as hardware prefetching and loop
transformations. The advantage these structures do have is
locational transparency, where the structure is agnostic to the
memory location of its elements. The paper provides two main techniques,
cache-conscious reorganization and allocation, as well as evaluation of
several options and some theoretical analysis.
Cache-conscious layout takes advantage of locational transparency in order
to create two main effects to improve cache performance. The first is
clustering, which gathers related data into the same cache block.
The second is coloring, which separates data into sets based on
its frequency of usage. Each set is placed into different sections of
memory based on the set-associativity of the cache hardware so that
frequency used data is never evicted for rarely used data.
Cache-conscious reorganization traverses a structure and moves the
scattered elements into contiguous sections of memory. For tree-like
structures this clusters subtrees together while coloring the base elements
of the tree, which will have to be traversed for any access to the tree.
This can be implemented by simply providing a traversal function to a
general algorithm, but any errors in the function or external pointers into
the structure will cause correctness issues. Additionally, the overhead of
examining the entire structure means this technique is more appropriate for
data that changes infrequently.
Cache-conscious allocation replaces the standard heap allocator with a
cache-aware version. This will never cause correctness issues, but it has
a far more limited view of the structure and so can only perform limited
clustering. By providing a pointer to related data, the allocator can
attempt to allocate the new data in the same cache block. When that is not
possible, the authors’ evaluation shows that placing the data in a new
block and reserving the rest of the block for future related allocations
tends to provide the best performance.
After the two systems described above, the paper closes with a model to
predict the speedup expected from these techniques. Memory access times
can be predicted based on memory latencies and cache miss rates. They
derive an expected miss rate based on three factors: the number of memory
accesses required to access an element, the number of elements per cache
block, and the amount of reuse of elements already in the cache. A
worst-case evaluation can be done with one element per block and no reuse.
The cache-aware layout will increase the number of elements in the cache
and produce an expected amount of reuse based on the layout.
In summary, this paper describes two cache-conscious layout techniques for
pointer-based structures. In addition to describing the techniques, it
provides evaluation in a variety of scenarios and an analytical framework
to evaluate the expected performance. Cache-conscious reorganization
allows both clustering related data and separating out frequently used data
so it is not evicted by less relevant data. However, this comes with
higher overheads and possible memory corruption. Cache-conscious
allocation can only perform limited clustering, but is always safe and can
still provide notable improvements.