In this paper, the authors show how to adapt the reuse distance metric to account for invalidations and cache sharing. Their additions to the model improve its prediction accuracy by 70% for per-core caches and 90% for shared caches.
Traditional reuse distance analysis does not consider associativity, block size, or replacement policy. Multicore systems add further complications: “Private caches are typically kept coherent using invalidations”. This second problem, the multicore effects, is the primary target of this paper.
For example, if one thread writes to a datum between two reuses by another thread, there may be an invalidation, and the second reuse will be a miss even if the reuse distance is short.
Conversely, a thread may hit on its first access to a datum because another thread already brought it into a cache they share.
Models
——
Private Caches with invalidation-based coherence:
* Model uses per-thread reuse distance stacks. A write to any address removes that address from all other stacks containing it.
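As a rough illustration of the private-cache model, the sketch below keeps one LRU stack per thread and removes an address from every other thread's stack on a write. It follows the one-line description above literally. The trace format, the function name `private_reuse_distances`, and the choice to simply delete the invalidated entry (rather than leave a placeholder) are assumptions made for this example, not details taken from the paper.

```python
import math

def private_reuse_distances(trace, num_threads):
    """Compute per-thread reuse distances with write invalidation.

    trace: iterable of (thread_id, op, addr), where op is "R" or "W"
           (hypothetical format chosen for this sketch).
    Returns a list of (thread_id, addr, distance) in trace order.
    A distance of math.inf means the address was not in the thread's
    stack: either a first touch or a previously invalidated entry.
    """
    # One stack per thread; the most recently used address is at the end.
    stacks = [[] for _ in range(num_threads)]
    results = []
    for tid, op, addr in trace:
        stack = stacks[tid]
        if addr in stack:
            idx = stack.index(addr)
            # Number of distinct addresses this thread touched since
            # its last access to addr.
            dist = len(stack) - 1 - idx
            stack.pop(idx)
        else:
            dist = math.inf
        stack.append(addr)
        results.append((tid, addr, dist))
        if op == "W":
            # Invalidation: drop addr from every other thread's stack.
            for other_tid, other in enumerate(stacks):
                if other_tid != tid and addr in other:
                    other.remove(addr)
    return results
```

With the trace `[(0, "R", "A"), (0, "R", "B"), (1, "W", "A"), (0, "R", "A")]` and two threads, thread 0's second access to `A` gets an infinite distance because thread 1's write removed it. Without the write, the same access would have distance 1.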
Shared Caches:
* Use a shared reuse stack.
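A minimal sketch of the shared-cache idea, assuming a single LRU stack fed by the interleaved accesses of all threads. The function name `shared_reuse_distances` and the `(thread_id, addr)` trace format are hypothetical. Reads and writes are not distinguished here because there is only one copy of each address in a shared stack.

```python
import math

def shared_reuse_distances(trace):
    """Compute reuse distances over one stack shared by all threads.

    trace: iterable of (thread_id, addr) in global interleaved order
           (hypothetical format chosen for this sketch).
    Returns a list of (thread_id, addr, distance) in trace order.
    """
    stack = []  # most recently used address is at the end
    results = []
    for tid, addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            # Distinct addresses touched by any thread since the last
            # access to addr by any thread.
            dist = len(stack) - 1 - idx
            stack.pop(idx)
        else:
            dist = math.inf
        stack.append(addr)
        results.append((tid, addr, dist))
    return results
```

For the trace `[(0, "A"), (1, "B"), (1, "A")]`, thread 1's first access to `A` has distance 1 rather than infinity, because thread 0 already brought `A` into the shared stack. Accesses by other threads can also lengthen a thread's distances, since they push its data further down the stack.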
Hierarchical Structures:
* Combine the two models.
Experiments:
They built reuse distance CDFs for 13 benchmark programs using three methods: (1) simulated cache, (2) model-unaware, (3) model-aware. Results are plotted for 12 of these benchmarks and show a significant difference between the methods. Two tables report the percent error of (2) and (3) relative to (1), for private caches and for shared and pairwise shared caches.
The results show that the prediction accuracy is significantly higher using the invalidation-based and sharing-aware models.
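To make the comparison concrete, here is one way a list of reuse distances could be turned into a CDF and a percent error. It assumes a fully associative LRU cache, where an access hits exactly when its reuse distance is smaller than the capacity in blocks. The function names and the relative-error formula are assumptions for illustration. The notes above do not say how the paper defines percent error.

```python
import math

def miss_rate(distances, capacity):
    """Predicted miss rate for a fully associative LRU cache.

    An access misses when its reuse distance is >= capacity (in blocks).
    Infinite distances (cold or invalidated accesses) always miss.
    """
    if not distances:
        raise ValueError("distances must be non-empty")
    misses = sum(1 for d in distances if d >= capacity)
    return misses / len(distances)

def reuse_cdf(distances, max_capacity):
    """cdf[c] is the fraction of accesses with distance < c,
    i.e. the predicted hit rate at capacity c, for c = 0..max_capacity."""
    return [1.0 - miss_rate(distances, c) for c in range(max_capacity + 1)]

def percent_error(predicted, reference):
    """Relative error of a prediction against a reference value, in percent.
    Assumed definition; the reference would be the simulated cache result."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(predicted - reference) / reference * 100.0
```

For example, the distances `[math.inf, math.inf, 1, 0]` give a predicted miss rate of 0.75 at capacity 1 and 0.5 at capacity 2.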