In his MS thesis, published in Vancouver, Canada four months earlier, Zachary Drudi reported the results of using the footprint technique we developed to predict the hit ratio curve for a number of Microsoft traces, for cache sizes ranging from 0 to many gigabytes (up to 1TB). The footprint prediction is accurate in most cases. Below is the plot copied from the thesis (page number 32, page 40), where avgfp is the footprint prediction and mattson is the ground truth calculated from the reuse distance (LRU stack distance).
The author implemented our technique entirely on his own without consulting or informing any of us.
The thesis is titled "A Streaming Algorithms Approach to Approximating Hit Rate Curves" and is available online from the University of British Columbia.
(copyright Zachary Drudi, 2014)
In a series of papers in PPoPP 2008 (poster), PPoPP 2011, PACT 2011, and ASPLOS 2013, Rochester researchers developed a set of techniques to measure the footprint and use it to predict other locality metrics, including the miss/hit ratio curve and reuse distance, for both exclusive and shared caches. We have shown that the footprint techniques are efficient and largely accurate. Mr. Drudi’s results provide an independent validation.
To our knowledge, this is the first time the technique has been used to characterize storage workloads. The same implementation was used in the OSDI 2014 paper "Characterizing storage workloads with counter stacks" by J. Wires, S. Ingram, N. J. A. Harvey, A. Warfield, and Z. Drudi.
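For readers unfamiliar with the two quantities compared in the plot, the following is a minimal illustrative sketch, not the code used in the thesis or in our papers. It computes an LRU hit ratio curve from reuse (stack) distances, which is how a Mattson-style ground truth is obtained, and the average footprint of a trace for a given window length. The function names, the toy trace, and the brute-force algorithms are assumptions chosen for clarity; practical implementations use much faster methods, and the step that converts footprint into a predicted hit ratio is not shown here.

```python
from collections import Counter


def lru_hit_ratio_curve(trace, max_size):
    """Hit ratio of a fully associative LRU cache for sizes 1..max_size.

    Brute-force stack simulation: the reuse (stack) distance of an access
    is the depth of the item in the LRU stack, counting the top as 1.
    An access hits in a cache of size c when its distance is <= c.
    Assumes a non-empty trace.
    """
    stack = []                 # most recently used item first
    dist_counts = Counter()    # reuse distance -> number of accesses
    for item in trace:
        if item in stack:
            dist_counts[stack.index(item) + 1] += 1
            stack.remove(item)
        stack.insert(0, item)  # first-time accesses are cold misses

    curve, hits = [], 0
    for size in range(1, max_size + 1):
        hits += dist_counts[size]
        curve.append(hits / len(trace))
    return curve


def average_footprint(trace, w):
    """Average number of distinct items over all windows of length w.

    Assumes 1 <= w <= len(trace). Quadratic time; for illustration only.
    """
    windows = [trace[i:i + w] for i in range(len(trace) - w + 1)]
    return sum(len(set(win)) for win in windows) / len(windows)


if __name__ == "__main__":
    toy_trace = list("abcabc")  # hypothetical trace of block identifiers
    print(lru_hit_ratio_curve(toy_trace, 3))
    print(average_footprint(toy_trace, 2))
```

On the toy trace, every reuse has stack distance 3, so only a cache of at least three blocks sees any hits; the remaining misses are the three cold misses.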