Trace Analysis and a New Working Set Metric

Wes Smith

Modeling program behavior with trace analysis is an approach as old as modern memory systems, and unsurprisingly so: understanding patterns in data accesses provides insight into memory performance. Denning and Schwartz (1972) introduced average working-set size as the first timescale metric of locality (one parameterized by window length) – working-set size denotes the number of unique data elements accessed during a sequential portion of a trace:

s(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} w(t, x)

Here, the parameter x denotes window length, N is the length of the trace, and w(t, x) denotes the number of unique elements in the window of length x starting at element t. Note that this formula was intended for infinite-length traces – as such, the sums are not enumerable. Denning also provided a recursive formula using a distribution of reuse time frequencies rt:

s(x+1) = s(x) + 1 - P(rt \le x), \qquad s(0) = 0

where P(rt ≤ x) is the fraction of accesses whose reuse time is at most x (first accesses have no finite reuse time). In 2011, Xiaoya Xiang introduced footprint as an alternative timescale working-set metric, defined as follows:

fp(x) = m - \frac{1}{n-x+1} \left( \sum_{t=x+1}^{n} (t-x)\, rt(t) + \sum_{i=1}^{m} I(f_i > x)\,(f_i - x) + \sum_{i=1}^{m} I(n - l_i + 1 > x)\,(n - l_i + 1 - x) \right)

Here n is the trace length, m is the number of unique data elements, rt(t) is the number of accesses with reuse time t, f_i and l_i are the first and last access times of element i, and I is the indicator function.
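To make the definitions concrete, here's a small brute-force sketch (my own illustration, not code from either paper) that computes w(t, x) directly and averages it over every window of a finite trace:

```python
def w(trace, t, x):
    """Number of unique elements in the window of length x starting at t."""
    return len(set(trace[t:t + x]))

def avg_wss(trace, x):
    """Finite-trace estimate of s(x): the mean of w(t, x) over all windows."""
    n = len(trace)
    return sum(w(trace, t, x) for t in range(n - x + 1)) / (n - x + 1)

trace = ['a', 'b', 'a', 'c', 'b', 'a']
print(avg_wss(trace, 3))  # windows: aba, bac, acb, cba -> (2+3+3+3)/4 = 2.75
```

This enumerates every window, so it costs O(n·x) per window length – fine for a toy trace, hopeless for a real one, which is exactly why the histogram-based formulas below matter.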

This formulation is complex, and I won’t get into it too much – the last two terms use first and last access time information for each unique element, and the first uses a reuse time histogram. We’ll talk in a bit about some of the pros and cons of footprint compared to average working set size.

Limit case Xiang footprint is, unsurprisingly, an adaptation of footprint intended to apply (theoretically) to infinite-length traces:

lim\_fp(x) = m + \frac{xm}{n} - \frac{1}{n} \sum_{t=x+1}^{n} (t-x)\, rt(t)

In an upcoming paper, Yuan et al. prove that Denning-Schwartz and limit case Xiang footprint differ by a constant term. Specifically, s(x) = lim_fp(x) – lim_fp(0).
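As a sketch of how one might compute the limit footprint from a trace, here's a short Python illustration (mine, not from the paper; I'm assuming reuse time is measured in accesses since the previous access to the same datum, with first accesses excluded from the histogram):

```python
from collections import Counter

def reuse_time_histogram(trace):
    """rt[t] = number of accesses whose reuse time (accesses since the
    previous access to the same datum) equals t; first accesses are skipped."""
    last, rt = {}, Counter()
    for i, e in enumerate(trace):
        if e in last:
            rt[i - last[e]] += 1
        last[e] = i
    return rt

def lim_fp(trace, x):
    """Limit case Xiang footprint at window length x."""
    n, m = len(trace), len(set(trace))
    rt = reuse_time_histogram(trace)
    return m + x * m / n - sum((t - x) * c for t, c in rt.items() if t > x) / n

trace = ['a', 'b', 'a', 'c', 'b', 'a']
print(lim_fp(trace, len(trace)))  # end behavior: 2m = 6.0
```

Note the end behavior already visible on the toy trace: at x = n the value is 2m, not m, which is the problem discussed below.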

Before we get into experimental results playing with these metrics, we should note that both Xiang and Denning give (near-identical) methods of predicting the cache miss ratio from wss/footprint, where c is the cache size (in blocks):

mr(c) = \frac{d}{dx} fp(x) \Big|_{x = fp^{-1}(c)}

In words, the miss ratio for a cache of size c is the derivative of wss/footprint at the point where wss/footprint equals the cache size. For such a simple formulation, this relationship models real cache performance well and is widely accepted. Note that this approach requires no information whatsoever about the program itself – only the data resulting from executing it once.
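Here's a minimal sketch of that prediction step over a measured curve, using a discrete derivative (my own toy code; a real implementation would interpolate between samples):

```python
def predicted_miss_ratio(fp_curve, c):
    """fp_curve[x] = footprint at window length x (non-decreasing).
    Miss ratio for a cache of c blocks: the discrete derivative of the
    curve at the point where it crosses the cache size."""
    for x in range(len(fp_curve) - 1):
        if fp_curve[x] <= c < fp_curve[x + 1]:
            return fp_curve[x + 1] - fp_curve[x]
    return 0.0  # the whole footprint fits in cache: no capacity misses

print(predicted_miss_ratio([0, 2, 3, 3.5, 3.75], 3))  # 0.5
```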

Let’s look at these metrics computed on some real benchmarks. Six programs from the SPEC CPU 2017 benchmark suite were analyzed with the Loca tool, resulting in data from which we can compute working-set based metrics.

[Figure: Denning-Schwartz and limit case Xiang footprint for the six benchmarks, each with a horizontal grey line at y = m.]

Great! This validates the Denning-to-footprint relation – it’s clear that these graphs have the same derivative everywhere. Unfortunately, there are some bigger problems with these metrics. Note the horizontal grey line at y = m, where m is the number of unique data items occurring in the entire trace. This should correspond to the end behavior of a working-set metric; it doesn’t make sense for anything working-set based to compute a value larger than m. In other words, we want f(n) = m. We also want f(0) = 0.

Based on our definition of miss ratio as the derivative of footprint/wss, it follows that a property we want in a timescale metric is concavity: real miss ratio curves are monotonically non-increasing in cache size (obviously), and to reflect this our metrics must be concave (non-positive second derivative). Both limit case Xiang footprint and Denning-Schwartz are concave.
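Concavity of a measured curve is easy to check numerically: a discrete sequence is concave exactly when its second differences are non-positive. A quick sketch of the check:

```python
def is_concave(curve, tol=1e-9):
    """True iff every second difference of the sequence is <= tol."""
    return all(curve[x + 1] - 2 * curve[x] + curve[x - 1] <= tol
               for x in range(1, len(curve) - 1))

print(is_concave([0, 2, 3, 3.5]))  # True: the increments keep shrinking
print(is_concave([0, 1, 3, 3.5]))  # False: the jump from 1 to 3 breaks it
```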

Here are the same six benchmarks with non-limit case Xiang footprint applied:

[Figure: Xiang footprint for the six benchmarks, again with a horizontal grey line at y = m.]

Again, the horizontal grey line is at y = m. This time, we see the beginning and end behaviors that we want, but the function is no longer concave – the predicted miss ratio will not be monotone. To summarize what we have so far:

                        f(0) = 0    f(n) = m    concave
  Denning-Schwartz      yes         no          yes
  Limit case Xiang fp   no          no          yes
  Xiang footprint       yes         yes         no
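For reference, here is a direct transcription of the Xiang footprint formula into Python (my own sketch, not the Loca tool; I use 1-indexed access times, which makes the boundary terms line up with fp(0) = 0 and fp(n) = m):

```python
from collections import Counter

def xiang_footprint(trace, x):
    """fp(x): average number of distinct data blocks over all length-x
    windows, computed from the reuse-time histogram plus each datum's
    first and last access times (1-indexed)."""
    n, m = len(trace), len(set(trace))
    last, first, rt = {}, {}, Counter()
    for i, e in enumerate(trace, start=1):
        if e in last:
            rt[i - last[e]] += 1
        else:
            first[e] = i
        last[e] = i
    reuse = sum((t - x) * c for t, c in rt.items() if t > x)
    firsts = sum(f - x for f in first.values() if f > x)
    lasts = sum(n - l + 1 - x for l in last.values() if n - l + 1 > x)
    return m - (reuse + firsts + lasts) / (n - x + 1)

trace = ['a', 'b', 'a', 'c', 'b', 'a']
print([xiang_footprint(trace, x) for x in (0, 3, 6)])  # [0.0, 2.75, 3.0]
```

On this toy trace fp(3) = 2.75, the exact average of w(t, 3) over the four length-3 windows – which is the point of the formula: it recovers the all-window average without enumerating windows.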

Some straightforward mathematical analysis gives us more information about the end behaviors of Denning-Schwartz and limit case Xiang footprint: lim_fp(n) = 2m, and it follows that s(n) = 2m – lim_fp(0). Finally, some insight! If we’re interested in creating a metric that behaves throughout like lim_fp(x) but like fp(x) at the beginning and end, it’s critical to note that lim_fp(n) – fp(n) = m. Also (trivially), lim_fp(0) – fp(0) = lim_fp(0).

Here’s what we came up with: reuse-time footprint, rtfp(x).

rtfp(x) = lim\_fp(x) - \frac{xm}{n} - \left(1 - \frac{x}{n}\right) lim\_fp(0)

Reuse time footprint, Xiang footprint, and Denning-Schwartz plotted on the benchmarks:

[Figure: rtfp, Xiang footprint, and Denning-Schwartz for the six benchmarks.]

Lots to talk about here. Firstly, reuse time footprint has the same beginning and end behavior as Xiang footprint, just as we’d hoped. Its concavity is clear when compared to footprint, especially on the gcc and perlbench plots. We haven’t proved it, but since rtfp differs from lim_fp(x) only by terms linear in x – which don’t affect the second derivative – and we have a proof of concavity for lim_fp(x), it should be one step above trivial. It’s interesting (although not necessarily meaningful) to note that reuse time footprint is not always greater than or equal to Xiang footprint – we see the two intersect in the mcf plot.

The new term in the reuse time footprint is remarkably intuitive. First, we have the xm/n term – since there are m unique data blocks in the trace, there are clearly m first accesses. It follows that the probability that a given element in a trace is a first access is m/n. Multiplied by window size x, this gives the expected number of first accesses in a window. When x reaches n (the window is the entire trace), the term reduces to –m, changing f(n) from 2m to m. An interesting thing to investigate would be how this approach (essentially penalizing programs with low reuse frequency) relates to the first/last access consideration in Xiang footprint.

The second part of the new term is more easily understood when split up. We can re-express reuse time footprint as

rtfp(x) = \left( lim\_fp(x) - \frac{xm}{n} \right) - lim\_fp(0) + \frac{x}{n}\, lim\_fp(0)

By subtracting lim_fp(0) we create the end behavior we want, and we gradually reintroduce the value of lim_fp(0) by adding (x/n)*lim_fp(0).

  • At x = 0: rtfp(0) = lim_fp(0) – lim_fp(0) = 0
  • At x = n: rtfp(n) = lim_fp(n) – m – lim_fp(0) + lim_fp(0) = 2m – m = m
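These two boundary checks are easy to confirm numerically. Here's a self-contained sketch (my own code, assuming reuse time counts accesses since the previous access to the same datum):

```python
from collections import Counter

def rtfp(trace, x):
    """Reuse-time footprint: the limit footprint with a correction term
    so that rtfp(0) = 0 and rtfp(n) = m."""
    n, m = len(trace), len(set(trace))
    last, rt = {}, Counter()
    for i, e in enumerate(trace):
        if e in last:
            rt[i - last[e]] += 1
        last[e] = i

    def lim_fp(x):
        return m + x * m / n - sum((t - x) * c for t, c in rt.items() if t > x) / n

    return lim_fp(x) - x * m / n - (1 - x / n) * lim_fp(0)

trace = ['a', 'b', 'a', 'c', 'b', 'a']
print(rtfp(trace, 0), rtfp(trace, len(trace)))  # 0.0 and m = 3.0
```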

In summary, this new metric shows a lot of promise. We haven’t investigated how it compares to pre-existing metrics on some of the more practical applications for trace analytics (e.g., miss ratio prediction), but we’ve guaranteed several properties we want. Reuse time footprint has just come about in the last couple of weeks, so there’s lots of work still to be done. However, I have high hopes: either rtfp or a metric that results from more thorough mathematical/experimental analysis could prove to be something very useful indeed.

Thanks to Sicong Fan and Zixu Chen for doing an excellent job with data collection.