Andreas Sembrant and etc. proposed a phase-aware model to calculate an application’s miss ratio more efficiently. They combine online phase detection and phase guided profiling to reduce overhead and increase the  accuracy.

The StatCache which proposed by Berg and Hagersten is a low overhead statistical model. It can estimate miss ratio for caches of arbitrary size. But this model only report the average miss ratio which may not fully expose the application’s features. The miss ratio R can be calculated by the following function:

R * N = h1 f(1*R) + h2 f(2*R) + h3 f(3*R) + …

N = h1  + h2 + h3 + …

f(n) = 1 – (1-1/L)n

N is the number of reuse distance samples, f(n) is the function that indicate the probability that a cache line has been evicted from the cache when it was in the cache n cache misses ago. L is the number of cache lines in the cache.

One simple solution is that divide the program into window and sampling for each window to handle program phases. But the overhead will be significantly increase to capture the short application phases. To avoid this overhead, phase detecting algorithms are used to divide the program into different phases, sampling each phase once is enough and the subsequent instances of the same phase can use the same profiled data.

Based on this thinking, They proposed the ScarPhase (Sample-based Classification and analysis for Runtime Phases). They group the reuse distance of the program by phases and than apply the StatCache model to calculate the miss ratio for each group. They use ScarPhase library to detect and classify phases base on the application’s execution history.

Also they found that variations can also happen inside phases and classified them into four types: No variations, Transition variations, Periodic variations and Instance variations.

(1)No variations mean little changes happened inside the phase, so only a small part of each phase needs to be profiled.

(2)Transition variations mean the behavior slowly changes form one phase to another. So the transition should be divided into windows from the start to the end. More windows to profile, more accuracy we gain.

(3)Periodic variations mean the behavior changes rapidly and periodically.

(4)Instance variations mean the same phase operating on different data may have different miss ratio.

Compared with StatCache model, the ScarPhase increases accuracy by 39% and efficiency by 600%.

## 6 thoughts on “[Sembrant+:CGO2012] Phase Guided Profiling for Fast Cache Modeling”

1. dcompiler

The description includes the equation of StatCache, which is a seminal piece of work in cache performance modeling. The equation is implicit and models the performance of random-replacement cache. A later study on this type of cache by another group is West et al. in Operating Systems Review 44(4), 2010.

The assumption of the original StatCache addressed in this paper is that the miss ratio is the same in all reuse windows.

The techniques use parameters, e.g. no variation means the difference is smaller than a threshold. Parameters in general are difficult to have a full scientific understanding, e.g. what’s the effect of other parameters, and how to choose right parameters for new benchmarks.

2. jacobwbrock

The parameters for f(1*R) + f(2*R)… are not likely to be integers. This means that the probability of eviction of a datum is being calculated based on a non-integer number of evictions. What does this mean exactly? If f(n) were a linear function, f(0.5) + f(0.5) would be f(1), but f(n) is non-linear. Does the technique work because the function is locally-linear-enough?

• dcompiler

Good point. 0.5 misses could be interpreted as having a miss 50% of the times. However, the interpretation implies linearity, i.e. f(0.5) = 0.5 (f(0) + f(1)), which is not true with f(n).

3. rahman lavaee

The paper claims that random replacement is close to LRU and they experimentally show that. But I think there exist workloads that manifest much bigger gap between the LRU and RAND miss ratio.

• dcompiler

Yes, and one can see West et al. in Operating Systems Review 44(4), 2010 for more experimental evaluation.

4. dcompiler

There is an evaluation problem I explained in Chen Ding, Xiaoya Xiang, Bin Bao, Hao Luo, Ying-Wei Luo, Xiao-lin Wang: Performance Metrics and Models for Shared Cache. J. Comput. Sci. Technol. 29(4): 692-712 (2014).

Random has two other differences from LRU. One is well known, which is that the random replacement cache is fully associative by definition. The other is less recognized, which is that the cache performance is not deterministic as the replacement decisions change randomly every time a program is run. Fortunately, the problem is recently solved. Zhou gave an ingenious algorithm to compute the average miss ratio in a single pass, without having to simulate multiple times to compute the average, in the following paper:

ZHOU Shuchang. An efficient simulation algorithm for cache of random replacement policy. In Proc. the IFIP Int. Conf. Network and Parallel Computing (NPC), Sept. 2010, pp.144-154.