This paper studies “lifetime functions”, a measure of the time between faults, for LRU and WS page replacement policies, using contrived page access traces based on the working set model. The paper demonstrates that this model is able to reproduce some “known properties of empirical lifetime functions”.

The lifetime is defined as 1/f, where f is the fault rate. It can be thought of as the average virtual time between misses in a program, and is expressed as a function of the average space allocation (a.k.a. resident set size) x. For LRU, x = r, the constant amount of space allocated to the program. For WS, the working set replacement algorithm, x is the average of r(k) over all K references k:

x = 1/K * \sum_{k=1}^K r(k).
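To make these definitions concrete, here is a minimal sketch of how one point (x, lifetime) could be measured from a reference trace under each policy. The function names `lru_lifetime` and `ws_lifetime` and the parameter `window` (the working set window size) are illustrative choices, not names from the paper. The sketch assumes virtual time is counted in references and that a WS fault occurs when the page was not referenced within the previous `window` references.

```python
from collections import OrderedDict


def lru_lifetime(trace, r):
    """Return (x, lifetime) for LRU with a fixed allocation of r pages."""
    resident = OrderedDict()  # keys ordered from least to most recently used
    faults = 0
    for page in trace:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(resident) >= r:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = True
    # For LRU the space allocation is constant, so x = r.
    return r, len(trace) / faults


def ws_lifetime(trace, window):
    """Return (x, lifetime) for WS with the given window size (in references)."""
    last_ref = {}  # page -> index of its most recent reference
    faults = 0
    total_size = 0
    for k, page in enumerate(trace):
        # Fault if the page was not referenced in the previous `window` references.
        if page not in last_ref or k - last_ref[page] > window:
            faults += 1
        last_ref[page] = k
        # r(k): number of pages referenced among the last `window` references.
        total_size += sum(1 for t in last_ref.values() if k - t < window)
    # x is the mean of r(k) over all K references.
    return total_size / len(trace), len(trace) / faults
```

Sweeping `r` (for LRU) or `window` (for WS) over a range of values and plotting lifetime against x would trace out one lifetime curve per policy. The per-reference scan in `ws_lifetime` is kept simple for readability rather than speed.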

Four properties of lifetime functions are defined:

(1) Lifetime functions usually have an S-curve shape.

(2) The WS lifetime is typically higher than the LRU lifetime.

Property (3) is stated in terms of an ideal estimator of the locality set, which satisfies:

(a) Resident set is a subset of the current locality set.

(b) At a transition, the resident set contains only pages in both the incoming and outgoing locality sets.

(c) “Page faults occur only for first references to entering pages.”

H: Mean phase duration (“holding time”).

M: Mean number of pages entering, at a transition.

(3) At the knee of the WS lifetime curve, the lifetime is approximately H/M. Intuitively, this is because at the ideal space allocation, H/M is the ratio of time:misses.

(4) There are some (defined) bounds on the difference between the placement of the knees of the lifetime curves for a fixed-space policy and the mean locality size, for Gaussian distributed locality set sizes.
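The intuition behind property (3) can be written out as a short calculation. This is only a sketch under the ideal-estimator assumptions (a) to (c); N, the number of phases in a long run, is a symbol introduced here for illustration, and the approximations ignore start-up effects.

```latex
\begin{align*}
\text{virtual time} &\approx N H, \\
\text{page faults}  &\approx N M, \\
\text{lifetime}     &= \frac{\text{virtual time}}{\text{page faults}}
                     \approx \frac{N H}{N M} = \frac{H}{M}.
\end{align*}
```

For instance, with hypothetical values H = 1000 references and M = 20 entering pages per transition, the lifetime at the knee would be about 50 references between faults.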

Denning and Kahn go on to describe their program model. There is a macromodel and a micromodel. The macromodel describes how locality sets (working sets) come and go, and the micromodel describes what is done within the locality sets. For the macromodel, they use a semi-Markov model where the working set is the state of the system. They define the parameters of the semi-Markov model: holding time distribution; working set size distribution; and phase transition probabilities, which depend only on the phase transitioned to (each column of the transition matrix consists of identical entries).
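The macromodel described above can be sketched as a small generator of (state, holding time) pairs. This is an illustration under stated assumptions, not the authors' implementation: the exponential holding-time distribution, the helper names `transition_matrix` and `macromodel`, and the parameters `probs` and `mean_hold` are all choices made here for the example.

```python
import random


def transition_matrix(probs):
    """Build the transition matrix implied by destination-only probabilities.

    Every row is the same vector, so column j holds probs[j] in every entry.
    """
    return [list(probs) for _ in probs]


def macromodel(num_phases, probs, mean_hold, rng):
    """Yield (locality_set_index, holding_time) pairs for a semi-Markov chain."""
    states = range(len(probs))
    for _ in range(num_phases):
        # The next state is drawn without looking at the current state.
        state = rng.choices(states, weights=probs)[0]
        # Assumption: exponentially distributed holding time, at least 1 reference.
        hold = 1 + int(rng.expovariate(1.0 / mean_hold))
        yield state, hold
```

Note that this sketch allows the same state to be drawn twice in a row, which would show up as one longer phase; whether and how to exclude that case is a modelling decision left open here.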

In the model they used mutually disjoint locality sets.

The micromodels chosen are:

(1) Cyclic: e.g. abcabcabc…

(2) Sawtooth: e.g. abccbaabccba…

(3) Random: e.g. cabbabcccab…
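The three micromodels can be sketched as reference-string generators over the pages of one locality set. The function names and the `rng` parameter are illustrative, and the random micromodel is assumed here to pick pages uniformly and independently.

```python
import random


def cyclic(pages, n):
    """First n references of the cyclic pattern, e.g. abcabcabc..."""
    return [pages[i % len(pages)] for i in range(n)]


def sawtooth(pages, n):
    """First n references of the sawtooth pattern, e.g. abccbaabccba..."""
    period = list(pages) + list(reversed(pages))  # forward sweep, then backward sweep
    return [period[i % len(period)] for i in range(n)]


def random_refs(pages, n, rng):
    """n references drawn uniformly at random from the locality set."""
    return [rng.choice(pages) for _ in range(n)]
```

Calling `cyclic("abc", 9)` and `sawtooth("abc", 12)` reproduces the two example strings given in the list above.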

Results:

Denning and Kahn show results that demonstrate that their program model generates lifetime curves that mimic empirical ones, specifically regarding the four properties of lifetime functions outlined above.

The figure below shows an interpretation of the working set model for memory access behavior. Each point is a data access belonging to either the “x”, “o” or “star” working set. Phases of the program are regions where a single working set dominates (shown with brackets). The ideal choice for the working set window size is large enough to encapsulate entire working sets, but small enough to fit within phases. In this paper, the authors describe a model where working sets overlap, but use a more basic model where working sets do not overlap.

Fig. 2 shows the lifetime curve (inter-miss time) for both WS (variable allocation) and LRU (fixed allocation). In this example, the WS curve climbs faster and plateaus higher than the LRU curve, showing the effect of variable-size allocation. Does the paper give a theoretical characterization of this? Is it an effect of this particular example, or a property of all or a broad class of cases?

More on Fig. 1: H/M is the best space-performance tradeoff, since H is the phase length and M is the number of new pages needed by the phase. The ideal estimator of WS should give an average allocation equal to x1. A fixed-space policy needs x2 to give the same performance. Property 4 says that the difference between x2 and x1 is proportional to the standard deviation sigma, assuming the locality set size follows a Gaussian distribution.