Harold S. Stone, Fellow, IEEE, John Turek, Member, IEEE, and Joel L. Wolf
When improving locality through cache management, one of the most significant problems is determining how close an approach comes to the optimum. For a single sequential program, the OPT replacement algorithm provides that reference point. Co-running programs that share a cache need an equivalent, and this paper attempts to supply one. The authors build a model for evaluating the quality of a partition, deriving the optimal partition, and approaching it dynamically.
The paper makes four contributions:
- Optimally partition cache for 2 programs
- Find a partition which has similar performance as LRU
- Theoretically show LRU is far from optimal for transient data allocation
- Near optimally partition cache for N programs
Optimally partition cache for 2 programs
The authors start from the observation that miss ratio is linear in cache size on a log/log scale. This gives a prediction function for the miss ratio, MI(x) = aI*x^(bI*log10), where x is the cache size assigned to program I, and aI and bI are coefficients obtained from profiling.
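As an illustration of how such coefficients could be obtained, here is a minimal Python sketch that fits a straight line to profiled points in log/log space. It is not taken from the paper: the function name `fit_power_law`, the sample sizes, and the synthetic miss ratios are all hypothetical, and the exponent of the prediction function is treated as a single fitted slope `k` (an assumption made to keep the example simple).

```python
import math

def fit_power_law(sizes, miss_ratios):
    """Least-squares fit of log10(m) = log10(a) + k * log10(x).

    Returns (a, k) such that m is approximated by a * x**k.
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(m) for m in miss_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                      # slope of the log/log line
    a = 10 ** (mean_y - k * mean_x)    # intercept, mapped back from log space
    return a, k

# Hypothetical profile: synthetic points generated from a = 0.5, k = -0.5.
sizes = [64, 256, 1024, 4096]
miss_ratios = [0.5 * s ** -0.5 for s in sizes]
a, k = fit_power_law(sizes, miss_ratios)
```

With real profiling data the points would not lie exactly on a line, so the fit would only approximate the measured miss ratios.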
Assuming the two programs I and D have the same access rate T, the total number of misses is TotalMisses = (MI(x) + MD(C-x))*T/2, where C is the total cache size. Under the further assumption that miss ratio is a convex function of cache size, TotalMisses reaches its minimum where its derivative with respect to x equals 0.
Combined with their miss ratio function, this gives the following equation for the optimal partition size x: aI*bI*log10*x^(bI*log10-1) - aD*bD*log10*(C-x)^(bD*log10-1) = 0
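The zero of that derivative generally has no closed form when the two exponents differ, so it can be located numerically. The sketch below uses bisection, which works because the derivative of total misses is increasing in x when both miss ratio curves are convex. It is an illustration only: the names `miss_slope` and `optimal_split` are hypothetical, the exponent is written as a single parameter `k` (assumed negative), and the coefficient values are made up.

```python
def miss_slope(a, k, x):
    """Derivative of the miss ratio a * x**k with respect to x."""
    return a * k * x ** (k - 1)

def optimal_split(a_i, k_i, a_d, k_d, cache_size, iters=100):
    """Cache size x for program I that zeroes d/dx [MI(x) + MD(C - x)].

    Assumes k_i < 0 and k_d < 0, so both curves are decreasing and convex.
    """
    lo, hi = 1e-9, cache_size - 1e-9
    for _ in range(iters):
        mid = (lo + hi) / 2
        g = miss_slope(a_i, k_i, mid) - miss_slope(a_d, k_d, cache_size - mid)
        if g < 0:
            lo = mid   # total misses still falling: give program I more cache
        else:
            hi = mid   # total misses rising: give program I less cache
    return (lo + hi) / 2

# Hypothetical coefficients: program I misses twice as often as D at equal size.
split = optimal_split(1.0, -0.5, 0.5, -0.5, 1024)
```

In this example program I ends up with more than half of the cache, since each extra block removes more of its misses than it would remove from D.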
Find a partition which has similar performance as LRU
To this end, the authors first introduce the concept of a cache state for 2 programs. The cache is in state x when its overall miss ratio without partitioning is equivalent to that of a partition assigning x cache to program I and C-x to program D. S(x) is the probability that state x occurs.
In steady state, the probability of moving from state x to x+1 must equal the probability of moving from x+1 back to x, which gives S(x)MI(x) = S(x+1)MD(C-x-1). Because MI and MD are monotonic in cache size, the ratio S(x+1)/S(x) = MI(x)/MD(C-x-1) falls steadily as x grows, so S(x) is unimodal.
Finally, the authors show that the most probable state x’ is close to the optimal partition.
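The balance equation can be unrolled into a recurrence, S(x+1) = S(x)*MI(x)/MD(C-x-1), which is enough to compute the whole distribution and read off its peak. The following Python sketch does this under stated assumptions: miss ratios follow a power law a * x**k with made-up coefficients, states run from 1 to C-1 to avoid a zero-sized share, and the names `state_distribution` and `most_probable` are hypothetical.

```python
import math

def miss_ratio(a, k, x):
    """Assumed power-law miss ratio for a cache share of size x."""
    return a * x ** k

def state_distribution(a_i, k_i, a_d, k_d, cache_size):
    """Return {x: S(x)} for x = 1 .. cache_size - 1, normalized to sum to 1.

    Uses S(x+1) = S(x) * MI(x) / MD(C - x - 1), accumulated in log space
    so that long products do not overflow.
    """
    log_s = {1: 0.0}
    for x in range(1, cache_size - 1):
        ratio = miss_ratio(a_i, k_i, x) / miss_ratio(a_d, k_d, cache_size - x - 1)
        log_s[x + 1] = log_s[x] + math.log(ratio)
    peak = max(log_s.values())
    weights = {x: math.exp(v - peak) for x, v in log_s.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# Two identical hypothetical programs sharing a 64-block cache.
dist = state_distribution(0.5, -0.5, 0.5, -0.5, 64)
most_probable = max(dist, key=dist.get)
```

For two identical programs the peak falls at an even split, which is also where the total miss count is minimized in this symmetric case.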
LRU is far from optimal for transient data allocation
By taking the derivative of the state with respect to time, the authors show that LRU needs time to reach state x’, which is both the most probable state and the state close to the optimal partition.
The derivative of state x can be written as dx/dt = (rate of increasing x) - (rate of decreasing x), which is essentially MI(x) - MD(C-x). This derivative gives the speed at which LRU approaches a near-optimal partition state, and hence the time it needs; it also shows that over a short enough interval LRU cannot perform close to optimal.
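To get a feel for how slowly the state drifts, the differential equation can be integrated step by step. The sketch below applies forward Euler with one step per memory reference. It is only an illustration: the power-law miss ratios, their coefficients, the starting state of 16 blocks, and the name `simulate_lru_drift` are all assumptions rather than values from the paper.

```python
def miss_ratio(a, k, x):
    """Assumed power-law miss ratio for a cache share of size x."""
    return a * x ** k

def simulate_lru_drift(a_i, k_i, a_d, k_d, cache_size, x0, steps):
    """Forward-Euler integration of dx/dt = MI(x) - MD(C - x).

    One step corresponds to one reference; returns the state after each step,
    with the starting state at index 0.
    """
    x = float(x0)
    trace = [x]
    for _ in range(steps):
        drift = miss_ratio(a_i, k_i, x) - miss_ratio(a_d, k_d, cache_size - x)
        x += drift
        trace.append(x)
    return trace

# Two identical hypothetical programs, 1024-block cache, I starts with 16 blocks.
trace = simulate_lru_drift(0.5, -0.5, 0.5, -0.5, 1024, 16, 300_000)
early, late = trace[1_000], trace[-1]
```

In this setup the state is still far from the balanced split after a thousand references and only settles near it much later, which mirrors the point that LRU is poor over short intervals.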
Near optimally partition cache for N programs
Assuming the miss ratio function is convex, the authors propose a greedy algorithm that reaches a near-optimal solution. It builds on the insight of the first section: the optimum is achieved where the derivatives of the miss ratios with respect to the partition sum to 0, that is, where no program gains more from an extra block than another loses by giving one up. The following algorithm keeps that sum as close to 0 as possible:
Let Ci be the cache size of partition i.
Initialization: set C1=C2=…=CN=0.
Induction step: find the program i that benefits most, i.e. the one whose cache misses would drop the most with 1 extra cache block.
Set Ci=Ci+1, leaving the other partitions unchanged, and repeat the induction step until the partition sizes sum to C.
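The steps above can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in the summary: every program has the same access rate, miss ratios follow a power law a * x**k, and an empty partition is given a miss ratio of 1.0. The names `greedy_partition` and `gain` and the coefficient values are hypothetical.

```python
def miss_ratio(a, k, blocks):
    """Assumed miss ratio: 1.0 with no cache, otherwise a * blocks**k (capped at 1)."""
    if blocks == 0:
        return 1.0
    return min(1.0, a * blocks ** k)

def gain(program, blocks):
    """Reduction in miss ratio from giving the program one more block."""
    a, k = program
    return miss_ratio(a, k, blocks) - miss_ratio(a, k, blocks + 1)

def greedy_partition(programs, cache_blocks):
    """programs: list of (a, k) pairs; returns the block count per program."""
    alloc = [0] * len(programs)                 # initialization: all Ci = 0
    for _ in range(cache_blocks):               # hand out one block at a time
        best = max(range(len(programs)),
                   key=lambda i: gain(programs[i], alloc[i]))
        alloc[best] += 1                        # Ci = Ci + 1, others unchanged
    return alloc

# Three hypothetical programs sharing 64 blocks.
alloc = greedy_partition([(0.5, -0.5), (0.4, -0.5), (0.5, -0.8)], 64)
```

Scanning all programs on every step costs O(N*C) gain evaluations; a priority queue keyed on the current gain would reduce that, at the cost of a slightly longer example. The result is only near-optimal when each program's gains shrink as its allocation grows, which is the convexity assumption stated in the text.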