[Stock+:PLDI’14] A Framework for Enhancing Data Reuse via Associative Reordering

Kevin Stock and etc. proposed a framework for data reuse via associative reordering. It is target on higher order stencil computations. High order means the ratio of arithmetic operations to the number of distinct data accessed is high. With this feature, high order stencil computations seem to have more potential of parallelism which should achieve better performance than the bandwidth bounded low order stencil computations. But on the contrary, the performance will decrease with higher order because of increasing register pressure. For example, a N*N stencil computations, there will be N*(N-1) register reuses by each moving step. So the greater the N is, the more likely register spilling will happen.

Focused on this problem, Kevin proposed a compiler framework which they claim the first for enhancing locality by exploit the associativity of operations.

They characterize  stencil computations by a read set and a write set. The computation can be model by many-to-many set of edges from the read set to write set. For the original unoptimized stencil computation: out[i][j] + = in[i+ii][j+jj] * w[ii][jj] (-k<=ii<=k). It can be see as a read set [i-k,i+k][j-k,j+k] mapping to [i][j] with all the edges in a Cartesian space. As the computations in this stencil computation can be organized in any order, so the edges can be moved around.

for (i = 1; i < N – 1; i++) {

S1:   OUT[i]    =  W[0] * IN[i – 1];

S2:   OUT[i] +=  W[1] * IN[i];

S3:   OUT[i] +=  W[2] * IN[i+1];

}

They proposed a abstract representation of the stencil computations. For statement S surrounded by P loops, the iteration set can be represented by:

IS  = { i, j, k, … |a1<=i<=b1, a1<=j<=b1, a1<=k<=b1}

Such as: IS1  = { i |1 <= i < N-1}

For statement S: A[i][j-2] = …, data accessed can be represented by:

fA: (i, j – 2)

Such as: IN[i – 1] in S1,  fINS1  : (i-1)

For S surrounded by P loops, Program execution order can be represented with a vector with length 2 * P + 1. The odd elements in the vector are scalars indicate the interleaving of loops and statements, the even elements are the affine expression of the surrounding loops.

Ts = {a, i, b, j, c, k, d, …}   (i,j,k are affine expression, a, b, c, d are scalars)

Such as:

TS1 : (0,i,0)  TS2 : (0,i,1)   TS3 : (0,i,2)

With this representation, loop transformation can be seen as mapping the original execution orders T to a new T’.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s