[Crummey+:ICS99]Improving memory hierarchy performance for irregular applications

John Mellor-Crummey from Rice University investigated data and computation reordering to improve memory performance for applications with irregular memory accesses.

For regular memory accesses, the gap between CPU and memory can be well bridged by loop blocking and prefetching. but for irregular memory accesses, the accesses can only be known during the run time. One approach to improve memory performance is to dynamically reorder data before time consuming computations. And also, computation reordering along with data reordering will be much more effective. The reason why the data and computation reordering can be effective is that they increase the probability that data in the same block will be access closely in time and the probability that data will be reused before the block is elected.

For example, N-body is a classical irregular application which is used to calculate the interaction between particles. It contains two list, one stores the information of each particles and the other stores the paired indices of particles which will interact with each other.

f1.001

Data reordering can increase spacial locality by placing the data near one another by the accessed order. Two data reordering approaches are proposed:

First Touch Data Reordering(FTDR): before computation, a linear scan will be performed on the interaction list to get the new order for particle list.

f2.001

Space Filling Curve Data Reordering(SFCDR):  before computation, reordering the particle list by space filling curve which make the particles with shorter distance in space close with each other. And one advantage compared with FTDR, the space filling cure reordering can be performed without knowing the order of the computation.

f3.001

Computational reordering can improve spacial locality as data reordering and also improve temporal locality. Two computational reordering are proposed:

Space Filling Curve Computation Reordering(SFCCR):  reorder the computations according to space filling curve, and the particle list maintains the same.

f4.001

Computational Reordering by Blocking(CRB): Before reordering the computations, data should be given a block index (the index can be more than one dimension), then reorder the computations according to the block number which the particles belongs to the same block will be processed together.

f5.001

One thought on “[Crummey+:ICS99]Improving memory hierarchy performance for irregular applications

  1. John’s followup work on this was discussed in https://wordpress.com/read/post/id/59712121/213/, which studies efficient algorithms to compute different space-filling curve orders.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s