Dan-Towsley model of cache sharing

At Rochester we have studied the Dan-Towsley model multiple times.  The description in their paper takes some effort to understand.  Here we provide additional explanation for anyone who is interested in this model.  The formulas included here are screen copies from the original paper.

In this problem, we have a collection of D fixed-size items that share an LRU cache that can store B items.  The D items are divided into K partitions.  Partition k has D_k items and is accessed with probability alpha_k.  The access is assumed to be uniformly random within each partition.  This corresponds to the Independent Reference Model (IRM) of King in 1971.

The LRU cache can be viewed as a stack with the most recently accessed item at the top and the least recently accessed item at the bottom.  The cache state is defined by the content of the B stack positions.  In this formulation, the state of each position is the partition id of the data item it stores.

We emphasize the use of the partition id.  For example, for B=3, K=2, and D_1=2, a valid cache state may be (1, 1, 2), with both items of Partition 1 in the cache.  This state records only the partition ids, not the specific data items.

The following formula gives the set S of all possible cache states:

 

[Screen copy from the paper: the formula defining the set S of possible cache states]

A valid state is a sequence of B positions with two constraints, shown by the formula.  First, for each partition k, the number of its items in the cache cannot exceed its total number of items D_k.  Second, the total number of items from all partitions equals B.  Using the example B=3, K=2, and D_1=2 again, (1, 1, 1) would not be a valid state, since it violates the first constraint of S.
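
In our own notation, which may differ from the paper's, the two constraints can be written as follows; this is a reconstruction from the description above, not a copy of the paper's formula:

% n_k denotes the number of positions in the state that hold Partition-k ids
S = \{ (x_1, \ldots, x_B) \;:\; x_j \in \{1, \ldots, K\},\;
      n_k \le D_k \text{ for each } k,\;
      \textstyle\sum_{k=1}^{K} n_k = B \}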

King formulated the problem as a Markov Chain.  A Markov Chain has a set of states and transition probabilities between the states.  A common example is a drunkard’s walk.  Let there be a set of bars.  When leaving a bar, the drunkard goes to each of the other bars with some probability.  As a Markov Chain, each bar is a state.  We are interested in computing the likelihood of each state.  If we choose a state whose likelihood we want to compute, we call it the target state.  We use all the states that may transition into the target state; we call these the preceding states.  An equation can be constructed that computes the likelihood of the target state from the likelihoods of the preceding states and the transition probabilities from them to the target state.  For the drunkard, we compute the likelihood that he visits a particular bar, e.g. Starry Night.  Starry Night is the target state.  Its likelihood depends on a nearby bar, e.g. Joe Bean, so we use the likelihood of Joe Bean times the probability that the drunkard would go from Joe Bean to Starry Night.
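
As a toy illustration of this computation (not from the paper: the third bar name and all transition probabilities below are invented), the state likelihoods can be found by repeatedly applying the balance equation until the values converge:

# Toy Markov Chain for the drunkard's walk (illustration only; the numbers are made up).
import numpy as np

bars = ["Starry Night", "Joe Bean", "Corner Pub"]   # "Corner Pub" is a made-up third bar
# P[i][j] = probability that the drunkard, leaving bar i, goes to bar j; each row sums to 1.
P = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.1, 0.3],
              [0.4, 0.4, 0.2]])

# Balance equation: likelihood(target) = sum over preceding states of
#   likelihood(preceding) * transition probability from preceding to target.
likelihood = np.full(len(bars), 1.0 / len(bars))    # start from a uniform guess
for _ in range(1000):                               # iterate until it converges
    likelihood = likelihood @ P

for bar, prob in zip(bars, likelihood):
    print(f"{bar}: {prob:.3f}")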

In the following, the target state is x=(x_1, x_2, …, x_B).  Its likelihood is computed from all possible preceding states.  The first line of the equation shows the transitions due to cache hits, and the next two lines (whose product forms a single term) show the transitions due to cache misses.

[Screen copy from the paper: the balance equation for the likelihood of state x]

It is understandably non-trivial to solve a Markov Chain problem with this many states and transitions.  King gave an exact solution, which has a high computational complexity, exponential in D and B.

The Dan-Towsley model is an efficient approximation.  It consists almost entirely of the following two equations.  Eq. 2 is the key.  In Eq. 2, p_k(j) is the probability that a Partition-k item is stored at position j, and r_k(j-1) is the probability that, when the item at position j-1 moves to position j, this item is from Partition k.  The Dan-Towsley model says that the two probabilities are equal.

The equation is recursive, since the two probabilities are used to compute each other.  b_k(j) is the occupancy of Partition k at stack positions 1 through j.  It is computed from p_k(j).  This occupancy is used to compute the likelihood that a miss happens for a Partition-k item.  Eq. 2 computes r_k(j-1) as a ratio.  The denominator is the likelihood of an access at a stack position below j-1, that is, a miss for B=j-1.  The numerator is the likelihood that this miss happens for a Partition-k data item, so the ratio is the chance that the missed, and hence promoted, item is from Partition k.

[Screen copy from the paper: the two equations of the Dan-Towsley model]
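
The following is our reconstruction of the two equations from the description in this post, written in our own notation; it may differ in details from the paper's exact form:

% occupancy of Partition k in the top j stack positions
b_k(j) = \sum_{i=1}^{j} p_k(i)

% Eq. 2 (reconstructed): the ratio of the Partition-k miss likelihood
% to the total miss likelihood for a cache of size j-1
r_k(j-1) = \frac{\alpha_k \bigl(1 - b_k(j-1)/D_k\bigr)}
                {\sum_{i=1}^{K} \alpha_i \bigl(1 - b_i(j-1)/D_i\bigr)}

% the Dan-Towsley approximation: the two probabilities are equal
p_k(j) = r_k(j-1), \qquad p_k(1) = \alpha_k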

Eq. 2 is easily solvable by iterating over j, starting from j=1 with p_k(1)=alpha_k.

More explanation is needed for Eq. 2.  In the text, the paper says that r_k(j-1) is the probability that, when the item at position j-1 moves to j, this item is from Partition k.  In the equation, the ratio is the likelihood that the miss happens for a Partition-k data item.  The two are related in a subtle way: both are needed so that the occupancy stays the same before and after the miss.
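
Below is a minimal sketch of this iteration in Python, assuming the reconstructed form of Eq. 2 above; it is not the authors' code, and the function name, parameter names, and example values are ours.

# A minimal sketch of the Dan-Towsley iteration, assuming the reconstructed
# form of Eq. 2 above (not the authors' code).
def dan_towsley(alpha, D, B):
    """alpha[k]: access probability of Partition k+1 (entries sum to 1)
       D[k]:     number of items in Partition k+1
       B:        cache size in items (assumed smaller than the total data)
       Returns (p, b): p[j][k] is the probability that stack position j+1
       holds a Partition-(k+1) item, and b[k] is the expected occupancy of
       Partition k+1 in the size-B cache."""
    K = len(alpha)
    p = [list(alpha)]                     # position 1: p_k(1) = alpha_k
    b = list(alpha)                       # running occupancy b_k(j)
    for j in range(1, B):                 # fill positions 2..B
        # per-partition miss likelihood for a cache of size j
        miss = [alpha[k] * (1.0 - b[k] / D[k]) for k in range(K)]
        total = sum(miss)                 # total miss likelihood (positive if B < total data)
        row = [m / total for m in miss]   # r_k(j), which equals p_k(j+1)
        p.append(row)
        for k in range(K):
            b[k] += row[k]
    return p, b

# Example with made-up parameters: two partitions sharing a cache of 3 items.
p, occupancy = dan_towsley(alpha=[0.5, 0.5], D=[10, 100], B=3)
print(occupancy)                          # expected number of cached items per partition

In this reconstruction, the occupancy b_k(B) then gives the miss likelihood of a Partition-k access as alpha_k (1 - b_k(B)/D_k), which is how the occupancy yields the miss ratio mentioned below.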

Xiaoming Gu was the first to study and implement the Dan-Towsley model at Rochester.  In 2008, he derived the distribution of the reuse distance of random access, which corresponds to the solution of the IRM for K=1.  He then found the Dan-Towsley model and verified it as an efficient and accurate solution for any K.

The Dan-Towsley model is a brilliant solution based on cache occupancy.  Because of the IRM assumptions, the miss ratio can be computed from the cache occupancy.  The general solution needs to consider locality.  The general problem has been solved in recent years, including by the footprint-based model developed at Rochester.  It is extremely interesting to compare and contrast the occupancy-based model of cache sharing and locality-based models.

Asit Dan, Donald F. Towsley:  An Approximate Analysis of the LRU and FIFO Buffer Replacement Schemes. SIGMETRICS 1990: 143-152

King, W. F., “Analysis of Paging Algorithms,” In Proc. IFIP Congress, pages 485-490, Ljubljana, Yugoslavia, August 1971.

 

Gu, Xiaoming, “Reuse Distance Distribution in Random Access”, TR930, Computer Science Dept., U. Rochester, January 2008.

Acknowledgement.  The explanation here is partly the result of discussion with Chencheng Ye and Rahman Lavaee.  Chencheng’s research is supported by an IBM CAS fellowship, and Rahman’s by NSF CCF-1629376.

CS 255/455 Spring 2017

CSC 255/455 Software Analysis and Improvement (Spring 2017)

Lecture slides (when used), demonstration programs, and some of the reading material will be distributed through Blackboard.  Assignments and projects will be listed here.


Assignments:

  • Trivia assignment.  Search slashdot.org for posts on GCC, LLVM, RUST, Scala, or Haskell.  Select two posts and read them along with all of their discussions.  Write a summary of 200 or more words for each of the two posts.  The summary should include at least one precise fact on the topic as well as an opinion, with all supporting and disagreeing arguments pulled from the discussions.  Print and submit a paper copy at the start of class on Monday, January 23rd.  Then see me in one of my office hours for feedback on the summary.  Meet me on or before February 3rd.  The grade is assigned after the meeting.  Bring a copy of your paper to the meeting (in addition to the one you submit).

 Course description

With the increasing diversity and complexity of computers and their applications, the development of efficient, reliable software has become increasingly dependent on automatic support from compilers & other program analysis and translation tools. This course covers principal topics in understanding and transforming programs by the compiler and at run time. Specific techniques include data flow and dependence theories and analyses; type checking and program correctness, security, and verification; memory and cache management; static and dynamic program transformation; and performance analysis and modeling.

Course projects include the design and implementation of program analysis and improvement tools.  The course meets jointly with CSC 255, an undergraduate-level course whose requirements include a subset of the topics and a simpler version of the project.

 

 Instructor and grading

Teaching staff: Chen Ding, Prof., CSB Rm 720, x51373;  Dong Chen, Grad TA;  Jacob Bisnett, Undergrad TA.

Lectures: Mondays and Wednesdays, 10:25am-11:40am, CSB 601

Office hours: Ding, Fridays 11am to noon (and Mondays for any 15-minute period between 3:30pm and 5:30pm if pre-arranged).

TA Office hours: Dong Chen, Tuesdays 3:30pm to 4:30pm, CSB 720. Jacob Bisnett, Thursdays 1:00pm to 1:50pm, CSB 720.

Grading (total 100%)

  • midterm and final exams are 15% and 20% respectively
  • the projects total to 40% (LVN 5%, GCC/LLVM/RUST 5%, local opt 10%, global opt 10%, final phase 10%)
  • written assignments are 25% (trivia 1%; 3 assignments 8% each)

 Textbooks and other resources

Optimizing Compilers for Modern Architectures (UR access through books24x7), Randy Allen and Ken Kennedy, Morgan Kaufmann Publishers, 2001. Chapters 1, 2, 3, 7, 8, 9, 10, 11. lecture notes from Ken Kennedy. On-line Errata

Engineering a Compiler, (2nd edition preferred, 1st okay), Keith D. Cooper and Linda Torczon, Morgan Kaufmann Publishers. Chapters 1, 8, 9, 10, 12 and 13 (both editions). lecture notes and additional reading from Keith Cooper. On-line Errata

Compilers: Principles, Techniques, and Tools (2nd edition), Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Pearson.

Static Single Assignment Book, Rastello et al. (in progress)

Let All Trees Grow

Tree type definitions in Haskell as answers to a homework question in CSC 253/453.  They make a good demonstration of how a type system can be used by a programmer to communicate a program design to the compiler, so the compiler can check the correctness of the implementation automatically.

It is also a good exercise to try writing the HasMap instance for these definitions.  Can you find which one of them has a “growth” problem and cannot be used in practice?

Robby Findler seminar and guest lecture

 

Macros matter: effectively building lots of programming languages
Robby Findler
Northwestern University & PLT
Monday, November 14, 2016

Building new programming languages from whole cloth is a difficult proposition at best. Macro systems provide an alternative; they support the construction of new programming languages from existing pieces, while still providing the flexibility to radically change the syntax and semantics of the programming language.

In this talk, I will give a high-level overview of the myriad of programming languages that Racket supports, as well as an overview of the research area of macros, showing what can be accomplished with them and introducing some of the associated technical challenges (and their solutions).

Robby Findler is currently an Associate Professor at Northwestern University, and received his PhD from Rice University in 2002. His research area is programming languages, and he focuses on programming environments, software contracts, and tools for modeling operational semantics. He maintains DrRacket, the program development environment for the programming language Racket, and he co-authored the book _How to Design Programs_, a textbook for teaching introductory programming.

(URCS seminar announcement)

Slides

(CSC 253/453 Guest Lecture)  Redex: A Language for Lightweight Semantics Engineering

Professor Robby Findler, Northwestern University

Redex is a programming language designed to support semantics engineers as they experiment with programming language models.  To explore a model, an engineer writes down grammars, type systems, and operational semantics in a notation inspired by the programming languages literature. Redex breathes life into the model, building typing derivations, running example expressions, and using random generation to falsify claims about the model.

This talk gives an overview of Redex, motivating its design choices and giving a sense of how it feels to program in Redex. Then the talk dives into some of the techniques that Redex uses to generate random expressions.

A video by Prof. Findler on Redex

https://docs.racket-lang.org/redex/

Yun Liang Seminar


Yun (Eric) Liang
Peking University
Enabling Effective Optimization Techniques for Heterogeneous System
Abstract: Heterogeneous systems couple CPUs with accelerators such as GPUs and Field Programmable Gate Arrays (FPGAs).  In this talk, I will first present on-chip storage and multitasking optimization techniques for GPUs. The proposed techniques leverage compile-time and run-time techniques to improve cache performance, register utilization, pipeline utilization, and overall performance. In the second half of the talk, I will present performance modeling and optimization techniques for FPGAs based on the OpenCL programming model.

Bio: Yun (Eric) Liang is currently an assistant professor in the School of EECS at Peking University, China. Before joining Peking University, he was a Research Scientist at the University of Illinois at Urbana-Champaign. He received the B.S. degree from Tongji University, Shanghai, and the Ph.D. degree in computer science from the National University of Singapore. He has published more than 40 research papers in top conferences and journals on compilation, computer architecture, and embedded systems, including MICRO, HPCA, ISCA, DAC, CGO, ICCAD, FPGA, FCCM, etc. His work received the Best Paper Award at FCCM 2011 and Best Paper Award nominations from ASPDAC 2016, DAC 2012, FPT 2011, and CODES+ISSS 2008.