Principles of Memory Hierarchy Optimization (PMHO) 2022

PPoPP Workshop on Principles of Memory Hierarchy Optimization (PMHO)

Video recordings of this year’s talks:

KEYNOTE: Advances in Modeling and Optimization of Caches. Don Towsley.

Random Walks on Huge Graphs at Cache Efficiency. Ke Yang.

CARL: Compiler Assigned Reference Leasing. Chen Ding.

Caching With Delayed Hits. Nirav Atre.

Delayed Hits in Multi-Level Caches. Benjamin Carleton.

A Study on Modeling and Optimization of Memory Systems. Xian-He Sun.

LHD: Improving Cache Hit Rate by Maximizing Hit Density. Nathan Beckmann.

On the Impact of Network Delays on Time-to-live Caching. Karim Elsayed.

Beyond Time Complexity: Data Movement Complexity Analysis for Matrix Multiplication. Wesley Smith.

Saturday 4/2/2022 1pm to 5pm US EDT (GMT -4), 6pm to 10pm Germany, 10:30pm to 2:30am India, 1am to 5am China, 2am to 6am Korea

Zoom Link: https://acm-org.zoom.us/j/92042756839?pwd=TllNMlVaeXQybkpWczh3ZnMxVDdFZz09
  Meeting ID/Passcode: 920 4275 6839 / PPoPP@2022

KEYNOTE

Recent Advances in the Modeling and Optimization of Caches

Donald F. Towsley, Distinguished Professor, University of Massachusetts Amherst

The keynote lecture covers the following results.

  1. An optimization-based approach to caching
  2. A simple, very accurate model of LRU and variants
  3. A new approach to bounding performance of caching policies based on a very simple ordering of failure rates of request distribution along with an application to on-line caching.  

The material is based on the following publications: Deghgan et al. “A Utility Optimization Approach to Network Cache Design,” IEEE/ACM Transactions on Networking, 27(3), 1013-1027, June 2019; “Sharing Cache Resources among Content Providers: A Utility-Based Approach,” IEEE/ACM Transactions on Networking27(2), 477-490, April 2019; Jiang et al. “On the Convergence of the TTL Approximation for an LRU Cache under Independent Stationary Request Processes”, ACM TOMPECS, 3(4), 20:1-20:31, September 2018; Panigrahy et al. “A new upper bound on cache hit probability for non-anticipative caching polices,” IFIP WG 7.3 Performance 2020; and Yan et al. “Learning from Optimal Caching for Content Delivery,” Proc. CoNEXT 2021, December 2021.

Workshop Program

PPoPP Workshop on Principles of Memory Hierarchy Optimization (PMHO)
Saturday 4/2/2022 1pm to 5pm US EDT (GMT -4), 6pm to 10pm Germany, 10:30pm to 2:30am India, 1am to 5am China, 2am to 6am Korea
Welcome and Introduction (1pm)
Keynote (1:10pm to 2:00pm)
Don Towsley: “Recent Advances in the Modeling and Optimization of Caches”.
Session 1 (2:00pm to 2:40pm)
Xian-He Sun: “A Study on Modeling and Optimization of Memory Systems”. JCST Jan. ’21. (2:00pm to 2:20pm)
Karim Elsayed: “On the Impact of Network Delays in Time-To-Live Caching”. Unpublished. (2:20pm to 2:40pm)
Break (2:40pm to 2:50pm)
Chen Ding: “CARL: Compiler Assigned Reference Leasing”. TACO March ’22. (2:50pm to 3:10pm).
Nirav Atre: “Caching with Delayed Hits”. SIGCOMM ’20. (3:10pm to 3:30pm)
Benjamin Carleton: “Evaluating the Impact of Delayed Hits in Multi-Level Caches”. SOSP ’21 SRC Undergraduate Winner. (3:30pm to 3:40pm).
Break (3:40pm to 3:50pm)
Nathan Beckmann: “LHD: Improving Cache Hit Rate by Maximizing Hit Density”. NSDI ’18. (3:50pm to 4:10pm)
Wesley Smith: “Beyond Time Complexity: Data Movement Complexity Analysis for Matrix Multiplication”. Under review. (4:10pm to 4:30pm)
Ke Yang (Video): “Random Walks on Huge Graphs at Cache Efficiency”. SOSP ’21. (4:30pm to 4:45pm)
Open discussion (4:45pm to 5:00pm)

For people who cannot attend, we plan to make the talk videos available as we have done in the 2021 workshop.

Organizers

Faculty:

  • Chen Ding, University of Rochester
  • Nathan Tallent, Pacific Northwest National Laboratory

Student:

  • Wesley Smith, University of Rochester

Call for Presentations

Data movement is now often the most significant constraint on speed, throughput, and power consumption for applications in a wide range of domains. Unfortunately, modern memory systems are almost always hierarchical, parallel, dynamically allocated and shared, making them hard to model and analyze. In addition, the complexity of these systems is rapidly increasing due to the advent of new technologies such as Intel Optane and high-bandwidth memory (HBM).

Conventional solutions to memory hierarchy optimization problems are often compartmentalized and heuristic-driven; in the modern era of complex memory systems these lack the rigor and robustness to be fully effective. Going forward, research on performance in the memory hierarchy must adapt, ideally creating theories of memory that aim at formal, rigorous performance and correctness models, as well as optimizations that are based on mathematics, ensure reproducible results, and have provable guarantees.

Format and Topics

PMHO is an annual specialized and topic-driven workshop to present ongoing, under review, or previously published work related to the following non-exhaustive topic list:

  • Mathematical and theoretical models of memory hierarchies including CPU/GPU/accelerator caches, software caching systems, key-value stores, network caches and storage system caches
  • Algorithmic and formal aspects of cache management including virtual memory
  • Programming models and compiler optimization of hierarchical complex memory systems

There will be no formal peer review process or publication, and presentations will be a mix of selections from submissions and invited presentations.

The 2022 workshop will take place online during the weekend of April 2 (Exact date TBD). It is hosted by the PPoPP 2022 conference.

Submission

Submit your presentation proposal to save-DyfTNd4RHy3y@3.basecamp.com. Submissions should consist of an one-page abstract of the topic you intend to present alongside any (optional) pertinent publications or preprints.

The submission deadline is Monday March 7, 2022 AOE.

Rahman Presenting at Compiler Construction Conference

Rahman was distilling the concept of spatial locality.  Prior to optimizing spatial locality in large scale code, Firefox, Chrome, LLVM, Rahman also developed the newest theory on optimal spatial locality in his paper in POPL 2016.

Rahman Lavaee, John Criswell, Chen Ding:
Codestitcher: Inter-Procedural Basic Block Layout Optimization.  in 28th International Conference on Compiler Construction, February 2019, Washington DC, USA.

2017 2nd URCSSA Alumni Summit

On Oct. 26, Dr. Chengliang Zhang, former graduate and now Staff Software Engineer at Google Seattle,  was invited by Chinese Student and Scholar Association (URCSSA) to speak at the second Alumni Summit titled Cloud | Big Data | AI.  The compiler group held a separate mini-symposium to present our research and had lunch with our esteemed graduate.

Robby Findler seminar and guest lecture

 

Macros matter: effectively building lots of programming languages
Robby Findler
Northwestern University & PLT
Monday, November 14, 2016

Building new programming languages from whole cloth is a difficult proposition at best. Macro system provide an alternative; they support the construction of new programming languages from existing pieces, while still providing the flexibility to radically change the syntax and semantics of the programming language.

In this talk, I will give a high-level overview of the myriad of programming languages that Racket supports, as well as an overview of the research area of macros, showing what can be accomplished with them and introducing some of the associated technical challenges (and their solutions).

Robby Findler is currently an Associate Professor at Northwestern University, and received his PhD from Rice University in 2002. His research area is programming languages and he focuses on programming environments, software contracts, and tools for modeling operational semantics. He maintains DrRacket, the program development environment for the programming language Racket and he co-authored the book _How to Design Programs_, a textbook for teaching introductory programming.

(URCS seminar announcement)

Slides

(CSC 253/453 Guest Lecture)  Redex: A Language for Lightweight Semantics Engineering

Professor Robby Findler, Northwestern University

Redex is a programming language designed to support semantics engineers as they experiment with programming language models.  To explore a model, an engineer writes down grammars, type systems, and operational semantics in a notation inspired by the programming languages literature. Redex breathes life into the model, building typing derivations, running example expressions, and using random generation to falsify claims about the model.

This talk gives an overview of Redex, motivating its design choices and giving a sense of how it feels to program in Redex. Then the talk dives into some of the techniques that Redex uses to generate random expressions.

A video by Prof. Findler on Redex

https://docs.racket-lang.org/redex/

Guang Gao Keynote

Parallel Computation Models and Systems: Dataflow, Coelets, and Beyond

Guang Gao

ACM Fellow and IEEE Fellow
Endowed Distinguished Professor
University of Delaware, and
Founder of ETI

Friday 9/30, 8:30am to 9:30am
LCPC/CnC Workshops @ Hilton Garden Inn Rochester/University

Prof. Gao was the first student since 1970s to leave China to study in MIT computer science 麻省理工计算机专业首位中国大陆留学生.  He was a member of the entering class of 1963 at Tsinghua University, and the first from that university to become both ACM and IEEE Fellows.

Leslie Valiant Keynote

The Multi-core Problem as an Algorithmic Problem
Leslie Valiant
Harvard University

Wednesday 9/28, 1pm to 2pm
LCPC/CnC Workshops @ Hilton Garden Inn Rochester/University

This keynote is free and open to the public but requires registration

Abstract

The writing of efficient parallel programs has always been difficult, and is currently compounded by the increasing complexity of architectures. We suggest that these difficulties need to be addressed already at the algorithm design level. Resources such as books for relevant efficient algorithms are currently lacking.

In sequential computing programmers have enjoyed efficiently universal algorithms, which have permitted them to write programs independent of machines. We suggest that for parallel or multi-core computers this will no longer be generally possible. Algorithms that run efficiently will need to be aware of the resource parameters of the machines on which they run. The main promise is that of portable algorithms, those that contain efficient designs for all reasonable ranges of the basic resource parameters and input sizes.

Such portable algorithms need to be designed just once, but, once designed, they can be compiled to run efficiently on any machine. In this way the intellectual effort that goes into parallel algorithms design becomes reusable. To permit such portable algorithms some standard bridging model is needed – a common understanding between hardware and algorithm designers of what the costs of a computation are. We shall describe the Multi-BSP model as a candidate for this role. We show that for several basic problems, namely matrix multiplication, fast Fourier transform, and sorting, portable algorithms do exist that are optimal in a defined sense, for all combinations of input size and parameter values.

Biography

Leslie Valiant was educated at King’s College, Cambridge; Imperial College, London; and at Warwick University where he received his Ph.D. in computer science in 1974. He is currently T. Jefferson Coolidge Professor of Computer Science and Applied Mathematics in the School of Engineering and Applied Sciences at Harvard University, where he has taught since 1982. Before coming to Harvard he had taught at Carnegie Mellon University, Leeds University, and the University of Edinburgh.

His work has ranged over several areas of theoretical computer science, particularly complexity theory, computational learning, and parallel computation. He also has interests in computational neuroscience, evolution and artificial intelligence.

He received the Nevanlinna Prize at the International Congress of Mathematicians in 1986, the Knuth Award in 1997, the European Association for Theoretical Computer Science EATCS Award in 2008, and the 2010 A. M. Turing Award. He is a Fellow of the Royal Society (London) and a member of the National Academy of Sciences (USA).

(Dec. 13) Xiaoya’s defense “A higher order theory of locality and its application in multicore cache management”

20131213-093111.jpg

University of Rochester
Computer Science Department

Xiaoya Xiang
PhD. Defense

December 13, 2013
CSB 601
9:00 AM

A Higher Order Theory of Locality and Its Application in Multicore Cache Management

As multi-core processors become commonplace and cloud computing is gaining acceptance, applications are increasingly run in parallel over a shared memory hierarchy. While the traditional machine and program metrics such as miss ratio and reuse distance can precisely characterize the memory performance of a single program, they are not composable and therefore cannot model the dynamic interaction between simultaneously running programs.

This dissertation presents an alternative metric called program footprint. Given a program execution, its footprint is the amount of data accessed in a given time period. The footprint is composable — the aggregate footprint of a set of programs is the sum of the footprint of the individual footprints. The dissertation presents the following techniques

• Near real-time foorpint measurement, first by using two novel algorithms, one for footprint distribution and the other footprint average, and then by runtime sampling.

• A higher order theory of cache locality, which shows that traditional metrics can be derived from the footprint and vice versa. (As a result, previous locality metrics can also be obtained in near real time.)

• Composable model of cache sharing, by footprint composition, which is faster and simpler to use than previous reuse-distance based models.

• Cache-conscious task regrouping, which reorganizes a parallel workload to minimize the interference in shared cache.

Through these techniques, the dissertation establishes the thesis that program interaction in shared cache can be efficiently and accurately modeled and dynamically optimized.

Pat Mitchell
University of Rochester
Computer Science Department
RC 270226 CSB Room 734
Rochester, NY 14627
585-275-5671
pmitchell@cs.rochester.edu

_______________________________________________
Seminar mailing list
Seminar@cs.rochester.edu
https://mail.cs.rochester.edu/mailman/listinfo/seminar
———————————————–
To join the mailing list of CS seminar series and to be notified of our
upcoming talks, please send an email to *seminar-join@cs.rochester.edu *