Rocket Compiler Creator and Computer Science Professor Philip Sweany Dies


Philip Hamilton Sweany, PhD, beloved husband, brother, professor, colleague, and friend, died March 29, 2018 of neuroendocrine cancer.

(https://m.facebook.com/margaret.falersweany/posts/10213796046167865)

Born May 31, 1949 in Seattle, WA, Phil graduated from Washington State University with a BS in Zoology in 1972.  He worked in air pollution research for 10 years before returning to WSU, where he received a BS in Computer Science in 1982.  Phil married Margaret FalerSweany (Peggi) on January 27, 1980.  He received an MS in 1986 and a PhD in 1992 in Computer Science from Colorado State University in Fort Collins, CO.  Phil was a member of the computer science faculty at Michigan Tech University in Houghton, MI from 1991 to 2000, a member of Texas Instruments’ Research and Development group in Dallas from 2000 to 2003, and a member of the computer science and engineering faculty at the University of North Texas in Denton from 2003 until his death in 2018.

Phil was my primary advisor during my MS study at Michigan Technological University from August 1994 to May 1996.  I joined his Rocket Compiler research group as a research assistant almost immediately after I arrived at MTU from China.  The computer science department was small and friendly.  I still remember about half a dozen faculty members and a dozen graduate students by name and face, and I recall vivid memories of the time.   I remember Phil being extremely humorous.  One couldn’t stop smiling and laughing while conversing with him.  Shortly after my arrival, my wife Linlin quit her graduate study at Peking University, and we started our married life.  It was also the first time I heard about the Internet, had my first email account (cding@cs.mtu.edu), and made my first home page (viewable from anywhere in the world).

Phil’s research was compiling for instruction-level parallelism.  He and Steve Carr (who later became my co-advisor) set me to studying a technique called software pipelining.  Under their direction, I was exposed to research and developed several improvements.  Through this work, the problem I found hardest and most intriguing was predicting the cost of a memory access.  This problem was the seed that grew into my later work at Rice, which then led to the research, past and present, at Rochester.

The next year I attended my first conference, when Phil took the whole group on the road to Ann Arbor, Michigan to attend the 28th MICRO, November 29 to December 1, 1995.   My first paper was published shortly after at the 29th Annual Hawaii International Conference on System Sciences (HICSS-29), January 3-6, 1996, in Maui, Hawaii.  Phil asked the department secretary to book the trip and told me that “unfortunately,” to get “reasonable” airfare, I had to stay on Maui for the whole week!  I remember being handed a thick stack of paper tickets (Houghton to Detroit to LA to Honolulu to Maui and back), and once there, I stayed in the luxurious Intercontinental hotel, with its miles of private white-sand beach.  He put me in charge of renting a car (since I arrived first), although I had never rented a car before.  Phil took Peggi and me to dinner on the first evening.  We toured the rain forest together on the last day.  In between, I remember swimming in the ocean, locking the key in the car, taking a helicopter tour, and a number of other things I did for the first time.

My spoken English was so accented that it was indecipherable.  Phil sent me to get help from Scientific and Technical Communication, in Peggi’s department.  After being tutored by a student named Lynn in both English pronunciation and writing, people began to understand what I was saying.

At MTU, there were just a handful of CS graduate students, and we had an active social life.   There were multiple parties each year at faculty houses (or beach houses).  Phil and Peggi hosted the Thanksgiving party at their house in Hancock.  Theirs was my first encounter with not just the turkey but its many sides.  Fellow graduate students organized movie nights in the winter and excursions in the fall.  We spent a lot of time chatting when we sat in the office, with Phil, Steve, and other faculty occasionally walking by.  I remember passing the written test at the DMV (making just one mistake) after a half-hour crash course in the office.  At MTU, graduate students stayed at the university apartments on the side of a hill next to campus.  I was elected a student officer working with a committee selecting movies to run each month on the residential cable network.  Much also happened together with other Chinese students, e.g. the annual winter extravaganza.  By the time I graduated in 1996, about two years after arriving, I was thoroughly, properly, and happily Americanized.

My memories of MTU 22 years ago are among the fondest of the time when I was young and a student of Phil’s.  I wrote the following comment yesterday after hearing the news:

Phil is my role model — a pure-hearted scientist and teacher with uncompromising dedication to his research and students.  I’m most fortunate to have had him as my MS advisor and will continue to follow his example.  His legacy lives through me and my students.

 

CS 255/455 Spring 2018

CSC 255/455 Software Analysis and Improvement (Spring 2018)

Lecture slides, reading, later assignments, and other material will be distributed through Blackboard.


Assignments:


 Course description

With the increasing diversity and complexity of computers and their applications, the development of efficient, reliable software has become increasingly dependent on automatic support from compilers and other program analysis and translation tools. This course covers principal topics in understanding and transforming programs by the compiler and at run time. Specific techniques include data flow and dependence theories and analyses; type checking and program correctness, security, and verification; memory and cache management; static and dynamic program transformation; and performance analysis and modeling.

Course projects include the design and implementation of program analysis and improvement tools.  Meets jointly with CSC 255, an undergraduate-level course whose requirements include a subset of the topics and a simpler version of the project.

 Instructor and grading

Teaching staff: Chen Ding, Prof., Wegmans Hall Rm 3407, x51373;  Fangzhou Liu, Grad TA;  Zhizhou Zhang, Undergrad TA.

Lectures: Mondays and Wednesdays, 10:25am-11:40am, Hylan 202

Office hours: Ding, Fridays 11am to noon (and Mondays for any 15 minute period between 3:30pm and 5:30pm if pre-arranged).

TA Office hours: Zhizhou, Mondays 2 to 3pm, open area outside the elevator, third floor Wegmans Hall.  Jerry, Tuesdays 3 to 4pm, 3407 Wegmans Hall.

Grading (total 100%)

  • midterm and final exams are 15% and 20% respectively
  • the projects total to 40% (LVN 5%, LLVM trivial 5%, loop+index 10%, dep 10%, par 10%)
  • written assignments are 25% (trivial 1%; 4 assignments 6% each)

 Textbooks and other resources (on reserve at Carlson)

Optimizing Compilers for Modern Architectures (UR access through books24x7), Randy Allen and Ken Kennedy, Morgan Kaufmann Publishers, 2001. Chapters 1, 2, 3, 7, 8, 9, 10, 11. lecture notes from Ken Kennedy. On-line Errata

Engineering a Compiler, (2nd edition preferred, 1st okay), Keith D. Cooper and Linda Torczon, Morgan Kaufmann Publishers. Chapters 1, 8, 9, 10, 12 and 13 (both editions). lecture notes and additional reading from Keith Cooper. On-line Errata

Compilers: Principles, Techniques, and Tools (2nd edition), Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Pearson.

Static Single Assignment Book, Rastello et al. (in progress)

Introduction to Lattices and Order,  Davey and Priestley, Cambridge University Press.

2017 2nd URCSSA Alumni Summit

On Oct. 26, Dr. Chengliang Zhang, a former graduate and now a Staff Software Engineer at Google Seattle, was invited by the Chinese Students and Scholars Association (URCSSA) to speak at the second Alumni Summit, titled Cloud | Big Data | AI.  The compiler group held a separate mini-symposium to present our research and had lunch with our esteemed graduate.

RTHMS: A tool for data placement on hybrid memory system

This paper uses a rule-based algorithm to guide data placement on a hybrid memory system. The hybrid memory system is abstracted as the combination of a FAST memory (HBM) and a SLOW memory (DRAM). FAST memory is assumed to have higher bandwidth but also higher latency than SLOW memory. In addition, FAST memory can either be software managed or be configured as a CACHE for SLOW memory.

The placement decision is made in two steps: (1) Each memory object is first evaluated individually, receiving a score for each placement choice (FAST, CACHE, SLOW). The rules are listed below (the corresponding score triples are in parentheses):
      R1 (single threaded): memory objects accessed by only one thread are preferred to be placed in SLOW memory (0, 0, 1), as the high bandwidth would be underutilized if they were placed in FAST.
      R2 (computing intensity): if the number of computing operations on data fetched from memory is larger than a threshold, the memory object is preferred to be placed in SLOW (0, 0, 1), as the longer latency will be amortized by the cost of computing.
      R3 (small size): memory objects smaller than the last-level cache (LLC) are preferred to be placed in SLOW (0, 0, 1), as the LLC can hold all the data and most accesses will hit in the LLC.
      R4 (small/strided access): memory objects with regular access patterns are preferred to be placed in FAST (1, 0, -1); since regular accesses are highly optimized to hide memory latency, bandwidth is the bottleneck.
      R5 (good locality): memory objects with good locality but a size larger than FAST memory are preferred to use the CACHE mode (N/A, 1, 0).
      R6 (poor locality): memory objects with poor locality and a size larger than FAST memory are preferred to be placed in SLOW (N/A, -1, 1).
      R7 (irregular access, low concurrency): memory objects with irregular memory accesses and low concurrency are preferred to be placed in SLOW (0, -1, 1); since irregular accesses are hard to optimize to hide latency and low concurrency cannot amortize it, the lower-latency memory is preferred.
      R8 (irregular access, high concurrency): memory objects with irregular memory accesses and high concurrency are preferred to be placed in FAST (1, -1, 0); since high concurrency amortizes the latency well, exploiting the higher bandwidth is preferred.
       The intuition behind the rules can be summarized as follows: placement in FAST best utilizes the bandwidth, placement in SLOW best utilizes the lower latency, and placement in CACHE best utilizes the locality.
       (2) Because the size of FAST memory is limited, not every object that prefers FAST can be placed there. A global decision is made by assigning each object a rank using the following two rules, which identify the objects that should be prioritized for FAST memory:
       R9 (total accesses): memory objects that are accessed often are typically important data structures; objects with more total accesses have higher priority.
       R10 (write intensity): memory objects with higher write intensity are more likely to benefit from the higher bandwidth of FAST; objects with higher write intensity have higher priority.
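The two-step procedure above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the `MemObject` attributes, the thresholds (`COMPUTE_THRESHOLD`, `LLC_SIZE`, `FAST_SIZE`), and the locality/concurrency cutoffs are hypothetical stand-ins for the profiled metrics RTHMS actually collects.

```python
from dataclasses import dataclass

@dataclass
class MemObject:
    name: str
    size: int              # bytes
    threads: int           # number of accessing threads
    ops_per_byte: float    # compute intensity
    regular: bool          # strided/regular access pattern
    locality: float        # reuse score in [0, 1], hypothetical metric
    total_accesses: int
    write_ratio: float     # fraction of accesses that are writes

LLC_SIZE = 32 * 2**20       # hypothetical last-level cache size
FAST_SIZE = 16 * 2**30      # hypothetical HBM capacity
COMPUTE_THRESHOLD = 8.0     # hypothetical ops/byte cutoff for R2

def score(obj):
    """Step 1: per-object scores for (FAST, CACHE, SLOW) from rules R1-R8."""
    s = {"FAST": 0, "CACHE": 0, "SLOW": 0}
    def add(f, c, sl):
        s["FAST"] += f; s["CACHE"] += c; s["SLOW"] += sl
    if obj.threads == 1:                       # R1: single-threaded
        add(0, 0, 1)
    if obj.ops_per_byte > COMPUTE_THRESHOLD:   # R2: compute-intensive
        add(0, 0, 1)
    if obj.size < LLC_SIZE:                    # R3: fits in LLC
        add(0, 0, 1)
    if obj.regular:                            # R4: strided access
        add(1, 0, -1)
    if obj.size > FAST_SIZE:
        if obj.locality > 0.5:                 # R5: good locality, too big for FAST
            add(0, 1, 0)
        else:                                  # R6: poor locality, too big for FAST
            add(0, -1, 1)
    elif not obj.regular:
        if obj.threads <= 4:                   # R7: irregular, low concurrency
            add(0, -1, 1)
        else:                                  # R8: irregular, high concurrency
            add(1, -1, 0)
    return s

def place(objects):
    """Step 2: rank FAST-preferring objects (R9, R10) under FAST capacity."""
    decisions, budget, prefer_fast = {}, FAST_SIZE, []
    for o in objects:
        s = score(o)
        best = max(s, key=s.get)
        if best == "FAST":
            prefer_fast.append(o)
        else:
            decisions[o.name] = best
    # R9 and R10: more total accesses, then higher write intensity, first
    prefer_fast.sort(key=lambda o: (o.total_accesses, o.write_ratio), reverse=True)
    for o in prefer_fast:
        if o.size <= budget:
            decisions[o.name] = "FAST"
            budget -= o.size
        else:
            decisions[o.name] = "SLOW"   # spill when FAST is full
    return decisions
```

For example, a 1 GB array with regular, multi-threaded access triggers R4 and lands in FAST, while a small, single-threaded, compute-intensive object triggers R1, R2, R3, and R7 and lands in SLOW. The priority sort in step 2 is one plausible way to combine R9 and R10; the paper describes them as ranking rules without pinning down a single tie-breaking order.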

MEMSYS 2017

 

Three Walls, by Monday’s keynote speaker Peter Kogge, University of Notre Dame

 

Memory Equalizer for Lateral Management of Heterogeneous Memory
Chen Ding (University of Rochester), Chencheng Ye (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)

 

Spirited Discussion

Memory Systems Problems and Solutions

• Chen Ding, University of Rochester
• David Donofrio, Berkeley Labs
• Scott Lloyd, LLNL
• Dave Resnick, Sandia
• Uzi Vishkin, University of Maryland


Sally McKee: On-Chip Cache


David Wang keynote


Hotel accommodation and conference dinner (and investigation … of murder)