May 11, 2016 Chen Ding

Six-month Reviews

Hao: composable locality in parallel code – theory and optimization

Rahman: optimizing code layout for very large software

Pengcheng: a higher order theory of memory demand

Jake: hybrid memory system management

Mike the award-winning thesis committee member

April 10, 2016 Chen Ding

Rochester DandyHacks

3rd spring Hackathon, 120 students, 36 hours, 17 projects, CS and data science.

Award winners: http://dandyhacks2016.devpost.com/
Many other good projects and presentations (ones I wrote down): Control-f books. Droplet. Electric go cart. Improvs. Vote2eat. Trump tweets. Medbook.tech. Musically.me. Naptimes. Serial port game. Pass the paragraph. Refugestudent.com. Trump bot. Rhythm fighter. Alg is simple. 3D gesture drawing. Virtual Tetris.

Okay. The last photo is a joke.

April 5, 2016 rahman lavaee

Assignment 6 (Dead Code Elimination)

This assignment is due Monday April 18th at 11:59pm.

For this assignment you are expected to identify and eliminate dead code. Dead code is any instruction that does not affect the output of the program (a function, in particular). For DCE (dead code elimination), you will need to use def-use chains to reason about instructions that do not affect the outputs. Remember that the output of a function is not limited to what it is returning.

IMPORTANT NOTE:

Those who implemented assignment 4 in LLVM, and have relied on the SSA property to find def-use chains with a linear scan of the LLVM-IR, are expected to implement a further requirement for this assignment, whether they are choosing URCC or LLVM for this assignment. That is Common Subexpression Elimination (CSE). An example of this optimization has been illustrated below.

Adopted from [1]

For CSE, you must first do the AVAIL analysis and then reason about expressions that are redundant.

Note 1: Since LLVM does not permit copy instructions, you may need to use the “ReplaceAllUsesWith” method to replace all uses of c with t. Also, since LLVM uses the SSA representation, you might need to insert new phi nodes.

Note 2: You do not need to implement partial redundancy elimination (PRE) for this assignment.

Testing

Your are expected to test your optimization pass by counting the number executed instructions. To do so, you are provided with a test script. Download this script into your URCC repository.

If you are using URCC for this assignment, you can easily run “./test.rb all” to run all the tests. You can also run an individual test by running “./test.rb #{program_name}”.

If you are using LLVM for this assignment, you will need to reimplement Test.test method in the script above by following the current implementation of that method. This is where you need to compile a program and apply your optimization, and instrumentation. You are allowed to use your peers’ test scripts.

The “./test.rb all” command will give you a “urcc_test_results.txt” file that includes the program outputs (along with your dynamic instruction count. Remember to print the instruction count as a single line “INST COUNT: x”, where x is the number of total executed instructions). Include this file in your report, along with your test script (if modified) and instructions on how to run the tests. Do not forget to give a thorough explanation about your implementation.

Note: URCC appears to be failing on generating a correct executable for two of the test cases: sort.c and tax.c. You can exclude these test cases by removing them from the test directory.

March 24, 2016 rahman lavaee

Assignment 5: Def-Use Chains

This assignment is due on Wednesday March 30th at 11:59pm.

In this assignment, you are expected to build the def-use chains for a program. You can either use your own CFG pass (if you are sure about its correctness), or adopt any correct CFG pass that is available to you.

Your output must have the following format. For every definition of a variable a, and for every use of a which is reachable from that definition (without any intervening def), you must output

[a, def, use],

where def and use are the program statements corresponding to the definition and use of a.

Run your analysis on the test cases provided in the URCC test directory and report the results. Explain your implementation and findings in a readme file. Archive everything and submit on blackboard.

March 15, 2016 rahman lavaee

Assignment 4: LVN in URCC/LLVM

For this assignment you are expected to implement the local value numbering optimization in URCC or LLVM. The requirements are the same as in assignment 1, except that the redundancy elimination part is now required. For this assignment, you don’t need to implement Stewart extension.

Test your optimization pass on URCC test cases. Report your implementation details and your findings in a readme file and submit on blackboard.

If you are working on LLVM, remember to turn off optimizations by using the “-O0” flag when emitting LLVM bitcode (The base project directory in /home/hoover/u1/cs255/cs255-llvm has been modified accordingly).

This assignment is due Sunday March 20th at 11:59pm.

February 29, 2016 rahman lavaee

Assignment 3 (CFG in URCC/LLVM)

In this assignment you will implement a pass to build a program’s CFG from its linear code. This assignment is due on Sunday March 6th at 11:59pm.

For this assignment, you can choose to use either the URCC compiler or LLVM.

Instructions for URCC

Checkout and install the URCC compiler (follow the instructions in the README file). Make sure you are using a version of clang between 3.3 and 3.6 or URCC may fail. If you decide to use the CSUG machines, you can use clang 3.6 that is installed in /u/cs255/build-llvm-36/bin.
Follow the algorithm in section 5.3.4 of the Cooper & Torczon book (page 241) to write a pass that builds the CFG of the program and visualizes it using graphViz (instructions on how to write a URCC pass are included in the README file).
Test your compiler pass on test cases included in the test directory.
Explain your implementation in a Readme file. Report all test case failures.
Archive your code and the Readme file and submit on Blackboard.

Note: A CFG pass is already implemented in the URCC compiler. You shall not look at that implementation.

Instructions for LLVM

Following the same instructions as in assignment 2, create an LLVM project in your CSUG account.

Implement a pass that builds the CFG of the program and visualizes it using graphviz. If you need to iterate over instructions in a function, you must use the following code:

#include "llvm/IR/InstIterator.h"

// F is a pointer to a Function instance
for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
  errs() << *I << "\n";

You can use BGL (Boost Graph Library), which is already installed on CSUG filesystem. Graphviz is included in this library.
Test your compiler pass on test cases included in URCC’s repository (https://github.com/dcompiler/urcc/tree/master/tests).
Explain your implementation in a Readme file. Report all test case failures.
Archive your code and the Readme file and submit on Blackboard.

Note: A CFG pass is already implemented in LLVM. You shall not look at that implementation.

February 10, 2016 rahman lavaee

Assignment 2 (GCC/LLVM)

In this assignment, you will implement and test a compiler pass that instruments a program to report the number of intermediate-level executed instructions. You can choose to implement this pass either in gcc or llvm. For your convenience, these two compilers are already installed on the csug network.

The idea is to insert appropriate calls in the program (instrumentation).

*********************************Instructions on GCC****************************************
——————————————————————————————–
Log in to your csug account.
$ cp -r /u/cs255/cs255-gcc YOURUSERNAME-cs255-gcc
$ cd YOURUSERNAME-cs255-gcc

Included files:
———————–
* inst-cnt-plugin.cc
You must implement your compiler pass here.
This file already provides the skeleton and a related example.

* inst-cnt-rt.c
This file implements the runtime functions that you need for the instrumentation.

* test.c
This is a simple program to test your pass.

After implementing the pass, compile it by running “make” and test it using “make check”. This gives you the instrumented program “test”. Run it on the string “cs255” and report your output. Make sure to explain your implementation and findings in a readme file.

Submission guideline:
Archive your working directory using the following command line, and submit on Blackboard.
tar -czvf YOURUSERNAME-cs255-gcc.tar.gz YOURUSERNAME-cs255-gcc/

Note: you can use the gcc option -fdump-tree-all to inspect your instrumentation at the GIMPLE level. After running “make check” with this option, look for the file which has your pass name as a suffix in its filename.

*********************************Instructions on LLVM**************************************
——————————————————————————————–
Log in to your csug account.
$ cp -r /u/cs255/cs255-llvm YOURUSERNAME-cs255-llvm
$ cd YOURUSERNAME-cs255-llvm

Included files:
———————–
* lib/InstCounter.cpp
You must implement your compiler pass here.
This file already provides the skeleton and a related example.

* runtime/InstCounting.c
This file implements the runtime functions that you need for the instrumentation.

* test/test.c
This is a simple program to test your pass.

After implementing your pass, compile it by running “make” in your top-level directory. Then cd into the “test” directory and run “make check” to test your pass. This gives you the instrumented program “test”. Run it on the string “cs255” and report your output. Make sure to explain your implementation and findings in a readme file.

Submission guideline:
Archive your working directory using the following command line, and submit on Blackboard.
tar –exclude=’.svn’ –exclude=’autoconf’ -czvf YOURUSERNAME-cs255-llvm.tar.gz YOURUSERNAME-cs255-llvm/

Note: you can use the llvm-dis tool (/u/cs255/build-llvm-38/bin/llvm-dis) to check your instrumentation at IR level. Run this tool on the llvm bitcode file that is generated by your pass:
/u/cs255/build-llvm-38/bin/llvm-dis <test.bc.opt> test.bc.opt.ll
———————————————————————————————

January 28, 2016 jacobwbrock

HVMIN: A Variable-Space Page Replacement Algorithm for Heterogeneous Memory

Existing Policies

In their 1976 paper “VMIN — An Optimal Variable-Space Page Replacement Algorithm”, Prieve and Fabry outline a policy to minimize the fault rate of a program at any given amount of average memory usage (averaged over memory accesses, not real time). VMIN defines the cost of running a program as C = nFR + nMU, where n is the number of memory accesses, F is the page fault rate (faults per access), R is the cost per fault, M is the average number of pages in memory at a memory access, and U is the cost of keeping a page in memory for the time between two memory accesses. VMIN minimizes the total cost of running the program by minimizing the contribution of each item: Between accesses to the same page, t memory accesses apart, VMIN keeps the page in memory if the cost to do so (Ut) is less than the cost of missing that page on the second access (R). In other words, at each time, the cache contains every previously used item whose next use is less than τ = R accesses in the future. Since there is only one memory access per time step, it follows U that the number of items in cache will never exceed τ. Since this policy minimizes C, it holds that for whatever value of M results, F is minimized. M can be tuned by changing the value of τ.

Denning’s working set policy (WS) is similar. The working set W (t, τ ), at memory access t, is the set of pages touched in the last τ memory accesses. At each access, the cache contains every item whose last use is less than τ accesses in the past. As in the VMIN policy, the cache size can never exceed τ. WS uses past information in the same way that VMIN uses future information. As with VMIN, the average memory usage varies when τ does.

Adaptations to Heterogeneous Memory Architectures

Compared with DRAM, phase change memory (PCM) can provides higher capacity, persistence, and significantly lower read and storage energy (all at the cost of higher read latency and lower write endurance). In order to take advantage of both technologies, several heterogeneous memory architectures, which incorporate both DRAM and PCM, have been proposed. One such proposal places the memories side-by-side in the memory hierarchy, and assigns each page to one of the memories. I propose that the VMIN algorithm described above can be modified to minimize total access latency for a given pair of average (DRAM, PCM) memory usage.

Using the following variables:

n: Length of program’s memory access trace
F: Miss/fault ratio (fraction of accesses that are misses to both DRAM and PCM)
H_DRAM: Fraction of accesses that are hits to DRAM
H_PCM: Fraction of accesses that are hits to PCM
R_F: Cost of a miss/fault (miss latency)
R_H,DRAM: Cost of a hit to DRAM
R_H,PCM: Cost of a hit to PCM
M_DRAM: Average amount of space used in DRAM
M_PCM: Average amount of space used in PCM
U_DRAM: Cost (“rent”) to keep one item in DRAM for one unit of time
U_PCM: Cost (“rent”) to keep one item in PCM for one unit of time
rt_fwd(b): The forward reuse time of the currently accessed item, b

Define the cost of running a program as C = nFR_F + nH_DRAMR_H,DRAM + nH_PCMR_H,PCM + nM_DRAMU_DRAM + nM_PCMU_PCM, where we are now counting the cost of a hit to each DRAM and PCM, since the hit latencies differ. If at each access we have the choice to store the datum in DRAM, PCM, or neither, until the next access to the same datum, the cost of each access b is rt_fwd(b) * U_DRAM + R_H,DRAM if kept in DRAM until its next access, rt_fwd(b) * U_PCM + R_H,PCM if kept in PCM until its next access, and R_F if it is not kept in memory. At each access, the memory controller should make the decision with the minimal cost.

Of course, by minimizing the cost associated with every access, we minimize the total cost of running the program. The hit and miss costs are determined by the architecture, while the hit and miss ratios, and the DRAM and PCM memory usages are determined by the rents (U_DRAM and U_PCM, respectively). The only tunable parameters then are the rents, which determine the memory usage for each DRAM and PCM. The following figure (which assumes that U_DRAM is sufficiently larger than U_PCM) illustrates the decision, based on rt_fwd(b):

Screen Shot 2016-01-28 at 5.27.06 PM.png

Update: an alternative diagram, showing the cost of keeping an item in each type of memory vs. the forward reuse distance (now including compressed DRAM):

Since the only free parameters are U_DRAM and U_PCM, this is equivalent to having two separate values of τ, τ_DRAM and τ_PCM in the original VMIN policy. The WS policy can be adapted by simply choosing a WS parameter for each DRAM and PCM.

If DRAM compression is an option, we can quantify the cost of storing an item in compressed form as rt_fwd(b) * U_DRAM \ [compression_ratio] + R_{H,DRAM_compressed}.

January 27, 2016 Chen Ding

LLVM Tutorial by John Criswell

John’s slides

Instruction counting assignment in GCC/LLVM

January 25, 2016 rahman lavaee

CS255 Assignment #1 (LVN)

In this assignment, you are asked to implement local value numbering (LVN) and check for redundant expressions.

You are expected to handle commutativity for commutative operations. Recall that an operation is commutative if you can change the order of operands without changing the result. For example (+) is commutative but (-) is not. Your implementation must be able to assign the same value number to a+b and b+a.

As the final requirement, improve LVN by adding the Stewart extension. The Stewart extension improves LVN by identifying additional redundancy in the following example form.

a = b + c

d = a – b

Specifically, it guides LVN to assign the same value number to both (c) and (d). The idea of the solution was first raised by Chris Stewart when he was taking the class around 2004. His idea was to insert additional value relations into the value number table. You should first work out this idea and make it concrete as an extension to the basic value numbering algorithm.

Note 1: You are expected to apply the Stewart extension on four operations: ‘+’, ‘-‘, ‘*’, and ‘/’.

Note 2: You should make sure that the Stewart extension can be applied on the following code as well.

a = b + c

e = a

d = e – b

To complete this assignment, take the following steps:

1. From Blackboard, download block.rb and vn_tests.rb to your project directory.

2. Implement the LVN class in the new file vn.rb.

4. Make sure all the tests in vn_tests.rb pass.

5. Document any test failures, if there is any, and explain why, in README.txt in plain text.

6. Extra credit. In addition to finding the statements with a redundant expression, generate optimized code where all redundant expressions are removed. Demonstrate the optimizer with a set of tests and put them in opt_tests.rb. The tests should include all three tests in vn_tests.rb.

7. Submit your assignment on Blackboard. Make sure to include all the ruby files in your submission.

Due time: Friday Jan 29th at 23:59:59 EST.

Late submission policy: Each student can have a total of two days used for late submissions over all assignments . This means that if you submit the LVN assignment on Sunday, you will not be able to do any other late submission. But if you submit on Saturday, you still have one more day to use for one other assignment.

Policy on academic honesty : Every line of code of the LVN analyzer and optimizer must be written by the student. Do not copy code. Do not show your LVN code to others until 2 days (48 hours) past the assignment due time. The teaching staff is required to report every violation of the policy or suspicion of violation to the university’s Academic Honesty Board. Contact the teaching staff if you have questions on the policy.

	CSC 258/458 Parallel… on Exploring Parallel and Distrib…
	CSC 253/453 Fall 202… on CSC 253 Collaborative Software…
	rahman lavaee on Chen Ding Receives Lewis Colle…
	dcompiler on Choosing Distinguishable Color
	CS 255/455 Spring 20… on Assignment 5: Def-Use Cha…