Parallel Matrix Multiplication in Rayon Rust

Shaotong Sun

This post was written as an assignment for CSC 252, based on my tutorial at the ACM Chapter workshop titled “General Introduction of Parallel Programming Schemes in Different Languages.”

Rust, like C or C++, is a systems programming language, but unlike C and C++, it places more emphasis on memory safety and usability. In other words, it “gives you the option to control low-level details (such as memory usage) without all the hassle traditionally associated with such control.”1

To install Rust, please refer to https://doc.rust-lang.org/book/ch01-01-installation.html.

Ownership in Rust

As mentioned above, Rust is designed with a focus on memory safety. To this end, Rust uses a concept called ownership. “Ownership is Rust’s most unique feature and has deep implications for the rest of the language. It enables Rust to make memory safety guarantees without needing a garbage collector.”2 At the highest level, ownership means that some variable owns some value, and there can only be one owner for a value at a time. Specifically, the Rust Book defines the ownership rules as:

Ownership Rules

  • Each value in Rust has an owner.
  • There can only be one owner at a time.
  • When the owner goes out of scope, the value will be dropped.
https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html
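
The three rules above can be seen in a few lines of code. The following is a minimal sketch, not taken from the Rust Book; the function names consume, sum_borrowed, and ownership_demo are hypothetical and chosen only for illustration.

```rust
// Hypothetical helper: takes ownership of the vector and returns its length.
fn consume(v: Vec<i32>) -> usize {
    v.len()
} // `v` goes out of scope here, so the vector is dropped (rule 3).

// Hypothetical helper: only borrows the data, so the caller stays the owner.
fn sum_borrowed(v: &[i32]) -> i32 {
    v.iter().sum()
}

// Hypothetical demo that returns the sum and the length of a small vector.
fn ownership_demo() -> (i32, usize) {
    let a = vec![1, 2, 3];        // `a` owns the vector (rule 1).
    let total = sum_borrowed(&a); // Borrowing leaves `a` usable afterwards.
    let b = a;                    // Ownership moves from `a` to `b` (rule 2).
    // println!("{:?}", a);       // Would not compile: `a` was moved (error E0382).
    let len = consume(b);         // Ownership moves again, into `consume`.
    (total, len)
}
```

Uncommenting the println! line turns the mistake into a compile-time error rather than a runtime bug, which is the point of the rules.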

Ownership not only addresses memory safety but also makes writing concurrent programs much more accessible than one might expect. Because every value has a single owner, issues such as data races become compile-time errors rather than runtime errors in many cases.
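
As a rough illustration of how this plays out with threads, the sketch below uses only the standard library (std::thread::scope, not Rayon). The function name double_in_two_threads is hypothetical. Each thread receives its own non-overlapping mutable slice, which is what lets the compiler accept the code.

```rust
use std::thread;

// Hypothetical helper: doubles every element using two scoped threads.
fn double_in_two_threads(data: &mut [i32]) {
    let mid = data.len() / 2;
    // `split_at_mut` yields two non-overlapping mutable slices, so each
    // thread gets exclusive access to its own half.
    let (left, right) = data.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(|| left.iter_mut().for_each(|x| *x *= 2));
        s.spawn(|| right.iter_mut().for_each(|x| *x *= 2));
    }); // All spawned threads are joined before `scope` returns.
    // Handing `data` itself to both closures would be rejected by the
    // compiler, because that needs two mutable borrows of one value at once.
}
```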

Rayon in Rust

Like OpenMP and OpenCilk in C and C++, Rayon is a data-parallelism library for Rust that helps you write parallel code safely and quickly, freeing you from managing threads by hand.

There are two ways to use Rayon:

  • High-level parallel constructs are the simplest way to use Rayon and also typically the most efficient.
    • Parallel iterators make it easy to convert a sequential iterator to execute in parallel.
    • The par_sort method sorts &mut [T] slices (or vectors) in parallel.
    • par_extend can be used to efficiently grow collections with items produced by a parallel iterator.
  • Custom tasks let you divide your work into parallel tasks yourself.
    • join is used to subdivide a task into two pieces.
    • scope creates a scope within which you can create any number of parallel tasks.
    • ThreadPoolBuilder can be used to create your own thread pools or customize the global one.
https://docs.rs/rayon/latest/rayon/

This tutorial will only cover the most basic method of parallelizing matrix multiplication using Rayon, which is using parallel iterators (more can be found at: https://docs.rs/rayon/latest/rayon/).

The most naive way to multiply two n×n matrices in Rust, stored in row-major order as flat slices, is shown below:

fn seq_mm(a: &[f64], b: &[f64], c: &mut [f64], n: usize) {
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
}
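
Because seq_mm accumulates with +=, the result slice c has to start out filled with zeros. The sketch below repeats seq_mm so that it is self-contained and adds a hypothetical wrapper, multiply, that allocates the zeroed result and checks the input sizes.

```rust
fn seq_mm(a: &[f64], b: &[f64], c: &mut [f64], n: usize) {
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
}

// Hypothetical wrapper: allocates a zero-initialized result and returns it.
fn multiply(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    // Both inputs are expected to be n x n matrices in row-major order.
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    let mut c = vec![0.0; n * n];
    seq_mm(a, b, &mut c, n);
    c
}
```

For example, multiplying [[1, 2], [3, 4]] by [[5, 6], [7, 8]] through multiply gives [[19, 22], [43, 50]].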

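Calling par_chunks_mut(n) on c splits the n×n result into n non-overlapping chunks of n elements each, that is, one chunk per row of c. Rayon then automatically processes these rows in parallel for us. Because each closure call gets exclusive mutable access to its own row, no two threads ever write to the same element.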

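use rayon::prelude::*; // brings par_chunks_mut and the parallel iterator traits into scope

fn par_mm(a: &[f64], b: &[f64], c: &mut [f64], n: usize) {
    // Each chunk of n elements is one row of c; rows are processed in parallel.
    c.par_chunks_mut(n)
        .enumerate()
        .for_each(|(i, c_row)| {
            // i is the row index; c_row is that row's exclusive mutable slice.
            for j in 0..n {
                for k in 0..n {
                    c_row[j] += a[i * n + k] * b[k * n + j];
                }
            }
        });
}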
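
To show roughly what Rayon is sparing us from, here is a std-only sketch of the same row-wise split using std::thread::scope and chunks_mut. This is not how Rayon is implemented (Rayon uses a work-stealing thread pool); the function name scoped_mm and the workers parameter are assumptions made for this illustration.

```rust
use std::thread;

// Hypothetical std-only analogue of par_mm; `workers` is an assumed parameter
// giving the maximum number of threads to spawn.
fn scoped_mm(a: &[f64], b: &[f64], c: &mut [f64], n: usize, workers: usize) {
    if n == 0 {
        return;
    }
    let workers = workers.max(1);
    // Rows handled by each thread, rounded up so that every row is covered.
    let rows_per_worker = (n + workers - 1) / workers;
    thread::scope(|s| {
        // Each block is a run of whole rows that belongs to exactly one thread.
        for (block, c_block) in c.chunks_mut(rows_per_worker * n).enumerate() {
            s.spawn(move || {
                let first_row = block * rows_per_worker;
                for (r, c_row) in c_block.chunks_mut(n).enumerate() {
                    let i = first_row + r;
                    for j in 0..n {
                        for k in 0..n {
                            c_row[j] += a[i * n + k] * b[k * n + j];
                        }
                    }
                }
            });
        }
    });
}
```

Compared with this version, the Rayon code needs no manual bookkeeping of row blocks or thread counts.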

With n = 1000 and an unoptimized debug build (plain cargo run), the sequential matrix multiplication takes around 7 seconds to finish, while the parallel matrix multiplication takes only around 2 seconds.
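
Timings like these can be collected with std::time::Instant. The helper below is a minimal sketch; the name timed is hypothetical, and it measures a single run, so results will vary from machine to machine and from run to run.

```rust
use std::time::{Duration, Instant};

// Hypothetical helper: runs `f` once and returns its result together with
// the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

A call such as timed(|| seq_mm(&a, &b, &mut c, n)) returns the elapsed time as the second element of the tuple. Note that cargo run --release builds with compiler optimizations enabled, which usually changes the absolute numbers considerably.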

Conclusion

In conclusion, Rayon in Rust seems to be beginner-friendly and easy to use, producing a relatively good performance without much work. If you are interested in Rayon or Rust, be sure to check out the Rust Book.

  1. https://doc.rust-lang.org/book/ch00-00-introduction.html ↩︎
  2. https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html ↩︎

CSC 252/452 Computer Organization (Spring 2024)

Chen Ding, Professor of Computer Science
WFs 3:25pm to 4:40 Gavett 206

CSC 252 teaches the fundamentals of modern computer organization, including software and hardware interfaces, assembly languages and C, memory hierarchy and program optimization, data parallelism and GPUs. It shows the underlying physical reality on which the virtual world, including AI, is built and depends.

Textbooks

Introduction to Programming with RISC-V by Borin, https://riscv-programming.org/book/riscv-book.html, required: §1 to §7.

Computer Systems: A Programmer’s Perspective 3rd Edition by Bryant and O’Hallaron, required: §1, §4.1-4.4, §5 to §12.

For all other information, see Blackboard.

Except for using RISC-V as the main assembly language (rather than x86), the course is similar to the Spring 2023 course taught by Professor Yuhao Zhu. The previous year page also includes a set of past exams, problem sets, and their solutions.

Two examples of “modular” math

These two math problems are tangential to CSC 253/453 on software design, but they are fun to explain when there is enough time in a lecture. Both are general composite properties that can be proved easily using simple building blocks.

Theorem: If a complex value is a root of a polynomial of real-valued coefficients, so is its conjugate.

To prove this for ALL such polynomials, we need just two properties of complex number arithmetic: (1) the conjugate of the sum is the sum of the conjugates, and (2) the conjugate of the product is the product of the conjugates. These are binary operations and can be shown easily. Then for any real-valued polynomial f(x), we have f(~x) = ~f(x) = ~0 = 0, and the theorem is proved.
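
Written out step by step, the argument looks as follows. The notation is an assumption made here: the overline denotes the conjugate (written ~ above), and the polynomial is f(z) = a_0 + a_1 z + ... + a_n z^n with real coefficients a_k.

```latex
\begin{align*}
\overline{f(z)}
  &= \overline{\sum_{k=0}^{n} a_k z^k} \\
  &= \sum_{k=0}^{n} \overline{a_k z^k}
     && \text{by (1), conjugate of a sum} \\
  &= \sum_{k=0}^{n} \overline{a_k}\,\overline{z}^{\,k}
     && \text{by (2), applied repeatedly} \\
  &= \sum_{k=0}^{n} a_k\,\overline{z}^{\,k}
     && \text{since each } a_k \text{ is real} \\
  &= f(\overline{z}).
\end{align*}
% Hence, if f(z) = 0, then f(\overline{z}) = \overline{f(z)} = \overline{0} = 0.
```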

Theorem: In a triangle, a median is a line from an end point to the center of the opposite side. For any triangle, its three medians meet at one point, which is 2/3 of the way from the end point to the opposite edge (equivalently, 1/3 of the way from the edge to the end point).

To prove this for ALL triangles, we use a property of a single line segment. Let P be a point on the line segment P1P2. Let the (complex-plane) coordinates of P1, P2, P be x, y, z, and let the ratio r = P1P / PP2; then we have z = (x + ry) / (1 + r). If we use this equation with r = 2 to compute the coordinate of the point on each median that is 2/3 of the way from the end point to the opposite edge, we’ll see that the three results are identical: 1/3(x1+x2+x3), where the xs are the coordinates of the three end points of the triangle.
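
As a worked instance of the segment formula z = (x + ry) / (1 + r), take the median from x_1. The symbol m_1 for the midpoint of the opposite side is introduced here for illustration, and r = 2 is the ratio for the point two-thirds of the way from the end point to that midpoint.

```latex
\begin{align*}
m_1 &= \frac{x_2 + x_3}{2}
    && \text{midpoint of the side opposite } x_1 \ (r = 1) \\
z_1 &= \frac{x_1 + 2\,m_1}{1 + 2}
     = \frac{x_1 + x_2 + x_3}{3}
    && \text{point on the median from } x_1 \ (r = 2)
\end{align*}
% The result is symmetric in x_1, x_2, x_3, so the medians from x_2 and x_3
% give the same point: all three medians pass through (x_1 + x_2 + x_3)/3.
```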

Source: An Imaginary Tale — The Story of i, by Paul J. Nahin, Princeton U Press, 1998.

Photo credit: AI generated by Kaave Hosseini for CSC 484 for “dimension reduction”

CSC 253/453 Collaborative Software Design (Syllabus, Fall 2023)

Chen Ding, Professor of Computer Science
MWs 3:25pm to 4:40 Hylan 202

Modern software is complex and more than a single person can fully comprehend. This course teaches collaborative programming which is multi-person construction of software where each person’s contribution is non-trivial and clearly defined and documented.  The material to study includes design principles, safe and modular programming in modern programming languages, software teams and development processes, design patterns, and productivity tools.  The assignments include collaborative programming and software design and development in teams.  The primary programming language taught and used in the assignments is Rust. Students in CSC 453 have additional reading and requirements.

Principles

  • Essential Difficulties: Complexity, Conformity, Changeability
  • Module Criteria
  • The Modular Structure of Complex Software
  • Design and Development of Program Families
  • Designing for Software Extension and Contraction

Rust

  • Programming without Loops and Branches: Iterators, Closures
  • Error Handling: Option, Result
  • Code Reuse: Generic Type, Trait, Trait Bound
  • Memory Safety: Ownership, Borrow, Lifetime, Smart Pointer

Software Design

  • Distributed Version Control
  • Behavioral Design Patterns: Command, New Type, RAII Guards, Strategy
  • Creational Design Pattern: Builder
  • Trait Object and State Pattern
  • Meta Programming
  • Logging and Serialization

Software Engineering

  • Team
  • Unified Software Development Process
  • Testing
  • Code Review

Human Values

  • Apportionment
  • Algorithmic Fairness
  • Fallibility and Truth Seeking

Past Students’ Comments

“Separation of concern is perhaps my favorite topic in software development right now; I love making software as modular and reusable as possible. Taking CSC 253 also helped me to understand the MVC architecture in mobile app development class almost immediately.”  (Fall 2022)

“A huge part of the course is graded on a complete group project. You’re assigned a random group, and you better pray to get group members who show up to class and do their parts.”  (Feb. 2023)

“The lessons on iterators truly opened my eyes to a whole new world of thinking about programming, and thinking about modules helped me understand the concept of information hiding and team collaboration, and especially communication and just how important it is. I will be bringing my learnings from your class to Seattle this summer for sure!”  (Fall 2022)

“The most meaningful part is doing the final project – DVCS in group with other 4 outstanding classmates. In this project, I learned how Git works, how to apply the design principles into practice, and how to collaborate well with others in programming. The reward didn’t show up immediately when and after the class, but afterward when I looked for an SDE job and prepared for the interviews, I was reminded of what I learned in the CSC453 course and found out how useful it is to my career.”  (Fall 2021)

On Rust

“Speaking of languages, it’s time to halt starting any new projects in C/C++ and use Rust for those scenarios where a non-GC language is required. For the sake of security and reliability. the industry should declare those languages as deprecated.”  – Mark Russinovich, CTO of Microsoft Azure, author of novels Rogue Code, Zero Day and Trojan Horse, Windows Internals, Sysinternals tools, 9/19/2022

Exploring Parallel and Distributed Programming: Student Presentations Showcase Projects

In the Spring 2023 semester, a group of Parallel and Distributed Programming (CSC 248/448) students showcased their research and implementations in a series of presentations. Their projects span a wide range of fields, from optimization algorithms to parallel computing frameworks. Here is some brief information about their presentations.

  1. Aayush Poudel: Ant Colony Optimization (ACO) for the Traveling Salesman Problem (TSP)
    • Aayush Poudel’s presentation revolved around the fascinating application of Ant Colony Optimization to solve the Traveling Salesman Problem.
  2. Matt Nappo: GPU Implementation of ACO for TSP
    • By harnessing the parallel processing capabilities of GPUs, Matt demonstrated an efficient implementation of ACO for the Traveling Salesman Problem.
  3. Yifan Zhu and Zeliang Zhang: Parallel ANN Framework in Rust
    • Yifan Zhu and Zeliang Zhang collaborated on a project that involved building a parallel Artificial Neural Network (ANN) framework using the Rust programming language. Their framework leveraged the inherent parallelism in neural networks, unlocking increased performance and scalability.
  4. Jiakun Fan: Implementing Software Transactional Memory using Rust
    • Jiakun Fan delved into concurrency control by implementing Software Transactional Memory (STM) using the Rust programming language. STM provides an alternative approach to traditional lock-based synchronization, allowing for simplified concurrent programming. Jiakun’s project showcased the feasibility of utilizing Rust’s unique features to build concurrent systems.
  5. Shaotong Sun and Jionghao Han: PLUSS Sampler Optimization
    • Shaotong Sun and Jionghao Han collaborated on a project to optimize the PLUSS sampler. Their work involved enhancing the performance and efficiency of the sampler through parallelization techniques.
  6. Yiming Leng: Survey Study of Parallel A*
    • Yiming Leng undertook a comprehensive survey study exploring the parallelization of the A* search algorithm. A* is widely used in pathfinding and optimization problems, and Yiming’s research focused on the potential benefits and challenges of parallelizing this popular algorithm.
  7. Ziqi Feng: Design and Evaluation of a Parallel SAT Solver
    • Ziqi Feng’s presentation concerned designing and evaluating a parallel SAT (Satisfiability) solver. SAT solvers play a crucial role in solving Boolean satisfiability problems, and Ziqi’s project aimed to enhance their performance by leveraging parallel computing techniques.
  8. Suumil Roy: Parallel Video Compression using MPI
    • Suumil Roy’s project focused on leveraging the Message Passing Interface (MPI) for parallel video compression. Video compression is crucial in various domains, including streaming and storage. By leveraging the power of parallel computing, Suumil demonstrated how MPI enables the efficient distribution of computational tasks across multiple processing units.
  9. Muhammad Qasim: A RAFT-based Key-Value Store Implementation
    • Muhammad Qasim’s presentation focused on implementing a distributed key-value store using the RAFT consensus algorithm. Key-value stores are fundamental data structures in distributed systems, and the RAFT consensus algorithm ensures fault tolerance and consistency among distributed nodes.
  10. Donovan Zhong: RAFT-based Key-Value Storage Implementation
    • Donovan Zhong’s project complemented Muhammad’s work by presenting another RAFT-based key-value storage implementation perspective. Donovan’s implementation provided insights into the challenges and intricacies of building fault-tolerant and distributed key-value storage systems.
  11. Luchuan Song: Highly Parallel Tensor Computation for Classical Simulation of Quantum Circuits Using GPUs
    • Luchuan Song’s presentation unveiled an approach to parallel tensor computation for the classical simulation of quantum circuits. Quantum computing has the potential to revolutionize various industries, but its simulation on classical computers remains a challenging task. Luchuan’s project harnessed the power of Graphics Processing Units (GPUs) to accelerate tensor operations, allowing for efficient and scalable simulation of quantum circuits.
  12. Woody Wu and Will Nguyen: Parallel N-Body Simulation in Rust Programming Language
    • Working together as a team, Woody Wu and Will Nguyen tackled the intricate task of simulating N-body systems. N-body simulations involve modeling the interactions and movements of particles or celestial bodies, making them essential in various scientific domains. In collaboration, they presented their project using various parallel programming frameworks such as Rust Rayon, MPI, and OpenMP. By leveraging these powerful tools, they explored the realm of high-performance computing to achieve efficient and scalable simulations.

The presentation slides can be found at https://github.com/dcompiler/258s23

CSC 579 Machine-Checked Proofs Using Coq

CSC 579 Spring 2023
(R 9:40am to 10:55 Lechase 103)

Syllabus

  • The Need for Training Thought: The Values of Thought. Tendencies Needing Constant Regulation.  Regulation Transforms Inference into Proof.
  • Type Systems.  Operational Semantics. Progress. Type Preservation. Type Soundness.
  • Functional Programming in Coq: Data and Functions.  Proof by Simplification, Rewriting and Case Analysis.
  • Proof by Induction. Proofs Within Proofs.  Formal vs. Informal Proof.
  • Lists, Options, Partial Maps.
  • Basic Tactics: apply, apply with, injection, discriminate, unfold, destruct.
  • Logic in Coq. Logical Connectives: Conjunction, Disjunction, Falsehood and Negation, Truth, Logical Equivalence, Existential Quantification.  Programming with Propositions. Applying Theorems to Arguments. Coq vs. Set Theory: Functional Extensionality, Propositions vs. Booleans, Classical vs. Constructive Logic.
  • Inductively Defined Propositions. Induction Principles for Propositions.  Induction Over an Inductively Defined Set and an Inductively Defined Proposition.
  • The Curry-Howard Correspondence. Natural Deduction. Typed Lambda Calculus. Proof Scripts. Quantifiers, Implications, Functions. Logical Connectives as Inductive Types.

Jonathan Waxman UG Honor Thesis: Leasing by Learning Access Modes and Architectures (LLAMA)

Leasing by Learning Access Modes and Architectures (LLAMA)
Jonathan Waxman

November 2022

Lease caching and similar technologies yield high performance in both theoretical and practical caching spaces. This thesis proposes a method of lease caching in fixed-size caches that is robust against thrashing, implemented and evaluated in Rust.

CSC 253/453 Fall 2022

Collaborative Programming and Software Design

Prerequisites: CSC 172 or equivalent for CSC 253.  CSC 172 and CSC 252 or equivalent for CSC 453 and TCS 453.

Modern software is complex and more than a single person can fully comprehend. This course teaches collaborative programming which is multi-person construction of software where each person’s contribution is non-trivial and clearly defined and documented.  The material to study includes design principles, safe and modular programming in modern programming languages including Rust, software teams and development processes, design patterns, and productivity tools.  The assignments include collaborative programming and software design and development in teams.  Students in CSC 453 and TCS 453 have additional reading and requirements.

Syllabus

  • SOFTWARE DESIGN
    • Essential Difficulties
      • Complexity, conformity, changeability, invisibility. Their definitions, causes, and examples. Social and societal impacts of computing.
    • Module Design Concepts
      • Thinking like a computer and its problems. Four criteria for a module. Modularization by flowcharts vs information hiding. The module structure of a complex system. Module specification.
    • Software Design Principles
      • Multi-version software design, i.e. program families. Stepwise refinement vs. module specification. Design for prototyping and extension. USE relation.
    • Software Engineering Practice
      • Unified software development process, and its five workflows and four phases. CMM Maturity.
    • Teamwork
      • Types of teams: democratic, chief programmer, hybrids. Independence, diversity, and principles of liberty. Ethics and codes of conduct.
  • PROGRAM DESIGN USING RUST
    • Safe Programming
      • Variant types and pattern matching. Slices. Mutability. Ownership, borrow, and lifetime annotations. Smart pointers.
    • Abstraction and Code Reuse
      • Iterators. Error processing. Generic types and traits. Writing tests. Modules, crates, and packages. Design patterns: iterators, builders, decorators, newtype, strategy, RAII, state. Meta-programming.
    • Meta-programming (453 Only)
      • Declarative and procedural macros. Derived traits.
    • Software Tools
      • Distributed version control. Logging. Serialization.

The full course information and material are released through learn.rochester.edu.

A summary of the student evaluation for the 2021 course.

CSC 253 Collaborative Software Design Rate My Professor Chen Ding

University of Rochester Computer Science

CSC 253/453 and TCS 453 Collaborative Programming and Software Design

Modern software is complex and more than a single person can fully comprehend. This course teaches collaborative programming which is multi-person construction of software where each person’s contribution is non-trivial and clearly defined and documented.  The material to study includes design principles, safe and modular programming in modern programming languages, software teams and development processes, design patterns, and productivity tools.  The assignments include collaborative programming and software design and development in teams.  The primary programming language taught and used in the assignments is Rust. Students in CSC 453 have additional reading and requirements.

Prerequisites: CSC 172 or equivalent for CSC 253 and TCS 453.  CSC 172 and CSC 252 or equivalent for CSC 453.  

Fall 2020 Student Evaluation

Anonymous inputs were collected by the university before the final exam. 14 out of 33 students (44%) submitted the evaluation.

The overall Instructor Rating is 4.21 and Course Rating 4.00.

Among the individual questions, the highest are 4.79 (The instructor was willing to listen to student questions and/or opinions), 4.71 (The instructor demonstrated sincere respect for students), and two COVID related questions both are 4.54 (the instructor clearly articulated course expectations to students, and the instructor noticed when students did not understand course material and adjusted accordingly). The lowest are 3.71 (The exams/assignments were clearly worded) and 3.93 (The instructor used examples that helped with understanding the material).