University of Rochester Takes Lean Kernel Arena Top Spot

Lean Kernel Leaderboard

The Lean Kernel Arena (arena.lean-lang.org) is a public benchmarking and testing ground for different proof checkers (kernels) used in the Lean Theorem Prover. It runs a large suite of tests to check the correctness of each checker and reports their speed (⏱️) and memory usage (🧠), especially on large, real-world tests like mathlib (the comprehensive Lean mathematical library). Below is a screenshot of the leaderboard taken on April 29, 2026.

The public ranking includes the official kernels alongside community alternatives based on nanoda (implemented in Rust), lean4lean (written in Lean itself), and rpylean (using just-in-time compiled Python). My PhD student, Yifan Zhu, and I began discussing cache optimization for nanoda a week ago. By Monday morning of this week, Yifan’s version of nanoda had reached the top spot! The techniques he used will be the subject of an upcoming technical report. In this blog, I am merely reporting the numbers, with some context for readers who may not be familiar with Lean.

The leaderboard reports performance as total time, computed by dividing the total instruction count by a clock frequency. The instruction count does not reflect the effects of caching. In addition, for multi-threaded kernels, the end-to-end time may be reduced significantly by parallelization. Yifan’s version shows only a 4% reduction in reported time, but in wall-clock time it is substantially faster. On the benchmark machine running four threads, the wall-clock times are 4.9 minutes for Yifan’s kernel, 6.3 minutes for the second fastest, and 22 minutes for the official version. His Lean kernel is 22% faster than the second‑fastest kernel on the list.

Lean Applications

  • Formalizing mathematics. Lean has been used to formalize mathematical proofs, from undergraduate topics (calculus, linear algebra) to ongoing research. An April 2026 analysis counted over 308,000 declarations, 8.4 million logical connections, and almost 1.4 million lines of Lean code.
  • AI: Groups like Google DeepMind (with its AlphaProof system) use Mathlib as a benchmark and a knowledge base. AI models are trained on Mathlib’s data to learn how to reason and prove theorems automatically.

The most computationally demanding use cases are in the field of artificial intelligence, where AI models require generating and checking millions of potential proof steps. A faster kernel directly reduces the time and energy required to train and run these AI models. Beyond training AI, even standard automated proof tactics rely on kernel speed, e.g., simp (simplify), which repeatedly applies rewrite rules to a goal. With the rise of AI coding agents, the ability to quickly verify AI-generated code is crucial. An experimental AI agent was able to generate a 7,400-line verified embedded DSL over a single weekend, requiring the kernel to check thousands of lines of code for correctness on the fly (slides).
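For readers unfamiliar with Lean, here is a toy illustration (my own, using only built-in simp lemmas) of what a tactic like simp does: it rewrites the goal step by step, and the proof it constructs is then certified by the kernel.

```lean
-- simp repeatedly applies rewrite rules to the goal:
-- here Nat.add_zero rewrites a + 0 to a, leaving a + b = a + b.
example (a b : Nat) : (a + 0) + b = a + b := by
  simp
```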

In industrial applications, speed is not just about cost-efficiency but about integrating verification directly into the development workflow. For example, AWS uses Lean to verify its Cedar policy system, ensuring no version ships unless all proofs pass. Microsoft uses Lean to verify its SymCrypt cryptographic library.

C++ vs Rust

I downloaded and counted the number of source lines in two kernels. The first is from https://github.com/leanprover/lean4.git in its src/kernel directory, counted using cloc:

The second is from https://github.com/ammkrn/nanoda_lib.git:

In reported measures, the official C++ kernels take 30 to 40 minutes and use over 9 GB of memory. The Rust-based kernels, in contrast, take as little as 16.1 minutes in Yifan’s version and use half as much memory. In parallel (4-thread) executions, the wall-clock time is 22 minutes for the official version, compared to 4.9 and 6.3 minutes for the top two Rust-implemented kernels. The parallel speedup, computed by the ratio of the reported time to the wall-clock time, is less than 2× for the C++ kernels and 3.3× for Yifan’s kernel. Overall, the Rust kernels execute fewer instructions and achieve greater parallel efficiency than their C++ counterparts.
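As a quick arithmetic check on the speedup figures, using only the numbers quoted above:

```python
# Parallel speedup = reported (instruction-count-based) time / wall-clock time,
# using the minutes quoted in the text.
yifan_speedup = 16.1 / 4.9    # about 3.3x for Yifan's kernel
cpp_speedup_low = 30 / 22     # about 1.4x
cpp_speedup_high = 40 / 22    # about 1.8x; both C++ figures are under 2x
```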

* Corrected 5/12/2026: the time reported by the arena website is computed from the total instruction count, not based on measured CPU time.

Miss Ratio Monotonicity and Convexity (Part 1)

Caches are dynamically managed local memories. Their dynamic behavior can either improve performance or, in some cases, become detrimental and counterproductive. Monotonicity and convexity are the two properties most commonly used to determine whether a cache design is well‑behaved and provides predictable performance. This is the first in a series of blog posts on these properties.

Miss Ratio Monotonicity

Informally:

  • If you increase the cache size, the miss ratio never increases.
  • In other words, the miss ratio is a non‑increasing function of cache size.

Formally:
Let mr(c) be the miss ratio for a cache of size c (measured in blocks). Then for c₁ < c₂, we have mr(c₁) ≥ mr(c₂).

Not all caches behave this way. If a cache algorithm is not monotonic, increasing the cache size can sometimes increase misses, which is counterintuitive and undesirable. This is known as Belady’s anomaly.
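The anomaly is easy to reproduce. Below is a minimal FIFO simulator in Python (my own sketch, not taken from the papers discussed here), run on the classic reference string from Belady’s study; with 4 frames it misses more often than with 3.

```python
from collections import deque

def fifo_misses(trace, capacity):
    """Count cache misses under FIFO replacement with `capacity` frames."""
    cache, order, misses = set(), deque(), 0
    for x in trace:
        if x not in cache:
            misses += 1
            if len(cache) == capacity:
                cache.remove(order.popleft())  # evict the oldest-loaded block
            cache.add(x)
            order.append(x)
    return misses

# The classic reference string that exhibits Belady's anomaly under FIFO.
trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
# fifo_misses(trace, 3) -> 9 misses; fifo_misses(trace, 4) -> 10 misses
```

Growing the cache from 3 to 4 frames increases the miss count from 9 to 10, so FIFO’s miss ratio is not monotonic in cache size.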

Stack Algorithms and the Inclusion Property

A caching algorithm has the inclusion property (or is a stack algorithm) if:

For any memory address reference sequence and at any time, the set of blocks in a cache of size c is a subset of the set in a cache of size c+1. Cache contents are inclusive across sizes. A block present in the smaller cache is always present in any larger cache. Therefore, the miss ratio of any stack algorithm is guaranteed non-increasing when increasing the cache size.

Mattson et al. presented the stack property as a sufficient condition for one-pass evaluation of a caching algorithm. Stack simulation maintains the contents of all cache sizes at each moment using a single stack. At each access, the stack distance is the position in the stack where the accessed data is found. The stack distance fully determines whether the access is a cache hit or a miss.

After stack simulation, the hit or miss count of any cache size can be determined from the stack distances without re‑running the trace. Hence, the paper is titled Evaluation techniques for storage hierarchies.
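As an illustration, here is a minimal one-pass LRU stack simulation in Python (a sketch in the spirit of Mattson et al.; the function names are my own). It records the stack distance of every reference, and the miss count for any cache size then falls out of the recorded distances without re-running the trace.

```python
from math import inf

def lru_stack_distances(trace):
    """One-pass stack simulation for LRU: return the stack distance
    (1-indexed depth in the LRU stack, inf for a first access)
    of each reference in the trace."""
    stack, dists = [], []           # stack[0] is the most recently used block
    for x in trace:
        if x in stack:
            d = stack.index(x) + 1  # depth at which x is found
            stack.remove(x)
        else:
            d = inf                 # cold (compulsory) miss
        stack.insert(0, x)          # promote x to the top of the stack
        dists.append(d)
    return dists

def miss_count(dists, c):
    """A reference hits in a cache of size c iff its stack distance <= c."""
    return sum(1 for d in dists if d > c)

trace = ['a', 'b', 'c', 'b', 'a', 'c']
dists = lru_stack_distances(trace)
mrc = [miss_count(dists, c) for c in range(1, 5)]  # [6, 5, 3, 3]
```

The miss counts are non-increasing in cache size, exactly the monotonicity that the inclusion property guarantees for a stack algorithm.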

The classic 1970 paper formally established the stack property as a sufficient condition for monotonic miss ratios and provided the practical solution of one-pass evaluation. This classic work has since been extended in several ways.

LRU caches are the most important and most commonly studied. For example, program locality analysis usually assumes LRU caches. The LRU stack distance measures the “closeness” of data reuse and is often abbreviated as the reuse distance. Stack simulation is too slow for large traces, and much faster algorithms have been developed. A later blog post will be devoted to studies of reuse distance.

Caching techniques that guarantee monotonic miss ratios:

  • Mattson et al. IBM 1970: LRU, OPT, MRU, LFU, and RR (statistically equivalent to RAND)
    • OPT is optimal for all cache sizes, while the technique for a single cache size is called Belady or MIN
  • Gu and Ding, ISMM 2011: LRU-MRU, used for collaborative caching with binary hints, i.e., the evict-me bit.
  • Gu and Ding, ISMM 2012: priority hints, Priority LRU, and non-uniform inclusion.

The term “stack distance” is often used without specifying which type. Each of the stack algorithms above has its own stack distance.

Caching techniques that are not guaranteed to have monotonic miss ratios:

  • Belady, CACM 1969: FIFO
  • Mattson et al. IBM 1970: RAND

As explained in Mattson et al., non-monotonic miss ratios may occur when caching priorities depend on the capacity of the cache and differ from one capacity to another: for example, priorities that depend on the frequency of reference to pages after they enter the cache, or on the total time a page has spent in the cache.

Summary

Miss ratio monotonicity means larger cache → same or lower miss ratio.
The inclusion property ensures miss ratio monotonicity and allows for single-pass evaluation, i.e., you can compute the miss ratio curve (MRC) for all cache sizes from one run over the trace.

Programming Language Skills Are AI Skills

Because of AI, everyone needs to know programming languages


Jensen Huang in 2025

Jensen Huang famously said that with modern AI, the programming language becomes human language. Indeed, a program that used to take days to craft can now be written by AI tools in seconds. Industry reports indicate that many senior engineers have not written a single line of code in months. However, AI coding is directed and managed by humans, and AI-generated code must be approved by humans. Code is cheap, but good code is not. In fact, more AI-generated code often means more human work. A compelling real-world example is the recent Anthropic source code leak, which highlights the risks associated with AI-assisted coding.

Fred Brooks in 1986

In his seminal article that laid the foundation for the field of software engineering, Frederick Brooks identified four essential difficulties of software development. He evaluated the solutions available at the time and concluded that while they addressed accidental difficulties, no solution could resolve the essential ones. He titled his article No Silver Bullet. The introduction begins:

Of all the monsters who fill the nightmares of our folklore, none terrify more than werewolves, because they transform unexpectedly from the familiar into horrors. For these, one seeks bullets of silver that can magically lay them to rest.

I imagine that Anthropic was probably horrified when their AI-generated system leaked over half a million lines of internal source code—a transformation from familiar tool to unexpected nightmare. Brooks’s article is required reading for the first week of the class I have been teaching. The first homework asks students to imagine talking to Jensen Huang and answering his question: “So, what are the essential difficulties of software?”

Programs Require Human Approval

Michael Scott, my colleague and the author of the popular textbook Programming Language Pragmatics, often quotes Donald Knuth: “Programming is the art of telling another human being what one wants the computer to do.” Part of this insight is that a program is a specification for a machine. In this regard, the key attribute is that a program must be precise. We should not—and must not—allow unspecified behavior by a machine. Think about programs that run laser eye surgeries, operate radiation machines, land jetliners, and control the ignition sequence to send Artemis II to the moon. We must write programs that are complete and precise.

AI has an unprecedented ability to automate programming, and its capabilities continue to improve. However, humans must be there to inspect and approve the generated code. The KPMG survey in Q3 2025 found that 63% of 130 executives said they are “putting humans in the loop due to a lack of trust, up from 45% last quarter.” In discussing the survey, one commenter explained that if people do not fully understand what a generated system does and why, they cannot defend it—not to a regulator, not to a client, and not to their leadership.

Precision is the foundation of reasoning. In mathematics, complete precision allows us to derive a result across a hundred steps while still knowing exactly what we are doing—and it enables someone to check every step along the way, a task that machines can now perform. Precision is the foundation of modern science and engineering. We can build complex systems today because we make every component fully precise.

The Conclusion: Everyone Needs to Know Programming Languages

With AI, modern programming may consist of three steps: (1) prompt AI to write code, (2) ensure the code is correct, and (3) explain to another human being what the code does and why it is correct. If Jensen Huang is correct that everyone will program using natural language, then it actually requires that everyone must know programming languages—because step (1) is fraught with risks without steps (2) and (3). Everyone needs to know programming languages enough to understand the code that they write using AI.

Reflection on the Meaning of Charity

This week, billions of people, including 1.4 billion in mainland China, are enjoying their longest holiday of the year. It is a traditional time to reflect, mark this juncture in time, and share with others. In that spirit, here are my thoughts on the fifth day of the Year of the Horse.

A recurring theme in traditional Chinese drama is the contrast between those who help others in urgent need and those who only add to the wealth of the rich. This sentiment appears in the book Slapping the Table in Astonishment (Part One), published around 1627—roughly 200 years after Zheng He’s ocean voyages. The original text reads:

“世间人周急者少,继富者多。为此,达者便说:’只有锦上添花,那得雪中送炭?’ 只这两句话,道尽世人情态。”

This translates to: “In this world, few help those in desperate need, while many add to the fortunes of the already wealthy. Thus, the wise observe: ‘There are always those who gild the lily, but who brings charcoal in the snow?’ These words capture the true nature of human relationships.”

The idea of helping others—going out on a snowy night to deliver firewood to those without heat—once defined my understanding of charity. Before coming to Rochester, I believed that was exactly what charity meant: helping others. Living in Rochester, however, my perspective has changed.

Rochester has a rich legacy. In the 1960s and ’70s, the “Nifty Fifty” were a group of high-growth, large-cap stocks popular with institutional investors and widely regarded as “one-decision” stocks, that is, buy and never sell. Three of them were based in Rochester: Kodak, Xerox, and Bausch & Lomb. George Eastman, who founded Kodak, amassed a wealth of $100 million ($2 billion today) and gave most of it away: $50 million to the University of Rochester, including the Eastman School of Music, and $20 million to MIT (anonymously, as Mr. Smith). In his final years, he suffered from an extremely painful health condition. After touring the new River Campus of the University of Rochester, accompanied by then university president Rush Rhees, he returned home and committed suicide, leaving a brief note: “To my friends: my work is done. Why wait? G.E.”

Rochester is cited as the birthplace of the modern United Way movement. “Checking a box” on a payroll deduction form to donate and support the community is a direct descendant of this tradition. A visitor would not notice a difference, but having lived here for decades, I know how unusual it is for a city to have such a deeply ingrained culture of giving back. Since around 2010, a few years after our second child was born, my family has been donating the equivalent of one dollar per person per day to United Way. In 2025, approximately 50,000 people donated about $16.7 million to this organization alone. The funds support programs run by local charitable organizations. Many people, including us, also donate directly to these groups. I know people who volunteer in these programs. For example, the music teacher of one of our children and her husband volunteer one day each week doing cleaning work at the Ronald McDonald House. The picturesque house sits by the canal, a short distance from the UR hospital, with 24 private bedrooms to house terminally ill children at no cost to their families. I feel fortunate to be able to donate to United Way and support volunteers who help others. But this is still not the whole meaning of charity.

Today I saw a passage by Vincent van Gogh, written in 1882 to his brother Theo van Gogh (who supported him throughout his life and died only six months after Vincent).

What am I in the eyes of most people — a nonentity, an eccentric, or an unpleasant person — somebody who has no position in society and will never have; in short, the lowest of the low. All right, then — even if that were absolutely true, then I should one day like to show by my work what such an eccentric, such a nobody, has in his heart. That is my ambition, based less on resentment than on love in spite of everything, based more on a feeling of serenity than on passion. Though I am often in the depths of misery, there is still calmness, pure harmony and music inside me. I see paintings or drawings in the poorest cottages, in the dirtiest corners. And my mind is driven towards these things with an irresistible momentum. I’m seeking. I’m driving. I’m in it with all my heart. What would life be if we had no courage to attempt anything. (based on translation used by @soulxsigh on TikTok)

Here is what charity truly means: It is not only about helping others—it is also about allowing ourselves to be inspired. Whether it is someone enduring a cold winter night without heat, someone donating their time to lift others up, or someone uncompromisingly charting their own life path, we draw inspiration from them. And we will not be the only ones who find new strength and meaning in their example. A life well lived is one filled with moments of inspiration. In this light, our charity is simply investing our wealth to live a better life—first and foremost for ourselves.

Below is van Gogh’s painting titled Landscape with a Carriage and a Train, with a horse in the center (Source: https://en.wikipedia.org/wiki/Landscape_with_a_Carriage_and_a_Train). Wishing everyone a happy time in the Year of the Horse. 马到成功 (Mǎ dào chéng gōng): may success arrive the moment the horse does.

Fun fact: In 2020, the Rochester United Way received a record $35 million in donations. The unusual total was due to a historic $20 million gift from MacKenzie Scott.