Howardism | Lean

Sources#

Advancing Mathematics Research with AI-Driven Formal Proof Search

Summary#

A proof assistant (interactive theorem prover) in which "definitions, theorems, and proofs are all mechanically verified code." Proofs are built by a sequence of tactics (elementary proof steps); Lean's compiler "executes" a proof tactic-by-tactic, tracking the goals still pending after each step, and a proof is correct exactly when the compiler reaches a state with no pending goals. Lean is the verification substrate for AI-Driven Formal Proof Search and DeepMind's AlphaProof Nexus — the component that turns LLM mathematical reasoning from hallucination-prone prose into a checkable artifact.

Why it matters here: the perfect verifier#

Lean is the reason formal proof search works as an AI paradigm. It is a sound, automatic, per-step verifier — which makes it the maximally-verifiable domain for Karpathy's verifiability thesis and the ideal reward/grounding signal for an agentic loop. The compiler error message after each edit directs the next turn, so the model's reasoning stays anchored to ground truth ("the power of compiler feedback in grounding LLM reasoning" — Agentic Loops Overtake Bespoke Systems).

The `sorry` tactic#

Lean's sorry tactic immediately closes any pending goal while still passing the type checker — a placeholder for "proof goes here." This is what makes the proof-sketch interface possible: a sketch is a Lean file with sorry in place of the proof, and proving a theorem reduces to generating type-safe code with no sorry (and no disallowed axioms like sorryAx). SafeVerify enforces exactly this. Failure analysis in the paper turns on sorry: agents sometimes hid the core difficulty in a single sorry inside a helper lemma restating the target, or cited "established" lemmas (left as sorry) that were hallucinations — both rejected by end-to-end verification.

mathlib and the frontier#

Lean ships with mathlib, a large community mathematics library. The paper notes successes cluster "in areas such as combinatorics, convex optimization, and number theory, where Lean's mathematics library is mature and tasks often decompose into tractable subgoals." mathlib maturity is therefore a key gate on what AI formal proof search can currently reach — domains with thin library coverage or that require extensive new theory remain out of reach. The runs used Lean v4.27 in Docker sandboxes (with Pantograph for machine-to-machine interaction).

Connections#

AI-Driven Formal Proof Search — Lean is the verifier at the paradigm's core
AlphaProof Nexus — drives Lean via a search-replace tool + compiler feedback loop
The Verifiability Thesis — Lean is the canonical maximally-verifiable domain
Agentic Loops Overtake Bespoke Systems — Lean's per-step feedback is what makes the simple loop viable
Evolutionary Proof Search — Lean's binary pass/fail forced the LLM-critic fitness workaround
Google DeepMind — the lab building Lean agents at research scale

Open Questions#

mathlib maturity gates the reachable frontier. Can AI formal proof search grow mathlib (formalize new theory) as a byproduct, expanding its own frontier?
Lean is a perfect verifier for math. Which other domains have a comparably sound automatic verifier (vs. only noisy ones like tests or LLM-judge councils)?

Sources#

Advancing Mathematics Research with AI-Driven Formal Proof Search

Lean

Sources#

Summary#

Why it matters here: the perfect verifier#

The sorry tactic#

mathlib and the frontier#

Connections#

Open Questions#

Sources#

The `sorry` tactic#