Sources#
- Anthropic's Boris Cherny: Why Coding Is Solved, and What Comes Next
- How Anthropic's product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code)
Summary#
The harness — prompts, skills, scaffolding, mechanical verification — exists to compensate for what the underlying model cannot yet do. As models improve, the harness needs to shrink, not grow. Boris Cherny explicitly predicts Claude Code "may be 100 lines of code a year from now." Cat Wu reports the team reads the entire system prompt with every model launch and removes anything the new model handles natively. The principle works in two directions: capabilities the harness used to inject move into the model, and crutches the harness used to provide become drag.
The to-do list as canonical example#
Cat Wu's case study:
- Early Claude Code: asked to refactor 20 call sites, the model would change 5 and stop. The team added an explicit to-do list tool ("Sid on our team was like, what would a human do? Make a list, go through one by one"). With the tool prompted aggressively, the model finished all 20.
- Opus 4 onward: model uses the to-do list spontaneously, no aggressive prompting needed.
- Today: to-do list is "deemphasized" — model may or may not use it, doesn't need to be reminded, mostly kept around for user-facing visibility.
The crutch (the prompt section forcing to-do list use) was removed; the tool stayed for a different reason (UI value).
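The shape of such a harness-side tool can be sketched minimally (hypothetical names and interface, assuming the harness re-injects remaining items into context each turn, which is what makes the 20-item refactor finish instead of stalling at 5):

```python
from dataclasses import dataclass, field

@dataclass
class TodoList:
    """Hypothetical harness-side to-do tool: the model writes items via
    tool calls, and the harness appends the remaining items to the next
    turn's context so long task lists can't be silently abandoned."""
    items: dict = field(default_factory=dict)  # description -> done?

    def write(self, descriptions):
        for d in descriptions:
            self.items.setdefault(d, False)

    def complete(self, description):
        self.items[description] = True

    def remaining(self):
        return [d for d, done in self.items.items() if not done]

    def reminder(self):
        """Text the harness injects into the next turn's context."""
        left = self.remaining()
        if not left:
            return "All to-do items are complete."
        return f"{len(left)} item(s) remain: " + "; ".join(left)
```

The "deemphasized" phase corresponds to dropping the prompt section that demands `write` up front while leaving the tool itself registered, so the model can use it when it chooses and the UI can still render progress.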
The Boris claim: 100 lines#
"I think Claude Code itself may be 100 lines of code a year from now."
Read literally, this is hyperbole, but the direction is real:
- Anthropic now uses the same models internally that ship externally, so internal harness lessons transfer
- Each model release lets the team delete prompt sections, shrink fallback logic, remove safety wrappers (per Cat Wu: "all the safety mechanisms today — prompt injection, static verification of commands, permission modes, human in the loop — will be less important because the model will just do the right thing")
- The product surface stops being "what the harness does" and becomes "where the model decides to do it" (CLI, mobile, web, IDE, all sharing the same model logic)
The flip side: capabilities migrate inward#
Boris reports Opus 4.7 spontaneously starts loops:
"I'll tell it 'pull this data query.' It says 'I noticed the data is changing — I'll start a loop and report every 30 minutes.'"
The /loop primitive (see Agent Loop Pattern) was introduced as a harness feature; in 4.7 it is becoming model-native behavior. The harness primitive doesn't go away — but the user no longer needs to invoke it.
This generalizes: anything the harness teaches the model how to do via a prompt section is a candidate for migration into the next model's training data.
The wrong direction: harness bloat#
The opposite failure mode is worse than no harness — it actively degrades the model:
- Cat Wu: "What models are capable of in [a one-month] timeline" is the hardest forecast for PMs; over-specifying the harness for an old model wastes tokens the new model would spend better on its own.
- Matt Pocock: 250K-token system prompts push the model into the dumb zone before it does anything (see Context Window Smart Zone).
- Repeated capability injections drift toward contradiction: rule X for case A, rule Y for case B, until the model can't tell which applies.
Process: read the system prompt at every launch#
Cat Wu's discipline:
"We read through the entire system prompt and we reflect on, okay, for each of these sections, does the model really need this reminder anymore? And if not, we'll remove it."
This inverts the usual practice — most teams only ever add to a prompt, never subtract. Doing the subtraction on a cadence aligned to model launches is what keeps the harness from accreting.
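One way to mechanize this review (a sketch, not Anthropic's actual process — `run_evals` is an assumed callable that scores a candidate system prompt against an eval suite): treat the prompt as named sections and drop any section whose removal doesn't regress scores, i.e. any reminder the new model no longer needs.

```python
def prune_prompt_sections(sections, run_evals, tolerance=0.0):
    """Greedy ablation over system-prompt sections.

    sections: dict of section name -> prompt text.
    run_evals: assumed callable; takes a full prompt string, returns a score.
    A section survives only if removing it costs more than `tolerance`.
    """
    def assemble(names):
        return "\n\n".join(sections[n] for n in names)

    keep = list(sections)
    baseline = run_evals(assemble(keep))
    for name in list(keep):
        trial = [n for n in keep if n != name]
        if run_evals(assemble(trial)) >= baseline - tolerance:
            keep = trial  # the model handles this natively now
            baseline = run_evals(assemble(keep))
    return {n: sections[n] for n in keep}
```

Greedy one-section-at-a-time ablation misses interactions between sections, but it captures the core discipline: every launch, each reminder must re-earn its tokens.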
Build for the next model, not this one#
Counterintuitive corollary from Boris:
"We were trying to build this thing that was like pre-PMF, and we knew that it wouldn't have PMF for 6 months because we were building for the next model."
Most products are built for the model they're released against. Anthropic builds Claude Code for the model six months out — accepting it doesn't quite work today, with the bet that the next release closes the gap. This shifts what "harness work" means: not "make the current model usable" but "build a product surface that will work when the model arrives."
Cat Wu's variant: "It's pretty important to build products that don't necessarily work yet so that you know what is missing for this product to work, and then with the newest model you can just swap it in."
Counterpoint: harness still matters#
Not every voice agrees. Matt Pocock argues the harness — feedback loops, deep modules, mechanical verification — is the ceiling:
"If your code base doesn't have feedback loops, you're never ever ever going to get decent output out of AI. The quality of your feedback loops influences how good your AI can code, essentially. That is the ceiling."
The synthesis: prompt scaffolding shrinks as models improve; mechanical verification remains essential. Tests, types, linters, isolated review contexts — these are infrastructure that the harness provides and that doesn't migrate into the model the way capabilities do.
Connections#
- Boris Cherny — "100 lines" claim and the spontaneous-loop observation
- Cat Wu — the operational discipline of pruning prompts at every launch
- Matt Pocock — counterpoint that mechanical verification stays load-bearing
- Agent Loop Pattern — example of a primitive migrating from harness to model
- Context Window Smart Zone — why prompt bloat is an active capability cost, not just clutter
- Claude Character as Product — character is the rare harness asset that probably doesn't shrink
- Agent Harness Engineering — generalizes the "enforce invariants, not implementations" principle to harness-vs-model division of labor
- Claude Code Auto Mode — a harness feature whose necessity Cat Wu predicts will fade
- AI Brain Fry — partially mitigated by harness shrinkage (less to oversee), reintroduced by output volume from loops
- Human-AI Accountability Redesign — what doesn't shrink is the human at the boundary; this paper names what that boundary work becomes (oversight quality, decision rights, escalation, consequences)
- Model Spec Midtraining (MSM) — alignment moves from harness-prompt-injection of values to model-internalized values; the alignment side of harness shrinkage
- Interaction Models — the same move on the interaction axis: VAD / turn-detection / dialog-management harnesses dissolve into the model (Thinking Machines Lab, May 2026)
- The Bitter Lesson — the underlying principle: hand-crafted scaffolding gets outpaced by scaled general capability
Open questions#
- Does all prompt scaffolding eventually migrate into the model, or does some remain — e.g. organization-specific style, security rules, brand voice?
- The Boris "100 lines" prediction is a year out from May 2026 — testable in 2027.
- If harness work shrinks, what new work expands to fill it? Cat Wu's bet: PM/product taste, eval-writing, character work.
Derived#
- Learning to Co-Work with AI: A Software Engineer's Field Guide — pruning-at-every-launch framed as a daily practice; "build for next model" as career-strategic horizon
- Opinions on Using AI Tools & the Future of the Software Engineering Role — the harness-shrinks vs harness-is-the-ceiling tension is one axis of the four-stance debate map
