Summary#
Anthropic's 400K-session study supplies the empirical shape of human–agent collaboration in agentic coding: people decide what to build; the agent decides how. Measured by a privacy-preserving decision-attribution classifier, in a typical Claude Code session the user makes about 70% of the planning decisions (what to do, which approach, what counts as done) but only about 20% of the execution decisions (which files to change, what code to write, which commands to run). This is the clean, quantified version of the role-inversion the rest of the corpus describes qualitatively — coding stops being the human's job, the human becomes an allocator/director, thinking is delegated, understanding is retained.
Evidence note. Empirical, with the same first-party caveat as Returns to Expertise in Agentic Coding: Anthropic is measuring its own product via Clio + Sonnet-4.6 classifiers, validated against telemetry, with headless/SDK/IDE usage excluded. Decision attribution is inferred from transcripts, not observed directly.
Two lenses: decisions and actions#
The study separates who decides from how much gets delegated:
- Decisions (content). The classifier lists every meaningful decision, splits it into planning vs execution, and attributes each to the user or Claude. Result: ~70% of planning is human, ~80% of execution is Claude's. A clean division of labor, not a blur.
- Actions (structure). A session is a back-and-forth: the user prompts, then Claude goes off and acts. A typical session is ~4 turns; each user prompt sets off a chain of ~10 Claude actions on average (reading files, editing code, running commands), and Claude writes ~2,400 words per turn. The tail is long: ~2% of sessions average >100 actions per prompt.
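The two measures above can be sketched as simple tallies. This is a minimal illustration, not the study's classifier: the `Decision` record, the function names, and the toy session are all assumptions invented here, with numbers chosen only to echo the headline figures.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    layer: str  # "planning" or "execution" (hypothetical labels)
    actor: str  # "user" or "claude"


def user_share(decisions, layer):
    """Fraction of decisions in `layer` attributed to the user."""
    counts = Counter(d.actor for d in decisions if d.layer == layer)
    total = sum(counts.values())
    return counts["user"] / total if total else None


def actions_per_prompt(actions_by_turn):
    """Mean number of agent actions set off by each user prompt."""
    if not actions_by_turn:
        return 0.0
    return sum(actions_by_turn) / len(actions_by_turn)


# Toy session, invented for illustration only.
decisions = (
    [Decision("planning", "user")] * 7
    + [Decision("planning", "claude")] * 3
    + [Decision("execution", "user")] * 2
    + [Decision("execution", "claude")] * 8
)
actions_by_turn = [6, 12, 9, 13]  # one entry per user prompt (4 turns)

print(user_share(decisions, "planning"))    # 0.7
print(user_share(decisions, "execution"))   # 0.2
print(actions_per_prompt(actions_by_turn))  # 10.0
```

The decision lens and the action lens use different denominators (decisions per layer versus prompts), which is why they can be reported side by side without double counting.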
The two lenses lock together: how much Claude does between check-ins tracks who controls planning. When the user keeps execution control (>80% of execution decisions), Claude takes fewer actions per turn (~8). When Claude controls planning (>80% of planning decisions), it runs the longest chains (~16 actions). Delegating the plan is what lengthens the leash — and per Returns to Expertise in Agentic Coding, domain expertise is what lets a user safely hand over a longer one (novice ~5 → expert ~12 actions/prompt).
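A rough sketch of that comparison, assuming each session has already been reduced to per-session shares and an actions-per-prompt figure. The field names and the five toy sessions are hypothetical; only the 80% cutoff comes from the text above.

```python
from statistics import mean

CONTROL_THRESHOLD = 0.8  # the ">80%" cutoff quoted above


def mean_actions_where(sessions, share_key):
    """Mean actions per prompt over sessions where `share_key` exceeds the cutoff."""
    chain_lengths = [
        s["actions_per_prompt"]
        for s in sessions
        if s[share_key] > CONTROL_THRESHOLD
    ]
    return float(mean(chain_lengths)) if chain_lengths else None


# Toy sessions, invented for illustration; not study data.
sessions = [
    {"user_execution": 0.85, "claude_planning": 0.10, "actions_per_prompt": 7},
    {"user_execution": 0.90, "claude_planning": 0.20, "actions_per_prompt": 9},
    {"user_execution": 0.05, "claude_planning": 0.90, "actions_per_prompt": 18},
    {"user_execution": 0.10, "claude_planning": 0.85, "actions_per_prompt": 14},
    {"user_execution": 0.20, "claude_planning": 0.30, "actions_per_prompt": 10},
]

print(mean_actions_where(sessions, "user_execution"))   # 8.0
print(mean_actions_where(sessions, "claude_planning"))  # 16.0
```

The two cuts are computed independently rather than as exclusive buckets, so a session that matched neither condition (the last one here) simply drops out of both averages.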
The tension with "AI as primary author"#
This is the most interesting cross-source juxtaposition in the wiki, because the two numbers look contradictory until you separate the units:
- Faros: AI authors ~60% of accepted code, and the assistant→author threshold was crossed "without a deliberate decision."
- This study: humans still make ~70% of planning decisions, while Claude makes ~80% of execution decisions.
They are not in conflict — they measure different things. Faros counts lines of code authored (an execution-layer metric); Anthropic counts decisions attributed (separating planning from execution). Reconciled: Claude writes most of the lines (execution) while humans still own most of the planning decisions. "AI is the author" and "humans decide what to build" are simultaneously true. The genuine open worry survives the reconciliation, though: Faros's "without a deliberate decision" and this study's 80%-execution-to-Claude both describe a quiet drift, and the rubber-stamping risk is whether nominal human planning control hollows out into approval-by-default.
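A toy worked example of the reconciliation: the same change log can yield ~60% AI line authorship and ~70% human planning at once, because the two shares are taken over different units. The `Change` record and every number below are invented for illustration and come from neither Faros nor Anthropic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    planned_by: str  # who decided what to build: "human" or "ai"
    written_by: str  # who authored the accepted code: "human" or "ai"
    lines: int       # accepted lines in this change


def line_share(changes, author):
    """Share of accepted lines written by `author` (a line-count metric)."""
    total = sum(c.lines for c in changes)
    return sum(c.lines for c in changes if c.written_by == author) / total


def planning_share(changes, planner):
    """Share of planning decisions made by `planner` (a decision-count metric)."""
    return sum(c.planned_by == planner for c in changes) / len(changes)


# Hypothetical log of ten changes.
changes = [
    Change("human", "ai", 200),
    Change("human", "ai", 150),
    Change("human", "ai", 100),
    Change("human", "ai", 50),
    Change("human", "human", 200),
    Change("human", "human", 100),
    Change("human", "human", 100),
    Change("ai", "ai", 50),
    Change("ai", "ai", 30),
    Change("ai", "ai", 20),
]

print(line_share(changes, "ai"))         # 0.6
print(planning_share(changes, "human"))  # 0.7
```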
Capability ceiling vs. realized autonomy#
The report is careful to distinguish what models can do from what users let them do. METR's time-horizon evaluations measure the ceiling — frontier models can now complete tasks that would take a person many hours, working through obstacles autonomously. The decision-attribution and actions-per-prompt measures here capture the realized division in actual sessions: even with a high and rising ceiling, the typical user keeps planning control and grants execution. The gap between ceiling and realized autonomy is itself a variable to watch — if planning increasingly shifts to Claude as the ceiling rises, that is the harness shrinking on the human-decision axis.
Connections#
- Implementation Abundance Inverts Product Work — the process the division reorganizes: humans curate/decide, agents execute the abundant builds
- Role Averaging, Not Role Elimination — the team-structure reorganization built on top of "humans decide what, agents decide how"
- AI as Primary Author — the apparent contradiction (60% authorship vs 70% human planning) resolved by separating line-authorship from decision-attribution
- Compute Allocator — "humans make the planning decisions" is exactly Thariq's allocator role: deciding what's worth doing while the model produces
- Verification as the New Bottleneck — if the human owns planning + verification and Claude owns execution, the human's judgment throughput is the binding constraint
- Returns to Expertise in Agentic Coding — expertise is what lets a user safely delegate planning and unlock the longer (16-action) chains
- Task Time-Horizon Scaling — the capability ceiling (what models can do alone) vs. this study's realized autonomy (what users actually delegate)
- Harness Shrinkage as Models Improve — the share of planning delegated to the agent is a usage-side reading of the shrinking harness
- Outsource Your Thinking, Not Your Understanding — "decide what / agent decides how" is thinking outsourced, understanding (the planning) retained
- Claude Code — the surface the division is measured on
- Conversation-to-Delegation Shift — the cross-population realized-autonomy data: how much work users actually delegate to Codex (16.5% / 63.3% / 99.8% token share) is this division measured at adoption scale
- Parallel Agent Orchestration — the human keeping the planning/coordination role while execution fans out across many concurrent agents — the division of labor at fleet scale
Open questions#
- Does the human share of planning decisions fall over time as models improve (the ceiling rising into the planning layer), or is ~70% a stable human floor?
- "Decision attribution" is inferred from transcripts. When Claude proposes a plan and the user assents, is that scored as the user's planning decision or Claude's? The rubber-stamping boundary is exactly where the measure is hardest.
- Headless/SDK/pipeline usage (excluded here) is where execution autonomy is highest and planning is front-loaded into a single prompt — does the 70/20 split survive there, or collapse toward full delegation?
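The assent question above can be made concrete with a small sensitivity check: the same transcript gives very different human planning shares depending on how an approved Claude proposal is scored. This is a hypothetical sketch; the `PlanningDecision` fields, the scoring flag, and the counts are assumptions, and the rule the study actually applies is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanningDecision:
    proposed_by: str     # "user" or "claude"
    user_assented: bool  # True if the user explicitly approved a Claude proposal


def user_planning_share(decisions, assent_counts_as_user):
    """User share of planning decisions under a chosen scoring rule for assent."""
    if not decisions:
        return None

    def attributed_to_user(d):
        if d.proposed_by == "user":
            return True
        return assent_counts_as_user and d.user_assented

    return sum(attributed_to_user(d) for d in decisions) / len(decisions)


# Toy transcript: 4 user-originated plans, 3 Claude plans the user approved,
# 3 Claude plans carried out without an explicit check-in.
decisions = (
    [PlanningDecision("user", False)] * 4
    + [PlanningDecision("claude", True)] * 3
    + [PlanningDecision("claude", False)] * 3
)

print(user_planning_share(decisions, assent_counts_as_user=True))   # 0.7
print(user_planning_share(decisions, assent_counts_as_user=False))  # 0.4
```

In this toy case the gap between the two rules (0.7 versus 0.4) is exactly the share of decisions that were approved rather than originated, which is the band where rubber-stamping would hide.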
Sources#
- Agentic coding and persistent returns to expertise — §"The division of labor", §"Who decides what"
