Summary
This page covers two of the three "how" margins in OpenAI's Codex usage study: concurrency (running multiple agents at once) and runtime (agents working in long blocks on your behalf). Together they describe the workflow at the frontier of agentic use: a human who oversees a team of agents and delegates tasks across many simultaneous workers rather than performing the work directly. Codex's threaded interaction model makes this possible. Each agent runs in a largely independent workspace, so a user need not wait for one task to finish before starting another. The study supplies the first hard adoption numbers for the role shift that Founder as Agent Orchestrator, Loop Engineering, and Managers as ICs describe qualitatively: among intensive users, Codex is "less an assistant answering requests and more a workflow system in which the user delegates, monitors, reviews, and coordinates multiple streams of work."
Evidence note.
Empirical. Concurrency is measured as turns in different threads that overlap by more than 30 seconds, over the week before June 11, 2026. Runtime is measured as summed active turn latency per day, with gaps longer than 30 minutes removed as awaiting input. The OpenAI-internal sample is a frontier preview, not a population estimate. Cumulative daily runtime can exceed 24 hours because overlapping turns are summed.
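The two measurements described in the evidence note can be sketched in a few lines. This is a minimal illustration, not the study's actual pipeline: the `Turn` record, its field names, and both function names are hypothetical, and the sketch assumes awaiting-input gaps have already been removed from each turn. Only the 30-second overlap threshold comes from the note.

```python
from dataclasses import dataclass

MIN_OVERLAP_S = 30  # overlap threshold from the evidence note


@dataclass(frozen=True)
class Turn:
    thread_id: str  # hypothetical identifier for the thread a turn belongs to
    start: float    # seconds since the start of the day
    end: float      # seconds since the start of the day


def peak_concurrency(turns):
    """Largest number of distinct threads whose turns all overlap by more than 30 s."""
    peak = 1 if turns else 0
    for probe in turns:
        t = probe.start  # any peak begins at the start of some turn
        active = {u.thread_id for u in turns
                  if u.start <= t and u.end - t > MIN_OVERLAP_S}
        peak = max(peak, len(active))
    return peak


def agent_hours(turns):
    """Summed turn duration in hours; overlapping turns are counted separately."""
    return sum(u.end - u.start for u in turns) / 3600.0


day = [Turn("a", 0, 3600), Turn("b", 1800, 5400), Turn("c", 3590, 7200)]
print(peak_concurrency(day))       # 2
print(round(agent_hours(day), 2))  # 3.0
```

Threads "a" and "c" overlap for only 10 seconds, below the threshold, so the peak is two rather than three. Because `agent_hours` sums overlapping turns, a user with enough parallel threads can log more than 24 agent-hours in a single day.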
Concurrency: the OpenAI / external split is stark
Peak concurrent agents in the measured week, by population:
| Population | No concurrent turns (single workflow) | 5+ concurrent agents |
|---|---|---|
| Organizational users | 67.4% | small tail |
| Individual users | 63.9% | small tail |
| OpenAI workers | 10.7% | 28.6% |
Among external users, concurrency is "fairly minimal": roughly two-thirds never overlap turns, and those who do mostly peak at two. Among OpenAI workers the pattern inverts: only 10.7% never run more than a single workflow at a time, and 28.6% manage five or more concurrent agents. The paper calls this "fundamentally different" from external practice, because it requires the human to manage, delegate to, and review the work of a relatively large group of agents. It is a supervisory workflow, not a hands-on one.
Runtime: long-running work concentrated at the top tail
The duration margin shows the same median-vs-frontier gap:
- Median OpenAI employee: ~2.5 agent-hours/day (June 11, 2026). Meaningful delegated blocks, but not continuous around-the-clock execution — typical use is still intermittent.
- p99 OpenAI employee: ~71 agent-hours/day — which implies several agents running concurrently at any given hour. Up ~88% since April 7, 2026.
- External tails grew too: p99 daily runtime rose ~25% (organizational) and ~50% (individual) over the sample, but absolute levels stay far below OpenAI.
The pattern across both margins: agentic workflows remain sporadic for the typical user, but a smaller group of high-intensity users is rapidly expanding the work it delegates — and that group is overwhelmingly inside OpenAI, the frontier preview.
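The p99 figure can be unpacked with simple arithmetic. The sketch below uses only the two numbers reported above (71 agent-hours/day and ~88% growth). The April baseline it prints is implied by those figures, not reported by the study, and the variable names are illustrative.

```python
P99_AGENT_HOURS = 71.0     # p99 OpenAI employee, agent-hours/day (June 11, 2026)
GROWTH_SINCE_APRIL = 0.88  # "up ~88%" since April 7, 2026
HOURS_PER_DAY = 24

# Average number of agents that must be running at any moment, around the clock,
# to accumulate the p99 total in one day.
avg_concurrent = P99_AGENT_HOURS / HOURS_PER_DAY

# Baseline implied by the growth figure (derived here, not reported in the study).
implied_april = P99_AGENT_HOURS / (1 + GROWTH_SINCE_APRIL)

print(f"{avg_concurrent:.2f} agents on average")          # 2.96 agents on average
print(f"{implied_april:.1f} agent-hours/day on April 7")  # 37.8 agent-hours/day on April 7
```

Roughly three agents running around the clock is a floor: if the work is concentrated in fewer hours of the day, the number of agents running at once during those hours has to be higher.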
Why software, why now, and the inversion of the human's role
The paper grounds parallelism in the same property that makes coding the leading edge of agentic AI: software work is digital, verifiable, and modular, so it decomposes into many subtasks. That is exactly the shape that lets one person fan work out across many independent agents and review the results. The consequence is a role inversion: the human stops being the executor and becomes the delegator, monitor, reviewer, and coordinator of a portfolio of agentic work. This is the review-and-supervision bottleneck made visible in behavior: the more agents you run in parallel, the more your throughput is gated by your capacity to review rather than the model's capacity to produce (Loop Engineering's "your review bandwidth decides how many you can actually run"; AI Brain Fry's oversight-fatigue ceiling).
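The review ceiling can be made concrete with a toy model. This is an illustrative assumption, not something the paper specifies: reviewed output per day is taken to be the smaller of what the agents produce and what the human can review, and all function names, parameter names, and rates below are hypothetical.

```python
import math


def shipped_per_day(agents, tasks_per_agent_per_day, reviews_per_day):
    """Reviewed output under the toy model: capped by production or by review."""
    produced = agents * tasks_per_agent_per_day
    return min(produced, reviews_per_day)


def saturation_point(tasks_per_agent_per_day, reviews_per_day):
    """Smallest agent count at which review capacity becomes the binding limit."""
    return math.ceil(reviews_per_day / tasks_per_agent_per_day)


# Hypothetical rates: each agent finishes 4 tasks/day, the human can review 12/day.
for n in (1, 2, 3, 5):
    print(n, shipped_per_day(n, 4, 12))
print("saturates at", saturation_point(4, 12), "agents")
```

Under these made-up rates, output grows with each added agent up to three agents and then stays flat at 12 tasks/day. Past that point, adding agents only lengthens the review queue.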
Connections
- Role Averaging, Not Role Elimination — "an IC manages agents" made literal: the fleet the averaged role runs
- Conversation-to-Delegation Shift — concurrency and runtime are two of the three "how" margins (with systematization) that study uses to measure depth of delegation
- Agentic Work Systematization — the sibling margin; reusable skills are what make parallel/repeatable delegation tractable enough to run many at once
- Founder as Agent Orchestrator — the qualitative role this page quantifies: founder/worker as orchestrator of many specialized agents; here are the first concurrency/runtime adoption numbers
- Managers as ICs — running a fleet of agents is the IC-becomes-manager shift in literal form: the intensive user manages, delegates, and reviews a team of agent-workers
- Verification as the New Bottleneck — parallel fan-out is gated by the human's review capacity; concurrency makes the supervision bottleneck the binding constraint
- Loop Engineering — worktrees + sub-agents are the primitives that enable safe parallelism; "review bandwidth, not the tool, decides how many you can run" is this page's ceiling
- Multi-Agent Collective Intelligence — the architecture side (agents coordinating) vs this page's usage side (one human coordinating many agents)
- AI Brain Fry — the cognitive cost of overseeing many parallel streams; the oversight-fatigue limit on how far concurrency can scale per human
- Planning / Execution Division of Labor — concurrency is the human keeping the planning/coordination role while execution fans out across agents
- Engineer PM Convergence — the parallel-orchestration workflow is the IC-toward-manager/PM convergence shown in usage data
- Task Time-Horizon Scaling — long-running single agents (the runtime margin) sit under METR's rising reliable-task-length ceiling
- OpenAI — the lab whose internal usage is the frontier preview of high-concurrency workflows
- Codex — the threaded-interaction tool whose concurrency this measures
Open questions
- p99 OpenAI runtime of 71 agent-hours/day is a frontier preview inside an unusually favorable environment. Does external concurrency actually trend toward it as frictions fall, or is heavy parallelism specific to model-adjacent work?
- Summed-overlap runtime can exceed 24h/day — it measures agent effort, not human attention. What is the human's actual oversight load per concurrent agent, and where does it saturate (AI Brain Fry)?
- Concurrency is measured over one week. Is 5+-agent management a stable practice or a burst around specific large tasks?
Sources
- The Shift to Agentic AI: Evidence from Codex — §5.1 "Turn Concurrency"; §5.2 "Long-running agents"; §6 Conclusion
