Parallel Agent Orchestration

Sources#

The Shift to Agentic AI: Evidence from Codex

Summary#

This page covers two of the three "how" margins in OpenAI's Codex usage study: concurrency (running multiple agents at once) and runtime (agents working in long blocks on the user's behalf). Together they describe the workflow at the frontier of agentic use: a human who oversees a team of agents and delegates tasks across many simultaneous workers rather than performing the work directly. Codex's threaded interaction model makes this possible: each agent runs in a largely independent workspace, so a user need not wait for one task to finish before starting another. The page supplies the first hard adoption numbers for the role shift that Founder as Agent Orchestrator, Loop Engineering, and Managers as ICs describe qualitatively. Among intensive users, Codex is "less an assistant answering requests and more a workflow system in which the user delegates, monitors, reviews, and coordinates multiple streams of work."

Evidence note. Empirical. Concurrency is measured as turns in different threads that overlap by more than 30 seconds, in the week before June 11, 2026. Runtime is measured as summed active turn latency per day, with gaps longer than 30 minutes removed as awaiting input. The OpenAI-internal sample is a frontier preview, not a population estimate. Cumulative daily runtime can exceed 24 hours because overlapping turns are summed.
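The two measurements in the evidence note can be sketched in code. This is a minimal illustration, not the study's actual pipeline: the `Turn` record, the function names, and the sample data are hypothetical, and it assumes awaiting-input gaps longer than 30 minutes were already removed upstream. To apply the 30-second threshold, each turn's end is trimmed by 30 seconds before a sweep over start and end events, so a group of turns counts as concurrent only if all of them share more than 30 seconds.

```python
from collections import Counter
from dataclasses import dataclass

MIN_OVERLAP_S = 30  # overlap threshold stated in the evidence note


@dataclass(frozen=True)
class Turn:
    """Hypothetical record of one agent turn (times in seconds)."""
    thread_id: str
    start_s: float
    end_s: float


def peak_concurrency(turns, min_overlap_s=MIN_OVERLAP_S):
    """Largest number of distinct threads whose turns all overlap
    by more than min_overlap_s seconds. Returns 0 if there are no turns."""
    if not turns:
        return 0
    events = []
    for t in turns:
        trimmed_end = t.end_s - min_overlap_s
        if trimmed_end > t.start_s:  # turns too short to overlap anything are skipped
            events.append((t.start_s, 1, t.thread_id))   # 1 = start
            events.append((trimmed_end, 0, t.thread_id))  # 0 = end
    # At equal timestamps, process ends before starts so that an overlap of
    # exactly min_overlap_s does not count as concurrency.
    events.sort(key=lambda e: (e[0], e[1]))
    active = Counter()
    peak = 1  # any turn at all means at least one workflow
    for _, kind, thread_id in events:
        if kind == 1:
            active[thread_id] += 1
            peak = max(peak, len(active))
        else:
            active[thread_id] -= 1
            if active[thread_id] == 0:
                del active[thread_id]
    return peak


def daily_agent_hours(turns):
    """Summed turn duration in hours. Overlapping turns are summed,
    so the result can exceed 24 for a single day."""
    return sum(t.end_s - t.start_s for t in turns) / 3600.0


if __name__ == "__main__":
    sample = [
        Turn("A", 0, 3600),
        Turn("B", 1800, 5400),
        Turn("C", 3590, 3700),  # overlaps A by only 10 s, B by 110 s
    ]
    print(peak_concurrency(sample))   # 2: A, B and C never share more than 30 s
    print(daily_agent_hours(sample))
```

Three threads that each run for ten hours in the same window would report 30 agent-hours for the day, which is how the summed metric passes 24.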

Concurrency: the OpenAI / external split is stark#

Peak concurrent agents in the measured week, by population:

Population	Share with zero concurrent turns (single workflow)	Share peaking at 5+ concurrent agents
Organizational users	67.4%	small tail
Individual users	63.9%	small tail
OpenAI workers	10.7%	28.6%

Among external users, concurrency is "fairly minimal": roughly two-thirds never overlap turns, and those who do mostly peak at two. Among OpenAI workers the pattern inverts: only 10.7% stayed on a single workflow throughout the week, and 28.6% managed five or more concurrent agents. The paper calls this "fundamentally different" from external practice, because it requires the human to manage, delegate to, and review the work of a relatively large group of agents. It is a supervisory workflow, not a hands-on one.

Runtime: long-running work concentrated at the top tail#

The duration margin shows the same median-vs-frontier gap:

Median OpenAI employee: ~2.5 agent-hours/day (June 11, 2026). Meaningful delegated blocks, but not continuous around-the-clock execution — typical use is still intermittent.
p99 OpenAI employee: ~71 agent-hours/day. Even spread evenly across all 24 hours, that averages about three agents running at once. Up ~88% since April 7, 2026.
External tails grew too: p99 daily runtime rose ~25% (organizational) and ~50% (individual) over the sample, but absolute levels stay far below OpenAI.
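The arithmetic behind the p99 figure can be made explicit. The sketch below uses only the ~71 agent-hours/day and ~88% growth figures quoted above. The 8-hour working window is an illustrative assumption, not a number from the study, and the implied April baseline is a back-calculation rather than a reported value.

```python
P99_AGENT_HOURS = 71.0  # p99 OpenAI employee, agent-hours/day (from the text)
GROWTH_SINCE_APRIL = 0.88  # ~88% increase since April 7, 2026 (from the text)


def average_concurrency(agent_hours, wall_clock_hours):
    """Average number of agents running at once over a wall-clock window."""
    return agent_hours / wall_clock_hours


def implied_baseline(current, growth):
    """Starting level implied by a current level and a fractional increase."""
    return current / (1.0 + growth)


# Spread across a full 24-hour day: just under 3 agents at any moment.
around_the_clock = average_concurrency(P99_AGENT_HOURS, 24)

# Assumption for illustration: the same work packed into an 8-hour window
# would need roughly 9 agents running at once.
working_day = average_concurrency(P99_AGENT_HOURS, 8)

# Back-calculated April 7 level: roughly 38 agent-hours/day.
april_level = implied_baseline(P99_AGENT_HOURS, GROWTH_SINCE_APRIL)

print(f"{around_the_clock:.2f} {working_day:.2f} {april_level:.2f}")
```

Either way the p99 user cannot be running agents one at a time: the figure is only reachable with sustained parallelism.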

The pattern across both margins: agentic workflows remain sporadic for the typical user, but a smaller group of high-intensity users is rapidly expanding the work it delegates — and that group is overwhelmingly inside OpenAI, the frontier preview.

Why software, why now, and the inversion of the human's role#

The paper grounds parallelism in the same property that makes coding the leading edge of agentic AI: software work is digital, verifiable, and decomposable into many subtasks, which is exactly the shape that lets one person fan work out across many independent agents and review the results. The consequence is a role inversion: the human stops being the executor and becomes the delegator, monitor, reviewer, and coordinator of a portfolio of agentic work. This is the review-and-supervision bottleneck made visible in behavior. The more agents you run in parallel, the more your throughput is gated by your capacity to review rather than the model's capacity to produce (Loop Engineering's "your review bandwidth decides how many you can actually run"; AI Brain Fry's oversight-fatigue ceiling).
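The review bottleneck can be illustrated with a toy model. This is a deliberately simple sketch, not something the paper specifies: it assumes every agent produces the same number of reviewable tasks per day and the human can review a fixed number per day. All function names and figures are hypothetical.

```python
import math


def effective_throughput(n_agents, tasks_per_agent_per_day, reviews_per_day):
    """Tasks that are both produced and reviewed per day: the smaller of
    what the agents produce and what the human can review."""
    produced = n_agents * tasks_per_agent_per_day
    return min(produced, reviews_per_day)


def saturating_agent_count(tasks_per_agent_per_day, reviews_per_day):
    """Smallest number of agents that fills the human's review capacity.
    Adding agents beyond this does not raise reviewed throughput."""
    return math.ceil(reviews_per_day / tasks_per_agent_per_day)


if __name__ == "__main__":
    # Hypothetical numbers: each agent finishes 4 tasks/day,
    # the human can review 12 tasks/day.
    for n in (1, 2, 3, 5, 8):
        print(n, effective_throughput(n, 4, 12))
    print(saturating_agent_count(4, 12))
```

With these made-up numbers, throughput rises with each agent up to three and is flat afterwards. Past that point only faster or cheaper review moves the ceiling.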

Connections#

Role Averaging, Not Role Elimination — "an IC manages agents" made literal: the fleet the averaged role runs
Conversation-to-Delegation Shift — concurrency and runtime are two of the three "how" margins (with systematization) that study uses to measure depth of delegation
Agentic Work Systematization — the sibling margin; reusable skills are what make parallel/repeatable delegation tractable enough to run many at once
Founder as Agent Orchestrator — the qualitative role this page quantifies: founder/worker as orchestrator of many specialized agents; here are the first concurrency/runtime adoption numbers
Managers as ICs — running a fleet of agents is the IC-becomes-manager shift in literal form: the intensive user manages, delegates, and reviews a team of agent-workers
Verification as the New Bottleneck — parallel fan-out is gated by the human's review capacity; concurrency makes the supervision bottleneck the binding constraint
Loop Engineering — worktrees + sub-agents are the primitives that enable safe parallelism; "review bandwidth, not the tool, decides how many you can run" is this page's ceiling
Multi-Agent Collective Intelligence — the architecture side (agents coordinating) vs this page's usage side (one human coordinating many agents)
AI Brain Fry — the cognitive cost of overseeing many parallel streams; the oversight-fatigue limit on how far concurrency can scale per human
Planning / Execution Division of Labor — concurrency is the human keeping the planning/coordination role while execution fans out across agents
Engineer PM Convergence — the parallel-orchestration workflow is the IC-toward-manager/PM convergence shown in usage data
Task Time-Horizon Scaling — long-running single agents (the runtime margin) sit under METR's rising reliable-task-length ceiling
OpenAI — the lab whose internal usage is the frontier preview of high-concurrency workflows
Codex — the threaded-interaction tool whose concurrency this measures

Open questions#

p99 OpenAI runtime of 71 agent-hours/day is a frontier preview inside an unusually favorable environment. Does external concurrency actually trend toward it as frictions fall, or is heavy parallelism specific to model-adjacent work?
Summed-overlap runtime can exceed 24h/day — it measures agent effort, not human attention. What is the human's actual oversight load per concurrent agent, and where does it saturate (AI Brain Fry)?
Concurrency is measured over one week. Is 5+-agent management a stable practice or a burst around specific large tasks?

Sources#

The Shift to Agentic AI: Evidence from Codex — §5.1 "Turn Concurrency"; §5.2 "Long-running agents"; §6 Conclusion