Sources#
Summary#
Symphony is an open-source agent-orchestration spec from OpenAI's Codex team (Alex Kotliarskyi, Victor Zhu, Zach Brock — same team that authored the harness engineering post). It turns an issue tracker (Linear in v1) into a control plane for coding agents: every open ticket gets a dedicated workspace and a continuously running Codex App Server session. The "product" is primarily a SPEC.md file in the openai/symphony repo — OpenAI explicitly does not plan to maintain Symphony as a standalone tool, intending it as a reference implementation that users point their own coding agent at.
Details#
Origin Story#
Built six months before the public March 2026 announcement, Symphony emerged from a different bottleneck than the one Agent Harness Engineering addressed. Once the OpenAI productivity-tool team had a working agent-friendly repo (no human-written code, ~1M lines, ~1,500 PRs), the new bottleneck was human attention — an engineer could comfortably manage 3–5 concurrent Codex sessions before context-switching collapsed productivity.
Evolution path:
- v1: A Codex session in `tmux` polling Linear and spawning sub-agents for new tasks. It worked, but not reliably.
- v2: Lived inside their main project repo, leveraging the existing harness.
- v3: "Used Symphony to build Symphony." Once core functionality existed, the system bootstrapped its own development.
- External release: Extracted to a standalone `SPEC.md`. OpenAI asked Codex to implement the spec in Elixir, then in TypeScript, Go, Rust, Java, and Python; divergences across implementations were used as a spec-fuzzing signal to remove ambiguities. (See LLM-as-Compiler Knowledge Base for why this matters.)
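The cross-implementation comparison can be illustrated with a small differential-testing sketch. This is not how OpenAI ran the comparison (the source says only that divergences were used as a signal); `find_divergences` and the two toy sanitizers are hypothetical names invented for illustration. Each "implementation" is a plain callable, and any input on which they disagree points at a place where the spec could be read two ways.

```python
import re
from collections import defaultdict
from typing import Callable


def find_divergences(implementations: dict[str, Callable], inputs: list) -> dict:
    """Map each input on which implementations disagree to {output: [impl names]}."""
    divergences = {}
    for value in inputs:
        by_output = defaultdict(list)
        for name, impl in implementations.items():
            by_output[repr(impl(value))].append(name)
        if len(by_output) > 1:  # more than one distinct answer for the same input
            divergences[value] = dict(by_output)
    return divergences


# Two hypothetical readings of "sanitize the identifier to [A-Za-z0-9._-]":
# one replaces disallowed characters, the other drops them.
impls = {
    "replace": lambda s: re.sub(r"[^A-Za-z0-9._-]", "_", s),
    "drop": lambda s: re.sub(r"[^A-Za-z0-9._-]", "", s),
}
report = find_divergences(impls, ["ENG-123", "ENG 123"])
```

Here the two readings agree on `"ENG-123"` and disagree on `"ENG 123"`, which suggests the spec sentence needs to say what happens to a disallowed character.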
The reference implementation is in Elixir, chosen for concurrency and supervision primitives. As the post puts it: "when code is effectively free, you can finally pick languages for their strengths."
What It Actually Does#
Symphony is a long-running daemon that:
- Polls Linear (default 30s) for issues in active states (`Todo`, `In Progress`).
- For each eligible issue, creates a deterministic per-issue workspace at `<workspace.root>/<sanitized_identifier>`.
- Launches a Codex App Server session in that workspace, rendering a per-team prompt template from `WORKFLOW.md`.
- Reconciles state every tick: kills sessions whose tickets transitioned to terminal states, and retries crashed or stalled sessions with exponential backoff.
- Does not write to the tracker itself. State transitions, comments, and PR links are performed by the coding agent using its own tools. Symphony is a "scheduler/runner and tracker reader."
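The reconcile step in the list above can be sketched as a pure function over one poll snapshot. This is a minimal illustration, not the spec's algorithm: `Issue`, `reconcile`, and `backoff_delay` are hypothetical names, the base and cap values for the backoff are made up, and treating every non-active state as a reason to stop a session is a simplifying assumption.

```python
from dataclasses import dataclass

ACTIVE_STATES = {"Todo", "In Progress"}  # active tracker states named in the text


@dataclass(frozen=True)
class Issue:
    identifier: str  # e.g. "ENG-123" (hypothetical example)
    state: str       # tracker state as reported by this poll


def reconcile(issues: list[Issue], running: set[str]) -> tuple[list[str], list[str]]:
    """Return (to_start, to_stop) for one tick.

    issues  -- snapshot of tracker issues from this poll
    running -- identifiers that currently have a live agent session
    """
    eligible = {i.identifier for i in issues if i.state in ACTIVE_STATES}
    to_start = sorted(eligible - running)  # active ticket with no session yet
    to_stop = sorted(running - eligible)   # session whose ticket left the active set
    return to_start, to_stop


def backoff_delay(attempt: int, base: float = 10.0, cap: float = 300.0) -> float:
    """Exponential backoff in seconds: base * 2**(attempt - 1), capped at `cap`."""
    return min(cap, base * 2 ** (attempt - 1))
```

Keeping the decision separate from the side effects (spawning and killing sessions) makes the tick easy to test and fits the "no durable orchestrator DB" design: the function needs only the tracker snapshot and the set of live sessions.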
The SPEC.md is the Product#
When you open the repo, the first thing you see is SPEC.md — not source code. The spec defines:
- Workflow contract: `WORKFLOW.md` is a markdown file with YAML front matter, version-controlled in the user's repo, parsed for runtime config (`tracker`, `polling`, `workspace`, `hooks`, `agent`, `codex`) and a Liquid-compatible prompt template body.
- State machine: 5 orchestration states (`Unclaimed`, `Claimed`, `Running`, `RetryQueued`, `Released`) and 11 run-attempt phases. Tracker states are separate from orchestrator states.
- Concurrency control: global cap (`max_concurrent_agents`, default 10), per-state cap, optional per-SSH-host cap.
- Workspace safety invariants: the agent only runs in its per-issue workspace; the workspace path must stay inside the workspace root; the identifier is sanitized to `[A-Za-z0-9._-]`.
- No durable orchestrator DB: restart recovery is tracker-driven and filesystem-driven. Stale terminal workspaces are cleaned at startup.
- Workspaces preserved across runs (intentional). This runs counter to typical CI ephemerality: warm-cache benefit, state-pollution risk.
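The workspace safety invariants above can be sketched in a few lines. The function names are hypothetical, and replacing disallowed characters with `_` is an assumption (the text gives only the allowed character set). Note that `.` and `..` pass the character filter, so the containment check has to be explicit.

```python
import re
from pathlib import Path

_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_identifier(identifier: str) -> str:
    """Replace every character outside [A-Za-z0-9._-] with '_' (assumed policy)."""
    return _UNSAFE.sub("_", identifier)


def workspace_path(root: Path, identifier: str) -> Path:
    """Build <root>/<sanitized_identifier> and refuse anything that escapes root."""
    root = root.resolve()
    candidate = (root / sanitize_identifier(identifier)).resolve()
    # Dots are allowed characters, so "." and ".." survive sanitization;
    # reject any result that is the root itself or lies outside it.
    if candidate == root or root not in candidate.parents:
        raise ValueError(f"workspace for {identifier!r} escapes {root}")
    return candidate
```

Because the path is a deterministic function of the issue identifier, a restarted daemon can find existing workspaces again without a database.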
This is a deliberate inversion: rather than building a complex supervision system, OpenAI defined the problem and let coding agents implement it.
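As an illustration of the concurrency-control rule listed above, here is one way an admission check could combine the global cap with per-state caps. The function `admit` and its argument shapes are hypothetical, and reading "per-state cap" as a limit per tracker state is an assumption; the optional per-SSH-host cap is left out.

```python
from collections import Counter


def admit(candidates, running_states, max_concurrent_agents=10, per_state_cap=None):
    """Pick which candidate issues may start now.

    candidates     -- list of (identifier, tracker_state) in priority order
    running_states -- tracker states of the sessions already running
    per_state_cap  -- optional {tracker_state: limit}; states not listed are uncapped
    """
    per_state_cap = per_state_cap or {}
    in_state = Counter(running_states)
    total = len(running_states)
    admitted = []
    for identifier, state in candidates:
        if total >= max_concurrent_agents:
            break  # global cap reached; nothing further can start this tick
        limit = per_state_cap.get(state)
        if limit is not None and in_state[state] >= limit:
            continue  # this state is full, but other states may still have room
        admitted.append(identifier)
        in_state[state] += 1
        total += 1
    return admitted
```

For example, with one `Todo` session running, a global cap of 3, and a `Todo` cap of 2, only one more `Todo` issue is admitted while an `In Progress` issue can still take the last global slot.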
The Linear-as-Control-Plane Insight#
The deeper shift Symphony forces is captured in Ticket-Driven Agent Orchestration: tickets become the unit of work, not sessions or PRs. Among some teams at OpenAI, landed PRs increased 500% in the first three weeks (hedged claim — "on some teams," no baseline definition). Linear founder Karri Saarinen separately reported a spike in workspace creation correlating with Symphony's release.
Outside-OpenAI evidence: 15K+ GitHub stars by April 23, 2026, roughly six weeks after release.
"Objectives, Not Transitions" — A Lesson Re-Learned#
OpenAI's first version of Symphony treated agents as rigid nodes in a state machine — Codex was only asked to implement the task. They found this too limiting:
"Models get smarter and can solve bigger problems than the box we try to fit them in."
The shift: give Codex tools (gh CLI, CI-log reading skills, etc.) and objectives, not state transitions. This restates the "enforce invariants, not implementations" principle for orchestration — same idea, different layer.
Codex App Server and Token-Isolated Tool Injection#
Symphony uses Codex's headless mode (the App Server) rather than driving the CLI. The full protocol is documented in Codex App Server Protocol. The notable feature is its use of (experimental) dynamic tool calls: instead of giving sub-agents the Linear access token directly, Symphony exposes a `linear_graphql` tool that proxies authenticated requests using the orchestrator's auth, so the token never reaches the sub-agent container.
This is parallel in spirit to MCP but for the coding-agent runtime specifically.
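The token-isolation idea can be sketched as a closure: the orchestrator builds a tool handler that holds the credential, and the agent only ever sees the handler's inputs and outputs. Everything named here is an assumption for illustration: `make_linear_graphql_tool`, the `query`/`variables` argument shape, the injected `send` callable, and the bare `Authorization` header value. None of it is taken from the Codex App Server protocol or from Symphony's code.

```python
import json
from typing import Callable

# send(url, headers, body) -> response text. Injected so the sketch is not
# tied to any particular HTTP client (hypothetical signature).
Send = Callable[[str, dict, bytes], str]


def make_linear_graphql_tool(endpoint: str, token: str, send: Send):
    """Return a tool handler that closes over the token held by the orchestrator."""

    def handle(arguments: dict) -> dict:
        # The agent supplies only the query and variables, never credentials.
        body = json.dumps({
            "query": arguments["query"],
            "variables": arguments.get("variables", {}),
        }).encode()
        headers = {"Content-Type": "application/json", "Authorization": token}
        # Only the parsed response goes back to the agent.
        return json.loads(send(endpoint, headers, body))

    return handle
```

The design point is that the handler runs in the orchestrator's process, so a compromised or confused agent can issue tracker queries but cannot read or exfiltrate the token itself.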
What Symphony Doesn't Solve#
Honest tradeoffs called out in the post:
- Lost mid-flight steering: when work is assigned at the ticket level, you can no longer nudge the agent during execution. Failures reveal gaps in the harness/skills, which are then patched system-wide.
- Not all tasks fit: ambiguous problems requiring strong human judgment still need interactive Codex sessions. Symphony handles bulk routine implementation.
- State machine rigidity (early lesson, fixed): see "Objectives, Not Transitions" above.
Relationship to Other Things#
- Symphony vs. Claude Code agents: parallel ecosystems. Symphony is daemon-first (always-on, polling); Claude Code is session-first with optional non-interactive mode (`claude -p`). Both rely on repo-versioned markdown for behavior config (see Claude Code Best Practices CLAUDE.md, Hermes Agent AGENTS.md/SOUL.md).
- Symphony vs. Hermes Gateway: both are multi-tenant daemons running as systemd/launchd services. Symphony's tenancy unit is the issue; Hermes's tenancy unit is the user (see Hermes Agent). Both isolate per-tenant and prefer Docker for safety.
Connections#
- Ticket-Driven Agent Orchestration: the core abstraction Symphony codifies; transferable beyond Linear/Codex
- Codex App Server Protocol: the runtime protocol Symphony depends on; covers the JSON-RPC stdio handshake, continuation turns, and dynamic tool calls in detail
- Agent Harness Engineering: Symphony is the natural "harness as service" evolution of the same OpenAI team's earlier work; "objectives not transitions" is the same idea as "enforce invariants, not implementations"
- LLM-as-Compiler Knowledge Base: the "compile the spec in 6 languages, use divergences to find ambiguities" technique is the most concrete extension yet of LLM-as-compiler; it uses cross-implementation drift as a spec-fuzzer
- Claude Code Best Practices: Symphony's `WORKFLOW.md` follows the same pattern as `CLAUDE.md` (repo-versioned plaintext as agent control plane), but at the orchestration layer rather than the session layer
- Hermes Agent: parallel "always-on agent daemon" architecture, with per-user instead of per-issue isolation
- Client-Side Agent Optimization: Symphony's per-state concurrency caps and continuation-turn budgets are operational instances of the budget lever AgentOpt formalizes
- Claude's Constitution / Model Spec: same spec-as-document pattern at a different layer. Symphony's `SPEC.md` is product-level; the Model Spec / Constitution is alignment-level, but both treat plaintext spec as load-bearing artifact
- Model Spec Midtraining (MSM): extends spec-as-lever further; the alignment spec is now a direct training input, not just a runtime guidance doc
- Model Spec Science: the spec-fuzzing technique (compile spec in 6 languages, use divergences to surface ambiguity) is a methodological cousin of empirical Model Spec study
Open Questions#
- The 500% landed-PRs claim is hedged: no baseline definition, "on some teams" only. What does the distribution look like across teams? What happens to PR quality and revert rate at that throughput?
- "Workspaces preserved across runs" is the opposite of typical CI ephemerality. At what point does state pollution from prior runs (stale `node_modules`, leftover branches, build artifacts) start hurting more than the warm cache helps?
- Symphony doesn't write to the tracker; agents do. This means tracker policy is a prompt in `WORKFLOW.md`. How brittle is this in practice when Linear changes its API? How is consistent state-machine behavior enforced when agents have prompt-level discretion?
- The spec was simplified by being implemented in 6 languages. What's the extension of this technique? Could `compiler-prompt.md` in this vault be similarly cross-fuzzed?
- Symphony explicitly says agents can self-create tickets. What governance prevents runaway ticket-graph expansion? Is human triage of agent-created tickets the only check?
