Howardism

Sources#

An open-source spec for Codex orchestration: Symphony.

Summary#

Symphony is an open-source agent-orchestration spec from OpenAI's Codex team (Alex Kotliarskyi, Victor Zhu, Zach Brock; the same team that authored the harness engineering post). It turns an issue tracker (Linear in v1) into a control plane for coding agents: every open ticket gets a dedicated workspace and a continuously running Codex App Server session. The "product" is primarily a SPEC.md file in the openai/symphony repo. OpenAI explicitly does not plan to maintain Symphony as a standalone tool; it is intended as a reference implementation that users point their own coding agent at.

Details#

Origin Story#

Built six months before the public March 2026 announcement, Symphony emerged from a different bottleneck than the one Agent Harness Engineering addressed. Once the OpenAI productivity-tool team had a working agent-friendly repo (no human-written code, ~1M lines, ~1,500 PRs), the new bottleneck was human attention — an engineer could comfortably manage 3–5 concurrent Codex sessions before context-switching collapsed productivity.

Evolution path:

v1: A Codex session in tmux polling Linear and spawning sub-agents for new tasks. It worked but was not reliable.
v2: Lived inside their main project repo, leveraging the existing harness.
v3: "Used Symphony to build Symphony." Once core functionality existed, the system bootstrapped its own development.
External release: Extracted to a standalone SPEC.md. OpenAI asked Codex to implement the spec in Elixir, then in TypeScript, Go, Rust, Java, and Python; divergences across the implementations were used as a spec-fuzzing signal to remove ambiguities. (See LLM-as-Compiler Knowledge Base for why this matters.)

The reference implementation is in Elixir, chosen for concurrency and supervision primitives. As the post puts it: "when code is effectively free, you can finally pick languages for their strengths."
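The cross-implementation check from the external release step can be illustrated with a small differential harness. This is a minimal sketch of the general idea only: the source does not say how OpenAI compared implementations, so `find_divergences`, the callable-per-implementation shape, and the toy inputs are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Hashable


def find_divergences(
    implementations: dict[str, Callable[[str], Hashable]],
    scenarios: list[str],
) -> dict[str, dict[Hashable, list[str]]]:
    """Run every scenario through every implementation and keep the
    scenarios where the implementations disagree.

    Each disagreement is a hint that the spec text left room for more
    than one reading of that scenario.
    """
    divergences: dict[str, dict[Hashable, list[str]]] = {}
    for scenario in scenarios:
        by_result: dict[Hashable, list[str]] = defaultdict(list)
        for name, impl in implementations.items():
            by_result[impl(scenario)].append(name)
        if len(by_result) > 1:  # more than one distinct answer
            divergences[scenario] = dict(by_result)
    return divergences


# Toy stand-ins for two generated implementations of the same rule.
impls = {
    "impl_a": lambda s: s.replace(" ", "_"),  # replaces unsafe characters
    "impl_b": lambda s: s.replace(" ", ""),   # drops unsafe characters
}
report = find_divergences(impls, ["ABC-1", "ABC 1"])
```

Here only the input containing a space is reported, which would point at an under-specified sanitization rule rather than at a bug in either implementation.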

What It Actually Does#

Symphony is a long-running daemon that:

Polls Linear (default 30s) for issues in active states (Todo, In Progress).
For each eligible issue, creates a deterministic per-issue workspace at <workspace.root>/<sanitized_identifier>.
Launches a Codex App Server session in that workspace, rendering a per-team prompt template from WORKFLOW.md.
Reconciles state every tick: kills sessions whose tickets have transitioned to terminal states and retries crashed or stalled sessions with exponential backoff.
Does not write to the tracker itself. State transitions, comments, PR links are performed by the coding agent using its own tools. Symphony is a "scheduler/runner and tracker reader."
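The reconcile step above can be sketched as a pure planning function that compares tracker state with the set of running sessions. This is a minimal sketch, not the reference implementation: `plan_tick`, `backoff_delay`, the terminal state names, and the backoff base and cap are assumptions chosen for illustration; only the active states come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass

ACTIVE_STATES = {"Todo", "In Progress"}  # from the description above
TERMINAL_STATES = {"Done", "Canceled"}   # assumed names, for illustration


def backoff_delay(attempt: int, base_s: float = 10.0, cap_s: float = 300.0) -> float:
    """Exponential backoff: base * 2^(attempt - 1), capped (values assumed)."""
    return min(cap_s, base_s * 2 ** (attempt - 1))


@dataclass(frozen=True)
class Plan:
    start: list[str]  # issue identifiers that need a new session
    stop: list[str]   # issue identifiers whose session should be killed


def plan_tick(tracker_states: dict[str, str], running: set[str]) -> Plan:
    """Decide what one poll tick should change, without side effects."""
    start = sorted(
        issue
        for issue, state in tracker_states.items()
        if state in ACTIVE_STATES and issue not in running
    )
    stop = sorted(
        issue for issue in running if tracker_states.get(issue) in TERMINAL_STATES
    )
    return Plan(start=start, stop=stop)


plan = plan_tick(
    {"ABC-1": "Todo", "ABC-2": "Done", "ABC-3": "In Progress"},
    running={"ABC-2", "ABC-3"},
)
```

Keeping the planning step free of side effects means the same function can run after a restart with nothing but a fresh tracker read and a scan of live sessions, which fits the tracker-reader role described above.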

The SPEC.md is the Product#

When you open the repo, the first thing you see is SPEC.md — not source code. The spec defines:

Workflow contract: WORKFLOW.md is a markdown file with YAML front matter, version-controlled in the user's repo, parsed for runtime config (tracker, polling, workspace, hooks, agent, codex) and a Liquid-compatible prompt template body.
State machine: 5 orchestration states (Unclaimed, Claimed, Running, RetryQueued, Released) and 11 run-attempt phases. Tracker states are separate from orchestrator states.
Concurrency control: global cap (max_concurrent_agents, default 10), per-state cap, optional per-SSH-host cap.
Workspace safety invariants: agent only runs in per-issue workspace; workspace path must stay inside workspace root; identifier sanitized to [A-Za-z0-9._-].
No durable orchestrator DB: restart recovery is tracker-driven and filesystem-driven. Stale terminal workspaces cleaned at startup.
Workspaces preserved across runs (intentional): this runs counter to typical CI ephemerality, trading a warm-cache benefit for state-pollution risk.

This is a deliberate inversion: rather than building a complex supervision system, OpenAI defined the problem and let coding agents implement it.
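Two of the items the spec defines, the workspace path invariants and the concurrency caps, are small enough to sketch directly. The following is a hedged illustration rather than the spec's wording: the function names, the choice of "_" as the replacement character, and the reading of the per-state cap as keyed by tracker state are assumptions; the allowed character set and the default of 10 come from the list above.

```python
from __future__ import annotations

import re
from pathlib import Path

_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_identifier(identifier: str) -> str:
    """Map every character outside [A-Za-z0-9._-] to '_' (replacement assumed)."""
    return _UNSAFE.sub("_", identifier)


def workspace_path(root: Path, identifier: str) -> Path:
    """Return <root>/<sanitized_identifier>, refusing paths that leave the root.

    Sanitizing alone is not enough: ".." contains only allowed characters,
    so the resolved path is checked against the root as well.
    """
    root = root.resolve()
    candidate = (root / sanitize_identifier(identifier)).resolve()
    if candidate == root or root not in candidate.parents:
        raise ValueError(f"workspace for {identifier!r} escapes {root}")
    return candidate


def can_start(
    state: str,
    running_by_state: dict[str, int],
    max_concurrent_agents: int = 10,
    per_state_cap: dict[str, int] | None = None,
) -> bool:
    """Admit a new session only if the global and per-state caps both allow it."""
    if sum(running_by_state.values()) >= max_concurrent_agents:
        return False
    cap = (per_state_cap or {}).get(state)
    return cap is None or running_by_state.get(state, 0) < cap
```

For example, `workspace_path(Path("/tmp/ws"), "ABC-123")` yields a directory named `ABC-123` under the root, while an identifier of `".."` raises `ValueError` even though it passes the character filter.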

The Linear-as-Control-Plane Insight#

The deeper shift Symphony forces is captured in Ticket-Driven Agent Orchestration: tickets become the unit of work, not sessions or PRs. Among some teams at OpenAI, landed PRs increased 500% in the first three weeks (hedged claim — "on some teams," no baseline definition). Linear founder Karri Saarinen separately reported a spike in workspace creation correlating with Symphony's release.

Outside-OpenAI evidence: 15K+ GitHub stars by April 23, 2026, roughly six weeks after release.

"Objectives, Not Transitions" — A Lesson Re-Learned#

OpenAI's first version of Symphony treated agents as rigid nodes in a state machine — Codex was only asked to implement the task. They found this too limiting:

"Models get smarter and can solve bigger problems than the box we try to fit them in."

The shift: give Codex tools (gh CLI, CI-log reading skills, etc.) and objectives, not state transitions. This restates the "enforce invariants, not implementations" principle for orchestration — same idea, different layer.

Codex App Server and Token-Isolated Tool Injection#

Symphony uses Codex's headless mode (the App Server) rather than driving the CLI. The full protocol is documented in Codex App Server Protocol. The notable feature is its use of dynamic tool calls (experimental): instead of giving sub-agents the Linear access token directly, Symphony exposes a linear_graphql tool that proxies authenticated requests using the orchestrator's auth, so the token never reaches the sub-agent container.

This is parallel in spirit to MCP but for the coding-agent runtime specifically.
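The token-isolation idea can be sketched as a tool handler that lives in the orchestrator process and attaches credentials on the agent's behalf. This is a minimal sketch under stated assumptions: only the tool name `linear_graphql` comes from the text above; the class, the `call` argument shape, the injected `transport` callable, and the header format are hypothetical and do not reflect the actual App Server protocol or Linear's authentication scheme.

```python
from __future__ import annotations

import json
from typing import Any, Callable, Mapping

# Hypothetical transport: takes request headers and a JSON body, returns the
# raw response body. A real one would POST to the tracker's GraphQL endpoint.
Transport = Callable[[Mapping[str, str], bytes], bytes]


class LinearGraphQLTool:
    """Runs inside the orchestrator; the agent only sees what call() returns."""

    name = "linear_graphql"

    def __init__(self, token: str, transport: Transport) -> None:
        self._token = token          # never serialized into tool results
        self._transport = transport

    def call(self, arguments: dict[str, Any]) -> dict[str, Any]:
        payload = json.dumps(
            {
                "query": arguments["query"],
                "variables": arguments.get("variables", {}),
            }
        ).encode()
        headers = {
            "Authorization": self._token,  # header format is an assumption
            "Content-Type": "application/json",
        }
        return json.loads(self._transport(headers, payload))


# Fake transport for demonstration: records headers, returns a canned reply.
seen_headers: dict[str, str] = {}


def fake_transport(headers: Mapping[str, str], body: bytes) -> bytes:
    seen_headers.update(headers)
    return b'{"data": {"ok": true}}'


tool = LinearGraphQLTool(token="secret-token", transport=fake_transport)
result = tool.call({"query": "{ viewer { id } }"})
```

The agent supplies only the query and variables and receives only the response body, so the credential appears in the outbound request but never in anything handed back to the sub-agent.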

What Symphony Doesn't Solve#

Honest tradeoffs called out in the post:

Lost mid-flight steering: when work is assigned at the ticket level, you can no longer nudge the agent during execution. Failures reveal gaps in the harness/skills, which are then patched system-wide.
Not all tasks fit: ambiguous problems requiring strong human judgment still need interactive Codex sessions. Symphony handles bulk routine implementation.
State machine rigidity (early lesson, fixed): see "Objectives, Not Transitions" above.

Relationship to Other Things#

Symphony vs. Claude Code agents: parallel ecosystems. Symphony is daemon-first (always-on, polling); Claude Code is session-first with an optional non-interactive mode (claude -p). Both rely on repo-versioned markdown for behavior config (see Claude Code Best Practices for CLAUDE.md, Hermes Agent for AGENTS.md/SOUL.md).
Symphony vs. Hermes Gateway: both are multi-tenant daemons running as systemd/launchd services. Symphony's tenancy unit is the issue; Hermes's tenancy unit is the user (see Hermes Agent). Both isolate per-tenant and prefer Docker for safety.

Connections#

Ticket-Driven Agent Orchestration — the core abstraction Symphony codifies; transferable beyond Linear/Codex
Codex App Server Protocol — the runtime protocol Symphony depends on; covers the JSON-RPC stdio handshake, continuation turns, and dynamic tool calls in detail
Agent Harness Engineering — Symphony is the natural "harness as service" evolution of the same OpenAI team's earlier work; "objectives not transitions" is the same idea as "enforce invariants, not implementations"
LLM-as-Compiler Knowledge Base — the "compile the spec in 6 languages, use divergences to find ambiguities" technique is the most concrete extension yet of LLM-as-compiler — it uses cross-implementation drift as a spec-fuzzer
Claude Code Best Practices — Symphony's WORKFLOW.md is the same pattern as CLAUDE.md: repo-versioned plaintext as agent control plane, but at the orchestration layer rather than the session layer
Hermes Agent — parallel "always-on agent daemon" architecture, with per-user instead of per-issue isolation
Client-Side Agent Optimization — Symphony's per-state concurrency caps and continuation-turn budgets are operational instances of the budget lever AgentOpt formalizes
Claude's Constitution / Model Spec — same spec-as-document pattern at a different layer: Symphony's SPEC.md is product-level; the Model Spec / Constitution is alignment-level, but both treat plaintext spec as load-bearing artifact
Model Spec Midtraining (MSM) — extends spec-as-lever further: the alignment spec is now a direct training input, not just a runtime guidance doc
Model Spec Science — the spec-fuzzing technique (compile spec in 6 languages, use divergences to surface ambiguity) is a methodological cousin of empirical Model Spec study

Open Questions#

The 500% landed-PRs claim is hedged — no baseline definition, "on some teams" only. What does the distribution look like across teams? What happens to PR quality and revert rate at that throughput?
"Workspaces preserved across runs" is the opposite of typical CI ephemerality. At what point does state pollution from prior runs (stale node_modules, leftover branches, build artifacts) start hurting more than warm-cache helps?
Symphony doesn't write to the tracker — agents do. This means tracker policy is a prompt in WORKFLOW.md. How brittle is this in practice when Linear changes its API? How is consistent state-machine behavior enforced when agents have prompt-level discretion?
The spec was simplified by being implemented in 6 languages. What's the extension of this technique? Could compiler-prompt.md in this vault be similarly cross-fuzzed?
Symphony explicitly says agents can self-create tickets. What governance prevents runaway ticket-graph expansion? Is human triage of agent-created tickets the only check?
