Summary#
A line-delimited JSON-RPC-like protocol over stdio that lets external orchestrators drive a Codex coding-agent session programmatically. Documented at developers.openai.com/codex/app-server and exercised in detail by Symphony's SPEC.md. The orchestrator launches codex app-server (default command), exchanges a startup handshake, then streams turn events until the turn terminates. The same thread_id is reused across multiple turn/start requests for continuation, and dynamic tool calls allow the orchestrator to inject custom tools (e.g., linear_graphql) without exposing credentials to subagent containers.
Details#
Why a Protocol Instead of a CLI#
Symphony flagged the limitation directly: trying to drive Codex via CLI or tmux sessions doesn't scale to programmatic orchestration. The App Server is "a built-in headless mode for Codex" with a JSON-RPC API for things like starting a thread or reacting to turns. The orchestrator gets:
- Programmatic control over thread lifecycle without scraping terminal output.
- Hooks to inject custom tool implementations (dynamic tool calls).
- Structured events to drive observability, retry logic, and stall detection.
Launch Contract#
Subprocess launch parameters from Symphony's spec:
- Command: `codex.command` (default: `codex app-server`)
- Invocation: `bash -lc <command>`
- Working directory: per-issue workspace path (validated as a safety invariant: `cwd == workspace_path` before launch)
- Stdout/stderr: separate streams. Only stdout is the protocol stream. Stderr is diagnostic-only; never parse it as JSON.
- Framing: line-delimited JSON, one message per line.
- Recommended max line size: 10 MB.
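The launch contract above can be sketched in a few lines. This is a minimal illustration, not Symphony's implementation: the function names `validate_cwd`, `check_line_size`, and `launch_app_server` are hypothetical, and comparing resolved real paths is an assumed way of enforcing the `cwd == workspace_path` invariant.

```python
import os
import subprocess

MAX_LINE_BYTES = 10 * 1024 * 1024  # recommended max line size (10 MB)


def validate_cwd(cwd: str, workspace_path: str) -> str:
    """Return the resolved workspace path, or raise if cwd does not match it."""
    resolved_cwd = os.path.realpath(cwd)
    resolved_workspace = os.path.realpath(workspace_path)
    if resolved_cwd != resolved_workspace:
        raise ValueError("invalid_workspace_cwd")
    return resolved_workspace


def check_line_size(line: bytes) -> bytes:
    """Reject a protocol line that exceeds the recommended maximum size."""
    if len(line) > MAX_LINE_BYTES:
        raise ValueError("protocol line exceeds recommended maximum size")
    return line


def launch_app_server(command: str, cwd: str, workspace_path: str) -> subprocess.Popen:
    """Start the configured command via `bash -lc` with separate stdout and stderr."""
    workdir = validate_cwd(cwd, workspace_path)
    return subprocess.Popen(
        ["bash", "-lc", command],
        cwd=workdir,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,  # protocol stream: one JSON message per line
        stderr=subprocess.PIPE,  # diagnostics only; never parsed as JSON
    )
```

A real orchestrator would also need to drain stderr continuously (for example on a separate thread), since an unread pipe can fill up and block the child process.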
Startup Handshake#
Required message order, paraphrased from Symphony's illustrative transcript:
{"id":1,"method":"initialize","params":{"clientInfo":{"name":"symphony","version":"1.0"},"capabilities":{}}}
{"method":"initialized","params":{}}
{"id":2,"method":"thread/start","params":{"approvalPolicy":"...","sandbox":"...","cwd":"/abs/workspace"}}
{"id":3,"method":"turn/start","params":{"threadId":"<thread-id>","input":[{"type":"text","text":"<rendered prompt>"}],"cwd":"/abs/workspace","title":"ABC-123: Example","approvalPolicy":"...","sandboxPolicy":{"type":"..."}}}
Notes:
- `initialize` request: `clientInfo` and `capabilities`. If the targeted Codex version requires capability negotiation for dynamic tools, declare it here.
- `initialized` notification: sent after the `initialize` response.
- `thread/start`: establishes thread context including approval policy, sandbox mode, and cwd. Optional client-side tool specs (e.g., `linear_graphql`) are advertised here.
- `turn/start`: first turn carries the rendered prompt; subsequent continuation turns send only continuation guidance.
Session identifier composition:
- `thread_id` from the `thread/start` result (`result.thread.id`)
- `turn_id` from each `turn/start` result (`result.turn.id`)
- Emitted `session_id = <thread_id>-<turn_id>`
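To make the framing and identifier rules concrete, here is a small sketch that builds the first three handshake messages and composes the session identifier. The helper names (`encode`, `startup_messages`, `session_id`) are hypothetical, and the field names follow the illustrative transcript above, which may differ slightly across app-server versions.

```python
import json
from typing import Any, Dict, List


def encode(message: Dict[str, Any]) -> bytes:
    """Serialize one protocol message as a single newline-terminated JSON line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def startup_messages(workspace: str, approval_policy: str, sandbox: str) -> List[Dict[str, Any]]:
    """Return the first three handshake messages in their required order."""
    return [
        {"id": 1, "method": "initialize",
         "params": {"clientInfo": {"name": "symphony", "version": "1.0"},
                    "capabilities": {}}},
        {"method": "initialized", "params": {}},  # notification: carries no id
        {"id": 2, "method": "thread/start",
         "params": {"approvalPolicy": approval_policy, "sandbox": sandbox,
                    "cwd": workspace}},
    ]


def session_id(thread_start_result: Dict[str, Any], turn_start_result: Dict[str, Any]) -> str:
    """Compose <thread_id>-<turn_id> from the two `result` objects."""
    thread_id = thread_start_result["thread"]["id"]
    turn_id = turn_start_result["turn"]["id"]
    return f"{thread_id}-{turn_id}"
```

The list only fixes the ordering; a caller would still wait for the `initialize` response before writing `initialized`, and for the `thread/start` response before sending `turn/start`.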
Continuation Turns (Reuse the Thread)#
A non-obvious but important property: continuation turns reuse the same thread_id with new turn/start requests on the same live subprocess. The first turn sends the full rendered task prompt; later turns send only continuation guidance, since the original prompt is already in thread history.
Symphony's worker logic:
- After each successful turn, re-check tracker state.
- If the issue is still active, issue another `turn/start` on the same `threadId`, up to `agent.max_turns` (default 20).
- Subprocess remains alive across continuation turns; it is only stopped when the worker run ends.
This is the primitive that makes ticket-driven multi-turn work practical: the agent maintains conversational state across what looks externally like discrete continuations.
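The worker logic can be sketched as a loop over a single thread. This is an illustration under assumptions, not Symphony's code: `start_turn` and `issue_is_active` are hypothetical callbacks standing in for the protocol client and the tracker check, and stopping on the first failed turn is an assumed policy.

```python
from typing import Callable


def run_worker(
    start_turn: Callable[[str, str], str],
    issue_is_active: Callable[[], bool],
    thread_id: str,
    rendered_prompt: str,
    continuation_guidance: str,
    max_turns: int = 20,
) -> int:
    """Run turns on one thread; return how many turns were started.

    start_turn(thread_id, text) is assumed to return "success" or "failure".
    """
    turns = 0
    text = rendered_prompt  # only the first turn carries the full prompt
    while turns < max_turns:
        outcome = start_turn(thread_id, text)
        turns += 1
        if outcome != "success":
            break  # assumed policy: a failed turn ends the worker run
        if not issue_is_active():
            break  # tracker says the issue no longer needs work
        text = continuation_guidance  # history already holds the original prompt
    return turns
```

Every call passes the same `thread_id`, which is what lets the agent keep its conversational state between turns.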
Streaming Turn Events#
Termination conditions for a single turn:
| Condition | Outcome |
|---|---|
| `turn/completed` | success |
| `turn/failed` | failure |
| `turn/cancelled` | failure |
| Turn timeout (`turn_timeout_ms`, default 1h) | failure |
| Subprocess exit | failure |
Important emitted events Symphony's spec enumerates:
session_started, startup_failed, turn_completed, turn_failed, turn_cancelled, turn_ended_with_error, turn_input_required, approval_auto_approved, unsupported_tool_call, notification, other_message, malformed.
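As a rough sketch of how the termination table maps to code, the function below reads protocol lines until it sees a terminal method. It assumes the terminal events arrive as messages whose `method` field matches the names in the table; the function name `turn_outcome` is hypothetical, and timeout handling is left out because it needs a clock rather than the stream.

```python
import json
from typing import Iterable

TERMINAL_METHODS = {
    "turn/completed": "success",
    "turn/failed": "failure",
    "turn/cancelled": "failure",
}


def turn_outcome(stdout_lines: Iterable[str]) -> str:
    """Consume protocol lines until a terminal event and return its outcome.

    Running out of lines is treated as subprocess exit, which is a failure.
    """
    for line in stdout_lines:
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # an orchestrator would emit a `malformed` event here
        if not isinstance(message, dict):
            continue
        method = message.get("method")
        if isinstance(method, str) and method in TERMINAL_METHODS:
            return TERMINAL_METHODS[method]
    return "failure"  # stdout closed without a terminal event
```

Non-terminal messages are simply skipped in this sketch; a fuller client would forward them as `notification` or `other_message` events.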
Three Independent Timeouts#
| Timeout | Default | Scope |
|---|---|---|
| `read_timeout_ms` | 5 s | Request/response during startup and sync requests |
| `turn_timeout_ms` | 1 h | Total turn-stream duration |
| `stall_timeout_ms` | 5 m | Inactivity between events (orchestrator-enforced) |
Set stall_timeout_ms <= 0 to disable stall detection.
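The turn and stall timeouts can be checked with plain arithmetic on millisecond timestamps. The sketch below is an assumption about how an orchestrator might do it: the function name `timeout_error` and the `"stalled"` label are hypothetical, while `turn_timeout` matches the error category listed later in this article.

```python
from typing import Optional

READ_TIMEOUT_MS = 5_000        # applies to startup and sync request/response
TURN_TIMEOUT_MS = 3_600_000    # 1 h: total turn-stream duration
STALL_TIMEOUT_MS = 300_000     # 5 m: inactivity between events


def timeout_error(
    now_ms: int,
    turn_started_ms: int,
    last_event_ms: int,
    turn_timeout_ms: int = TURN_TIMEOUT_MS,
    stall_timeout_ms: int = STALL_TIMEOUT_MS,
) -> Optional[str]:
    """Return a timeout label if the turn should be failed, else None."""
    if now_ms - turn_started_ms >= turn_timeout_ms:
        return "turn_timeout"
    # A value <= 0 disables stall detection entirely.
    if stall_timeout_ms > 0 and now_ms - last_event_ms >= stall_timeout_ms:
        return "stalled"  # hypothetical label, not taken from the spec
    return None
```

Calling this on a periodic tick, and resetting `last_event_ms` whenever any event arrives, keeps the two checks independent of each other.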
Approvals, Sandboxes, User Input#
Implementation-defined posture, but the spec mandates that approval requests and user-input-required events must not leave a run stalled indefinitely. An implementation chooses one of:
- Auto-approve (high-trust mode, e.g., command exec and file-change approvals auto-approved for the session).
- Surface to operator (interactive flow).
- Auto-resolve by policy.
- Hard-fail the run (Symphony reference does this for user-input-required turns).
Treating user-input-required as a hard failure is appropriate for unattended orchestration — there's no human in the loop to satisfy the request.
Dynamic Tool Calls (the Most Underrated Feature)#
Experimental feature. The agent can request item/tool/call for tools the orchestrator advertises during thread/start. If the tool isn't recognized, return a tool-failure response — the session continues rather than stalling.
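A minimal sketch of the "fail, don't stall" rule for unrecognized tools follows. The dispatcher name `handle_tool_call` and the result shape (`success` plus `error`) are assumptions for illustration; the actual response payload expected by the app server should be taken from the installed schema.

```python
from typing import Any, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], Dict[str, Any]]


def handle_tool_call(
    tools: Dict[str, ToolHandler], name: str, arguments: Dict[str, Any]
) -> Dict[str, Any]:
    """Dispatch a tool call; unknown tools and handler errors become failures."""
    handler = tools.get(name)
    if handler is None:
        # An orchestrator would also emit `unsupported_tool_call` here.
        return {"success": False, "error": f"unsupported tool: {name}"}
    try:
        return handler(arguments)
    except Exception as exc:  # a failing tool must not stall the session
        return {"success": False, "error": str(exc)}
```

Because every path returns a result, the agent always gets an answer and the turn can continue.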
The leverage: orchestrator-implemented tools can wrap credentials the subagent should never see.
Symphony's linear_graphql tool is the canonical example:
- Subagent containers don't get the Linear access token.
- Instead, the orchestrator advertises a `linear_graphql` tool that proxies authenticated GraphQL queries.
- Subagent calls the tool with `{ "query": "...", "variables": {...} }`; orchestrator executes against the Linear endpoint using its own auth.
Contract for linear_graphql:
- Single non-empty GraphQL operation per call (multiple operations rejected).
- Optional `variables` object.
- Result semantics:
  - Transport success + no GraphQL `errors` → `success=true`
  - Top-level GraphQL `errors` → `success=false`, but preserve response body
  - Invalid input / missing auth / transport failure → `success=false` with error payload
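The result semantics above can be sketched as a handler with an injected transport. Everything named here is hypothetical: `linear_graphql` as a Python function, the `transport(query, variables, auth_token)` callable, and the `response`/`error` keys. Rejecting multiple operations needs a real GraphQL parser and is deliberately left out of this sketch.

```python
from typing import Any, Callable, Dict, Optional

Transport = Callable[[str, Dict[str, Any], str], Dict[str, Any]]


def linear_graphql(
    arguments: Dict[str, Any], transport: Transport, auth_token: Optional[str]
) -> Dict[str, Any]:
    """Proxy one GraphQL call using orchestrator-held credentials."""
    query = arguments.get("query")
    variables = arguments.get("variables")
    if variables is None:
        variables = {}  # `variables` is optional
    if not isinstance(query, str) or not query.strip():
        return {"success": False, "error": "query must be a non-empty string"}
    if not isinstance(variables, dict):
        return {"success": False, "error": "variables must be an object"}
    if not auth_token:
        return {"success": False, "error": "missing auth"}
    try:
        body = transport(query, variables, auth_token)
    except Exception as exc:
        return {"success": False, "error": f"transport failure: {exc}"}
    if body.get("errors"):
        # Top-level GraphQL errors: fail, but keep the body for the agent.
        return {"success": False, "response": body}
    return {"success": True, "response": body}
```

The token is passed only to the transport and never appears in the returned payload, which is the property that keeps it out of the subagent's view.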
This is architecturally parallel to MCP but lives inside the coding-agent's runtime rather than a separate process — orchestrator decides what tools to inject per session.
Compatibility Profile#
Symphony's spec is unusually explicit about version tolerance:
"The normative contract is message ordering, required behaviors, and the logical fields that must be extracted. Exact JSON field names may vary slightly across compatible app-server versions. Implementations should tolerate equivalent payload shapes when they carry the same logical meaning."
Practical implication: don't bind tightly to specific JSON field names. Implementors are encouraged to inspect the installed Codex schema via codex app-server generate-json-schema --out <dir> and treat Codex-owned config (approval_policy, thread_sandbox, turn_sandbox_policy) as pass-through values.
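One way to avoid binding tightly to field names is to extract each logical field through a list of candidate paths. In the sketch below only `thread.id` comes from the source; the `threadId` and `thread_id` alternatives are hypothetical examples of "equivalent payload shapes", and `first_present` is an illustrative helper name.

```python
from typing import Any, Iterable, Optional, Sequence


def first_present(payload: Any, paths: Iterable[Sequence[str]]) -> Optional[Any]:
    """Return the value at the first path that resolves in payload, else None."""
    for path in paths:
        node = payload
        for key in path:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node is not None:
            return node
    return None


# Candidate locations for the logical "thread id" field.
THREAD_ID_PATHS = [("thread", "id"), ("threadId",), ("thread_id",)]
```

For example, `first_present(result, THREAD_ID_PATHS)` returns the same identifier whether the response nests it under `thread` or exposes it as a flat key.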
Recommended Error Categories#
For normalization across implementations:
codex_not_found, invalid_workspace_cwd, response_timeout, turn_timeout, port_exit, response_error, turn_failed, turn_cancelled, turn_input_required.
Token Accounting Subtlety#
Worth flagging because it's easy to get wrong. Agent events may include token counts in multiple shapes:
- Prefer absolute thread totals (e.g., `thread/tokenUsage/updated` or `total_token_usage` within token-count wrappers).
- Ignore delta-style payloads (e.g., `last_token_usage`) for dashboards, or you'll double-count.
- Don't treat generic `usage` maps as cumulative unless the event type defines them that way.
- For absolute totals, track deltas relative to last reported totals.
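The last rule is easy to get wrong, so here is a small sketch of it: keep the last absolute total per thread and add only the difference to an aggregate. The class name `TokenTracker` and its attributes are hypothetical, and clamping negative differences to zero is an assumed way to handle a total that unexpectedly resets.

```python
from typing import Dict


class TokenTracker:
    """Aggregate token usage across threads from absolute per-thread totals."""

    def __init__(self) -> None:
        self.last_totals: Dict[str, int] = {}
        self.grand_total = 0

    def on_absolute_total(self, thread_id: str, total_tokens: int) -> int:
        """Record a thread's new absolute total and return the delta applied."""
        previous = self.last_totals.get(thread_id, 0)
        delta = max(0, total_tokens - previous)  # never subtract on a reset
        self.last_totals[thread_id] = total_tokens
        self.grand_total += delta
        return delta
```

Feeding the same absolute total twice adds nothing, which is the behavior that prevents double-counting on repeated updates.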
Connections#
- Symphony: the canonical orchestrator built on this protocol; entity page with the lifecycle and operational details
- Ticket-Driven Agent Orchestration: continuation turns are what make multi-turn work-per-ticket practical; one ticket → one thread → many turns
- Agent Harness Engineering: the App Server protocol is the integration boundary that makes "harness as service" possible; the orchestrator drives session lifecycle without scraping a CLI
- Claude Code Best Practices: Claude's `claude -p` non-interactive mode plus the Claude Agent SDK are the parallel ecosystem; both let an external orchestrator drive sessions, but Codex's App Server is more explicit about a stable JSON-RPC protocol
- Client-Side Agent Optimization: `agent.max_turns`, `turn_timeout_ms`, `stall_timeout_ms`, and the dynamic-tool-call cost (proxy round-trip vs. direct call) are operational instances of the budget lever AgentOpt formalizes
Open Questions#
- How does the App Server protocol compare in detail to MCP? Both expose tools to a model, but App Server is inside the Codex runtime while MCP is outside. When does each win?
- Is there a public schema registry so external orchestrators can target specific App Server versions without `generate-json-schema`?
- The "dynamic tool calls (experimental)" caveat: what's the stability roadmap? Symphony depends on this for its security model.
- How well does the protocol handle multi-modal turns (image inputs, screenshot attachments)? The spec is text-focused.
- Is there an analogous protocol on the Claude side, or is Claude's equivalent exclusively the Agent SDK + tool-use API? Comparing the two would clarify when "drive an existing CLI" beats "build on the SDK."
Sources#
- An open-source spec for Codex orchestration: Symphony. — Section 10 (Agent Runner Protocol) and Section 13.5 (Token Accounting) are the authoritative reference
