
Hermes Agent

Published: April 28, 2026 · Filed: Entity · Reading: 9 min · Source: AI-synthesised

Nous Research's CLI agent + Gateway daemon (Telegram/Discord/Slack/WhatsApp); AGENTS.md/SOUL.md context split, bounded memory files, DM-pairing auth, container-as-security-boundary model



Summary

Hermes Agent is an open-source CLI coding/research agent from Nous Research, positioned as a parallel ecosystem to Claude Code with a stronger emphasis on multi-platform messaging deployment. The CLI provides an interactive REPL with tool access (terminal, file editing, web search, code execution); the Hermes Gateway is a long-running daemon that exposes the same agent through Telegram, Discord, Slack, and WhatsApp with per-user sessions, allowlist + DM-pairing authorization, scheduled cron jobs, and configurable container backends. Hermes is provider-agnostic (works with OpenAI, Anthropic, and other LLM APIs) and uses a context-file convention closer to OpenAI's AGENTS.md than Anthropic's CLAUDE.md, plus a separate SOUL.md for personality.

Details

Architecture (Two Surfaces)

| Surface | Purpose | Lifecycle |
|---|---|---|
| Hermes CLI | Interactive REPL, similar role to Claude Code | Per-invocation, resumable (hermes -c, hermes -r "title") |
| Hermes Gateway | Long-running daemon exposing the agent over messaging platforms | systemd (Linux user/system service) or launchd (macOS); persists across reboots |

The Gateway is the architecturally interesting half — it's parallel in spirit to Symphony's always-on daemon model, but with per-user rather than per-issue tenancy.

Context Files

Hermes uses three layered configuration files, with explicit role separation:

| File | Scope | Purpose |
|---|---|---|
| AGENTS.md | Project (cwd) | Project context, conventions, stack; auto-loaded each session |
| ~/.hermes/SOUL.md (or $HERMES_HOME/SOUL.md) | Global personality | Stable default voice/style across all sessions |
| .cursorrules / .cursor/rules/*.mdc | Project | Compatibility-loaded from cwd; no need to duplicate |
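The two SOUL.md locations in the table can be read as a simple lookup rule. The sketch below assumes that a non-empty HERMES_HOME takes precedence over the ~/.hermes default; the function name resolve_soul_path is hypothetical and not part of Hermes.

```python
import os
from pathlib import Path


def resolve_soul_path(env=None):
    """Return $HERMES_HOME/SOUL.md if HERMES_HOME is set, else ~/.hermes/SOUL.md.

    Assumption: a non-empty HERMES_HOME overrides the default location.
    """
    env = os.environ if env is None else env
    home = env.get("HERMES_HOME")
    base = Path(home) if home else Path.home() / ".hermes"
    return base / "SOUL.md"


print(resolve_soul_path({"HERMES_HOME": "/srv/hermes"}))
print(resolve_soul_path({}))
```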

Subdirectory AGENTS.md files are lazily discovered during tool calls (via subdirectory_hints.py) and injected into tool results — they aren't loaded upfront. This is a deliberate context-budget choice: only the top-level project context goes into the system prompt; nested context is paid for only when relevant.
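A minimal sketch of the lazy-discovery idea, not the actual contents of subdirectory_hints.py: when a tool touches a path, look for AGENTS.md files between that path and the project root, skip any already surfaced, and append the new ones to the tool result. The class name, method names, and the injected-text format are all assumptions for illustration.

```python
from pathlib import Path


class SubdirectoryHints:
    """Hypothetical helper: surfaces nested AGENTS.md files only when a tool touches them."""

    def __init__(self, project_root):
        self.root = Path(project_root).resolve()
        self.seen = set()  # nested AGENTS.md files already injected this session

    def hints_for(self, touched_path):
        """Return unseen AGENTS.md files between the touched path and the root.

        The root's own AGENTS.md is excluded: it is already in the system prompt.
        """
        current = (self.root / touched_path).resolve()
        if not current.is_dir():
            current = current.parent
        found = []
        while current != self.root and self.root in current.parents:
            candidate = current / "AGENTS.md"
            if candidate.is_file() and candidate not in self.seen:
                self.seen.add(candidate)
                found.append(candidate)
            current = current.parent
        return found

    def annotate(self, tool_result, touched_path):
        """Append newly discovered nested context to a tool result string."""
        for hint in self.hints_for(touched_path):
            label = hint.relative_to(self.root)
            tool_result += f"\n\n[context from {label}]\n{hint.read_text()}"
        return tool_result
```

Because each file is recorded in seen, the nested context is paid for once per session rather than on every tool call in that directory.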

The separation of AGENTS.md (project) and SOUL.md (personality) is sharper than Anthropic's CLAUDE.md convention and worth comparing — the same role split is implicit in CLAUDE.md files in practice but not separately filed. (Will be expanded in a forthcoming agent-context-files concept page.)

Memory System

Hermes implements a hard-bounded memory system:

  • MEMORY.md: ~2,200 character cap.
  • USER.md: ~1,375 character cap.
  • When memory fills, the agent consolidates entries (compresses older notes).
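The cap-then-consolidate behaviour can be sketched as follows. The consolidation step here is a deliberately crude placeholder (merge and truncate the oldest notes); the source does not specify the real algorithm, and the agent presumably summarizes with the model instead. The function names and the exact cap values used as constants are illustrative.

```python
MEMORY_CAP = 2200  # approximate MEMORY.md cap from the text
USER_CAP = 1375    # approximate USER.md cap from the text


def consolidate(entries, cap):
    """Placeholder consolidation: fold the two oldest notes together until the file fits.

    Assumption: this stand-in truncates the merged note; it is lossy by design
    and only illustrates that older notes are compressed before newer ones.
    """
    entries = list(entries)
    while len("\n".join(entries)) > cap and len(entries) > 1:
        merged = (entries[0] + "; " + entries[1])[: cap // 4]
        entries = [merged] + entries[2:]
    if entries and len("\n".join(entries)) > cap:
        entries = [entries[0][:cap]]
    return entries


def remember(entries, note, cap=MEMORY_CAP):
    """Append a note, then consolidate older notes if the cap is exceeded."""
    return consolidate(entries + [note], cap)


notes = []
for i in range(200):
    notes = remember(notes, f"note {i}: " + "x" * 40)
print(len("\n".join(notes)), notes[-1][:8])
```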

A specific gotcha called out in the docs:

"Memory is a frozen snapshot — changes made during a session don't appear in the system prompt until the next session starts. The agent writes to disk immediately, but the prompt cache isn't invalidated mid-session."

This is the cleanest articulation of why memory edits feel "delayed" in any caching agent — and it's a constraint Claude Code and other agents share but rarely document.
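The frozen-snapshot behaviour described in the quote can be illustrated with a toy session object. This is a sketch under the assumption that the system prompt is assembled once at session start; the Session class and its prompt text are hypothetical.

```python
from pathlib import Path


class Session:
    """Toy session: the system prompt is built once from memory on disk."""

    def __init__(self, memory_file):
        self.memory_file = Path(memory_file)
        snapshot = self.memory_file.read_text() if self.memory_file.exists() else ""
        # Frozen at session start so the cached prompt prefix stays byte-identical.
        self.system_prompt = "You are a helpful agent.\n\n# Memory\n" + snapshot

    def remember(self, note):
        """Persist immediately, but leave this session's prompt untouched."""
        with self.memory_file.open("a") as f:
            f.write(note + "\n")
```

A note written through remember() is on disk right away, yet only a Session constructed afterwards will see it in system_prompt.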

The Memory vs. Skills split:

  • Memory = facts (environment, preferences, project locations).
  • Skills = procedures (multi-step workflows, reusable recipes).
  • "Memory for what, skills for how."

Token-Economy Tooling

Hermes exposes token-economy controls directly to the user. Operationally these are the same levers AgentOpt formalizes, surfaced as slash commands plus one tool (delegate_task):

| Command | Effect |
|---|---|
| /compress | Summarize conversation history; preserves key context, drops tokens |
| /usage | Token consumption status |
| /insights | 30-day usage patterns |
| /model | Switch model mid-session (frontier for hard reasoning, fast for boilerplate) |
| delegate_task | Spawn parallel subagents with isolated contexts; only summaries return |
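The delegate_task row describes a fan-out pattern: each subagent works in its own context and the parent receives only a summary. The sketch below shows that shape with a thread pool; run_subagent is a stand-in with fake tool output, not the Hermes implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task):
    """Stand-in for a subagent: builds a private context and returns only a summary."""
    context = [f"system: you are a subagent working on: {task}"]
    # Placeholder for real tool calls and model turns, which would grow the context.
    context.extend(f"tool output {i} for {task}" for i in range(50))
    return f"{task}: done ({len(context)} context items discarded)"


def delegate_tasks(tasks, max_workers=4):
    """Run subagents in parallel; the parent sees summaries, never full contexts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subagent, tasks))


parent_context = ["user: audit the repo"]
parent_context.extend(delegate_tasks(["check tests", "check docs"]))
print(len(parent_context))  # 3: the user message plus two summaries
```

The token saving comes from the asymmetry: each subagent accumulated 51 context items, while the parent's context grew by one line per task.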

The cache-discipline note is sharp:

"Most LLM providers cache the system prompt prefix. If you keep your system prompt stable (same context files, same memory), subsequent messages in a session get cache hits that are significantly cheaper. Avoid changing the model or system prompt mid-session."
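A toy model makes the advice concrete. It assumes a cache keyed on the exact model and system prompt pair, which simplifies how real providers match prefixes and expire entries; PrefixCache and the prompt strings are invented for illustration.

```python
import hashlib


class PrefixCache:
    """Toy model of provider-side prompt caching keyed on model + system prompt."""

    def __init__(self):
        self.keys = set()
        self.hits = 0
        self.misses = 0

    def request(self, model, system_prompt):
        key = hashlib.sha256(f"{model}\n{system_prompt}".encode()).hexdigest()
        if key in self.keys:
            self.hits += 1
        else:
            self.misses += 1
            self.keys.add(key)


cache = PrefixCache()
for _ in range(5):
    cache.request("model-a", "context + memory v1")   # stable prompt
cache.request("model-a", "context + memory v2")       # prompt edited mid-session
cache.request("model-b", "context + memory v2")       # model switched mid-session
print(cache.hits, cache.misses)  # 4 3
```

Five requests with a stable prompt cost one miss; each mid-session change to the prompt or the model costs another.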

CLI Ergonomics

| Input | Behavior |
|---|---|
| Alt+Enter / Ctrl+J | Multi-line input without sending |
| Multi-line paste | Auto-detected, buffered as one message |
| Ctrl+C (once) | Interrupt mid-response, redirect with new message |
| Ctrl+C (twice within 2s) | Force exit |
| Ctrl+V | Paste image from clipboard (vision) |
| / + Tab | Slash command autocomplete |
| /verbose | Cycle tool-output modes: off → new → all → verbose |
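Two rows of the table are small state machines: the Ctrl+C double-press window and the /verbose cycle. A sketch of both follows, assuming that the cycle wraps from verbose back to off and that a forced exit clears the press timer; the names are hypothetical.

```python
VERBOSE_MODES = ["off", "new", "all", "verbose"]


def next_verbose_mode(current):
    """Advance one step through the modes, wrapping around (wrap is an assumption)."""
    i = VERBOSE_MODES.index(current)
    return VERBOSE_MODES[(i + 1) % len(VERBOSE_MODES)]


class InterruptTracker:
    """Choose 'interrupt' or 'exit' from the time since the previous Ctrl+C."""

    def __init__(self, window=2.0):
        self.window = window      # seconds within which a second press forces exit
        self.last_press = None

    def press(self, now):
        if self.last_press is not None and now - self.last_press <= self.window:
            self.last_press = None
            return "exit"
        self.last_press = now
        return "interrupt"
```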

Hermes Gateway: Multi-User Daemon Pattern

The Gateway exposes the agent through messaging platforms:

  • Per-user sessions — each authorized user gets their own conversation context.
  • Home channel (/sethome) — designated chat that receives cron output and proactive messages.
  • Two authorization models:
    1. Static allowlist: TELEGRAM_ALLOWED_USERS=123,456 in .env. Restart required to add users.
    2. DM pairing: unauthorized DMs receive a one-time code; an admin runs hermes pairing approve telegram XKGH5N7P. No restart needed.
  • Pairing security: codes expire after 1 hour and are generated with cryptographic randomness; requests are rate-limited (1 request/user/10min, max 3 pending/platform); 5 failed approvals trigger a 1-hour platform lockout; all data is stored with 0600 file permissions.
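The pairing rules above can be sketched as an in-memory store for a single platform. The numeric limits come from the list; everything else is an assumption for illustration, including the 8-character uppercase-plus-digits code format (inferred from the example code), the class and method names, and the choice to reset the failure counter on a successful approval.

```python
import secrets
import string
import time

CODE_TTL = 3600         # codes expire after 1 hour
REQUEST_INTERVAL = 600  # 1 request per user per 10 minutes
MAX_PENDING = 3         # pending codes per platform
MAX_FAILURES = 5        # failed approvals before lockout
LOCKOUT = 3600          # 1-hour platform lockout
ALPHABET = string.ascii_uppercase + string.digits


class PairingStore:
    """In-memory stand-in for one platform's pairing state."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.pending = {}       # code -> (user_id, expires_at)
        self.last_request = {}  # user_id -> time of last issued code
        self.failures = 0
        self.locked_until = 0.0
        self.approved = set()

    def _prune(self):
        """Drop expired codes."""
        now = self.clock()
        self.pending = {c: v for c, v in self.pending.items() if v[1] > now}

    def request_code(self, user_id):
        """Issue a one-time code, or None if a rate limit blocks the request."""
        now = self.clock()
        self._prune()
        last = self.last_request.get(user_id)
        if last is not None and now - last < REQUEST_INTERVAL:
            return None
        if len(self.pending) >= MAX_PENDING:
            return None
        code = "".join(secrets.choice(ALPHABET) for _ in range(8))
        self.pending[code] = (user_id, now + CODE_TTL)
        self.last_request[user_id] = now
        return code

    def approve(self, code):
        """Admin-side approval; repeated bad codes trigger a lockout."""
        now = self.clock()
        if now < self.locked_until:
            return False
        self._prune()
        entry = self.pending.pop(code, None)
        if entry is None:
            self.failures += 1
            if self.failures >= MAX_FAILURES:
                self.locked_until = now + LOCKOUT
                self.failures = 0
            return False
        self.failures = 0
        self.approved.add(entry[0])
        return True
```

The secrets module is used rather than random because the codes act as short-lived credentials.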

Service installation:

hermes gateway install # Default: user-level systemd (Linux) / launchd (macOS)
sudo hermes gateway install --system # Linux: boot-time system service
sudo loginctl enable-linger $USER # Linux: keep running after SSH logout

Cron Jobs

Scheduled work delivered to the home channel:

  • Created from chat: "Every weekday at 9am, check the GitHub repo for…"
  • Definitions stored at ~/.hermes/cron/jobs.json; output at ~/.hermes/cron/output/{job_id}/{timestamp}.md.
  • Critical caveat: cron prompts run in completely fresh sessions with no memory. Each prompt must contain all needed context — file paths, URLs, server addresses, instructions.
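Because each cron run starts with no memory, the prompt has to carry its own context. The sketch below shows one way to assemble such a prompt and to build the documented output path. The timestamp format, the helper names, and the example repository URL and report path are assumptions; the jobs.json schema is not given in the source and is not modelled here.

```python
from datetime import datetime, timezone
from pathlib import Path

CRON_DIR = Path.home() / ".hermes" / "cron"


def output_path(job_id, when):
    """Build ~/.hermes/cron/output/{job_id}/{timestamp}.md.

    Assumption: the timestamp format below is illustrative, not documented.
    """
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return CRON_DIR / "output" / job_id / f"{stamp}.md"


def self_contained_prompt(task, context):
    """Inline every fact the job needs, since the session has no memory."""
    lines = [task, "", "Context (this session has no memory of earlier runs):"]
    lines += [f"- {key}: {value}" for key, value in context.items()]
    return "\n".join(lines)


prompt = self_contained_prompt(
    "Check the repository for new issues and summarize them.",
    {"repo": "https://github.com/example/project", "report file": "/srv/reports/issues.md"},
)
print(prompt)
print(output_path("job-1", datetime(2026, 4, 28, 9, 0, tzinfo=timezone.utc)))
```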

This parallels Symphony's daemon-driven dispatch model — both are cases of agents being pulled to work asynchronously on a schedule rather than driven interactively.

Container Safety Model (Worth Flagging)

Hermes supports multiple terminal backends:

TERMINAL_BACKEND=docker
TERMINAL_DOCKER_IMAGE=hermes-sandbox:latest

Supported: Docker, Singularity, Modal, Daytona.

Important security shift: when running in a container backend, dangerous-command checks are skipped — the rationale being "the container is the security boundary." This means:

  • Locked-down container image quality becomes load-bearing.
  • Approval-prompt UX is traded for image discipline: the effort moves from reviewing commands to maintaining the image.
  • Trust shifts from "every command is reviewed" to "the container can't escape."
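The shift in trust can be expressed as a single branch in the gating logic. In this sketch the regex patterns, the "local" backend name, and the needs_approval function are hypothetical; only the list of container backends and the skip-in-container rule come from the text.

```python
import re

CONTAINER_BACKENDS = {"docker", "singularity", "modal", "daytona"}
# Hypothetical patterns; the real dangerous-command list is not given in the text.
DANGEROUS_PATTERNS = [r"\brm\s+-rf\b", r"\bmkfs\b", r"\bdd\s+if="]


def needs_approval(command, backend, always_allowed=frozenset()):
    """Return True if the command should pause for a human decision."""
    if backend in CONTAINER_BACKENDS:
        return False  # the container is treated as the security boundary
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, command) and pattern not in always_allowed:
            return True
    return False


print(needs_approval("rm -rf build/", backend="local"))   # True
print(needs_approval("rm -rf build/", backend="docker"))  # False
```

The early return is the whole security-model change: once the backend is a container, no command text is inspected at all, so the image's lockdown is the only remaining control.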

Worth noting because it's a meaningfully different security posture from Claude Code's per-command auto-mode classifier.

Comparison to Claude Code

| Capability | Claude Code | Hermes |
|---|---|---|
| Project context | CLAUDE.md | AGENTS.md (project) + SOUL.md (personality, separate) |
| Cursor compat | n/a | .cursorrules / .cursor/rules/*.mdc auto-loaded |
| Session compaction | /compact | /compress |
| Mid-session model switch | /model | /model |
| Parallel subagents | Subagents in .claude/agents/ | delegate_task tool |
| Permission gating | auto mode classifier | Per-pattern approvals (once/session/always/deny); skipped in containers |
| Memory model | n/a (relies on conversation + CLAUDE.md) | Bounded MEMORY.md + USER.md with auto-consolidation |
| Multi-user deployment | claude -p per user/session | Hermes Gateway with allowlist or DM pairing |
| Cron / scheduled work | n/a (rely on external cron) | Built-in cron with home-channel delivery |

The most significant architectural difference is the Gateway: Claude Code is session-first; Hermes is daemon-first when deployed at team scale.

Operational Notes

  • VPS sizing: $5/month is sufficient for the Gateway itself — LLM API calls are the cost driver.
  • macOS launchd PATH gotcha: the plist captures shell PATH at install time. After installing new tools (Node, ffmpeg), re-run hermes gateway install to refresh.
  • Session auto-reset: messaging sessions reset after idle (default 24h) or daily at 4am.
  • Self-update: /update from chat pulls latest version and restarts.
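The session auto-reset rule combines an idle timeout with a daily boundary. A minimal sketch follows, assuming that "daily at 4am" means local time and applies to any session whose last activity predates the most recent 4am; the function name and signature are hypothetical.

```python
from datetime import datetime, timedelta


def should_reset(last_activity, now, idle=timedelta(hours=24), daily_hour=4):
    """Reset if the session sat idle too long or a daily reset time has passed.

    Assumption: the daily reset applies when the last activity is older than
    the most recent daily_hour boundary.
    """
    if now - last_activity >= idle:
        return True
    boundary = now.replace(hour=daily_hour, minute=0, second=0, microsecond=0)
    if boundary > now:
        boundary -= timedelta(days=1)  # today's boundary has not happened yet
    return last_activity < boundary


late_night = datetime(2026, 4, 28, 23, 0)
print(should_reset(late_night, datetime(2026, 4, 29, 3, 0)))  # False: before 4am
print(should_reset(late_night, datetime(2026, 4, 29, 5, 0)))  # True: 4am has passed
```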

Connections

  • Claude Code Best Practices: Hermes is the closest parallel ecosystem; many concepts map directly (/compress ↔ /compact, delegate_task ↔ subagents, AGENTS.md ↔ CLAUDE.md); the differences highlight design choices each made
  • Symphony: both are always-on agent daemons; Symphony's tenancy unit is the issue, Hermes's tenancy unit is the user. Both prefer container backends for safety. Cron + home-channel parallels Symphony's polling + tracker-write
  • Client-Side Agent Optimization: Hermes's /model, /compress, delegate_task, and prompt-cache discipline are the user-facing surface of exactly the levers AgentOpt formalizes
  • Agent Harness Engineering: AGENTS.md follows OpenAI's "table of contents, not encyclopedia" principle; bounded memory files are a harness-level constraint analogous to JSON feature lists
  • Scale-Dependent Prompt Sensitivity: /verbose modes and bounded memory both implicitly limit output length; the brevity-constraint findings would predict gains
  • Claude Code Auto Mode: Hermes's per-pattern approval model (once/session/always/deny) and container-disables-approvals model are points of comparison with the classifier-based auto mode
  • Agentic Misalignment (AM): Hermes daemon mode (long-context, tool-using, container-as-security-boundary, weak per-action human oversight) sits squarely in the AM threat surface; container isolation reduces blast radius but doesn't address model-side misalignment

Open Questions

  • The container backend disabling dangerous-command checks is a defensible design but a meaningful security-model shift. What's the empirical track record? Have lockdown failures in popular images (Daytona, nikolaik/python-nodejs) caused incidents?
  • How do bounded memory files (~2,200 chars MEMORY.md) hold up over long-term use? Auto-consolidation is mentioned but not specified — what's the consolidation algorithm and how lossy is it?
  • Hermes's DM-pairing flow is a clean security primitive. Why hasn't this pattern been adopted by Claude Code or Cursor for shared/team deployments?
  • The split between AGENTS.md (project) and SOUL.md (personality) is explicit in Hermes but implicit in Claude Code's CLAUDE.md. Does the split materially improve outcomes, or is it a documentation choice without empirical backing?
  • Cron jobs in fresh sessions with no memory — how do teams structure the "context the agent needs" without it bloating every cron prompt? Is there a standard pattern?


About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.
