
Delegation & subagents

When a task is big enough to flood the main conversation, or when several independent pieces of work could happen at once, Hermes can spawn subagents with the delegate_task tool. Each child gets a fresh, isolated context and its own terminal session. Only the child's final summary comes back to the parent, which keeps the main context lean.

Single task

python
delegate_task(
    goal="Debug why tests fail",
    context="Error: assertion in test_foo.py line 42",
    toolsets=["terminal", "file"],
)

Parallel batch

Pass a list of tasks instead of a single goal. By default, up to 3 children run concurrently:

python
delegate_task(tasks=[
    {"goal": "Research topic A", "toolsets": ["web"]},
    {"goal": "Research topic B", "toolsets": ["web"]},
    {"goal": "Fix the build", "toolsets": ["terminal", "file"]},
])
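To picture what "up to 3 concurrently" means, here is a minimal asyncio sketch. It assumes a batch behaves like a pool capped by a semaphore; `run_batch`, `fake_child`, and the `limit` parameter are hypothetical stand-ins, not Hermes internals. The sketch only records how many children were in flight at once.

```python
import asyncio


async def run_batch(goals, limit=3):
    """Run one coroutine per goal, with at most `limit` in flight at once.

    Hypothetical model of a delegate_task batch, not Hermes's actual code.
    Returns the summaries in input order and the peak concurrency observed.
    """
    gate = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_child(goal):
        nonlocal in_flight, peak
        async with gate:  # waits here while `limit` children are already running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the child's LLM loop
            in_flight -= 1
            return f"summary of: {goal}"

    # gather() preserves input order, like collecting each child's final summary
    summaries = await asyncio.gather(*(fake_child(g) for g in goals))
    return summaries, peak


if __name__ == "__main__":
    goals = [f"Research topic {c}" for c in "ABCDE"]
    summaries, peak = asyncio.run(run_batch(goals))
    print(peak)  # never more than 3
    print(summaries[0])
```

Calling `run_batch(goals, limit=1)` models lowering `max_concurrent_children` to 1: the children still all run, just one after another.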

The one rule that matters: subagents know nothing

WARNING

A subagent starts with a completely fresh conversation. It has zero knowledge of the parent's history or prior tool calls. Everything it needs must be in the goal and context fields.

python
# BAD - the child has no idea what "the error" is
delegate_task(goal="Fix the error")

# GOOD - all context is passed in
delegate_task(
    goal="Fix the TypeError in api/handlers.py",
    context="""api/handlers.py raises a TypeError on line 47:
    'NoneType' object has no attribute 'get'. parse_body() returns None
    when Content-Type is missing. Project is at /home/user/myproject, Python 3.11.""",
    toolsets=["terminal", "file"],
)
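One way to avoid the BAD case is to assemble the context from explicit facts before delegating. The sketch below is hypothetical: `build_context` and the required field names are illustrative choices, not part of Hermes. It simply refuses to produce a context string when a key fact is missing.

```python
# Illustrative checklist of facts a debugging child cannot work without.
REQUIRED = ("file", "error", "cause", "project_root")


def build_context(facts: dict) -> str:
    """Render the facts a child needs into one self-contained context string.

    Hypothetical helper. Raises ValueError if any required fact is missing or empty.
    """
    missing = [key for key in REQUIRED if not facts.get(key)]
    if missing:
        raise ValueError(f"context is missing: {', '.join(missing)}")
    lines = [f"{key}: {facts[key]}" for key in REQUIRED]
    # Keep any extra facts too (e.g. the language version), after the required ones.
    lines += [f"{k}: {v}" for k, v in facts.items() if k not in REQUIRED]
    return "\n".join(lines)


if __name__ == "__main__":
    ctx = build_context({
        "file": "api/handlers.py, line 47",
        "error": "TypeError: 'NoneType' object has no attribute 'get'",
        "cause": "parse_body() returns None when Content-Type is missing",
        "project_root": "/home/user/myproject",
        "python": "3.11",
    })
    print(ctx)
```

The resulting string is what you would pass as `context=`; the check fails loudly in the parent, where the missing fact is still known, instead of silently in the child.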

Picking toolsets for children

Toolsets | Use case
["terminal", "file"] | Code work, debugging, builds
["web"] | Research, fact-checking
["file"] | Read-only analysis, code review
["terminal"] | System administration

Leaf subagents cannot call delegate_task, clarify, memory, send_message, or execute_code. This keeps them focused and prevents runaway recursion.
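If you build batches programmatically, the toolset table and the leaf restriction can be written down as plain data. This is a hypothetical sketch: the use-case keys and the helpers `task_for` and `leaf_can_call` are illustrative names, not Hermes identifiers. Only the toolset lists and the blocked tool names come from the text above.

```python
# Hypothetical mapping of the table's use cases to toolsets.
TOOLSETS_FOR = {
    "code": ["terminal", "file"],
    "research": ["web"],
    "review": ["file"],
    "sysadmin": ["terminal"],
}

# Tools that, per the rule above, a leaf subagent cannot call.
LEAF_BLOCKED = frozenset(
    {"delegate_task", "clarify", "memory", "send_message", "execute_code"}
)


def task_for(kind: str, goal: str, context: str = "") -> dict:
    """Build one entry for a delegate_task tasks=[...] batch."""
    # Copy the list so callers can tweak one task without changing the mapping.
    return {"goal": goal, "context": context, "toolsets": list(TOOLSETS_FOR[kind])}


def leaf_can_call(tool: str) -> bool:
    """True unless `tool` is one of the calls leaf subagents are denied."""
    return tool not in LEAF_BLOCKED


if __name__ == "__main__":
    batch = [
        task_for("research", "Research topic A"),
        task_for("code", "Fix the build", context="Project at /home/user/myproject; make fails"),
    ]
    print(batch[1]["toolsets"])
    print(leaf_can_call("delegate_task"))
```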

Local-model considerations

Delegation multiplies inference work: three parallel children mean three concurrent generations. On a single local GPU they contend for the same hardware, so running them in parallel is slower, not faster. Two patterns help:

  • Route children to a smaller/cheaper model so they don't fight the main model for resources:
yaml
# ~/.hermes/config.yaml
delegation:
  model: qwen3.5-coder
  base_url: http://localhost:11434/v1
  api_key: local-key
  max_concurrent_children: 2

  • Or keep delegation sequential by lowering max_concurrent_children to 1 when you are GPU-bound.

delegate_task is synchronous, not durable

WARNING

delegate_task runs inside the parent's current turn and blocks until all children finish. If the parent is interrupted (for example, you send a new message or run /stop), all children are cancelled and their work is discarded. Children do not keep running after the turn ends.

For durable, long-running work that must survive interrupts, use a cron job (cronjob create) or a backgrounded terminal command (terminal(background=True, notify_on_complete=True)) instead.
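A rough analogy in plain asyncio, not Hermes's implementation: the parent turn awaits its children, so cancelling the turn cancels them too, and none of their work comes back. The function names are hypothetical.

```python
import asyncio


async def child(name, results):
    """Stand-in for a subagent: does some 'work', then records its summary."""
    await asyncio.sleep(1.0)  # long enough that the parent is interrupted first
    results.append(f"{name} done")


async def parent_turn(results):
    """Stand-in for a turn that blocks on its children, as delegate_task does."""
    await asyncio.gather(*(child(n, results) for n in ("A", "B", "C")))


async def main():
    results = []
    turn = asyncio.create_task(parent_turn(results))
    await asyncio.sleep(0.05)  # let the children start working
    turn.cancel()  # the user interrupts the turn; gather() cancels every child
    try:
        await turn
    except asyncio.CancelledError:
        pass
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))  # no child finished, so nothing comes back
```

Work that must outlive the turn therefore has to live outside it, which is why the cron job or backgrounded terminal command is the better fit.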

Delegation vs execute_code

Factor | delegate_task | execute_code
Reasoning | Full LLM loop | Python execution only
Best for | Tasks needing judgment | Mechanical multi-step pipelines
Token cost | Higher | Lower (only stdout returns)

Use delegate_task when the subtask needs reasoning; use execute_code for scripted data processing.
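For contrast, this is the shape of job that suits execute_code: fixed steps, no judgment, and only a short printed summary needs to come back. It is an illustrative standalone script; the log format, `SAMPLE_LOG`, and `summarize_levels` are hypothetical, and it does not show how execute_code itself is invoked.

```python
from collections import Counter

# Illustrative input in an assumed "HH:MM:SS LEVEL message" format.
SAMPLE_LOG = """\
12:00:01 INFO server started
12:00:07 WARNING slow response from /api/items
12:01:13 ERROR database timeout
12:02:44 INFO request served
12:03:02 ERROR database timeout
"""


def summarize_levels(log_text: str) -> Counter:
    """Count log lines per level; lines without a level field are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


if __name__ == "__main__":
    # Only this one short line would travel back into the parent's context.
    counts = summarize_levels(SAMPLE_LOG)
    print(", ".join(f"{level}={n}" for level, n in sorted(counts.items())))
```

No step needs reasoning, so a full LLM loop in a child would only add tokens.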

Nested orchestration (advanced)

By default, delegation is flat: a parent spawns children that cannot delegate further. For multi-stage workflows, a child can be spawned with role="orchestrator" and delegation.max_spawn_depth raised above 1. Be careful: the tree grows geometrically. With a depth of 3 and 3 children per level, the bottom level alone holds 3 × 3 × 3 = 27 concurrent agents, far more than a single local machine can serve.
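The growth is easy to tabulate. A small worked sketch, assuming every agent allowed to delegate spawns a full batch of the same width (the function name is illustrative): with width w and depth d, level k holds w^k agents, so the deepest level holds w^d.

```python
def agents_per_level(width: int, depth: int) -> list[int]:
    """Agents at each level below the root, assuming every agent that may
    delegate spawns exactly `width` children."""
    return [width ** level for level in range(1, depth + 1)]


if __name__ == "__main__":
    levels = agents_per_level(width=3, depth=3)
    print(levels)       # [3, 9, 27]
    print(sum(levels))  # 39 agents spawned in total
```

The 27 leaves are the ones generating at the same time while the orchestrators above them wait, and 39 agents are spawned overall.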

TIP

You don't usually invoke delegation yourself; the agent decides when a task benefits from it. See the official Delegation Patterns guide for hands-on examples.

Personal learning notes on Hermes Agent. Not affiliated with Nous Research. Verify against official docs.