Delegation & subagents
When a task is big enough to flood the main conversation, or when several independent pieces of work could happen at once, Hermes can spawn subagents with the delegate_task tool. Each child gets a fresh, isolated context and its own terminal session. Only the child's final summary comes back to the parent, which keeps the main context lean.
Single task
```python
delegate_task(
    goal="Debug why tests fail",
    context="Error: assertion in test_foo.py line 42",
    toolsets=["terminal", "file"],
)
```
Parallel batch
Up to 3 children run concurrently by default:
```python
delegate_task(tasks=[
    {"goal": "Research topic A", "toolsets": ["web"]},
    {"goal": "Research topic B", "toolsets": ["web"]},
    {"goal": "Fix the build", "toolsets": ["terminal", "file"]},
])
```
The one rule that matters: subagents know nothing
WARNING
A subagent starts with a completely fresh conversation. It has zero knowledge of the parent's history or prior tool calls. Everything it needs must be in the goal and context fields.
```python
# BAD - the child has no idea what "the error" is
delegate_task(goal="Fix the error")

# GOOD - all context is passed in
delegate_task(
    goal="Fix the TypeError in api/handlers.py",
    context="""api/handlers.py raises a TypeError on line 47:
'NoneType' object has no attribute 'get'. parse_body() returns None
when Content-Type is missing. Project is at /home/user/myproject, Python 3.11.""",
    toolsets=["terminal", "file"],
)
```
Picking toolsets for children
| Toolsets | Use case |
|---|---|
| ["terminal", "file"] | Code work, debugging, builds |
| ["web"] | Research, fact-checking |
| ["file"] | Read-only analysis / code review |
| ["terminal"] | System administration |
Leaf subagents cannot call delegate_task, clarify, memory, send_message, or execute_code. This keeps them focused and prevents runaway recursion.
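The table above can be encoded as a small lookup so that every brief is built with an explicit toolset and a non-empty context. This is a minimal sketch: `TOOLSETS_BY_KIND`, `build_brief`, and the task-kind names are hypothetical helpers for illustration, not part of Hermes. Only the `goal`, `context`, and `toolsets` fields come from the examples above.

```python
# Hypothetical lookup: maps a kind of work to the toolsets from the table above.
TOOLSETS_BY_KIND = {
    "code": ["terminal", "file"],  # code work, debugging, builds
    "research": ["web"],           # research, fact-checking
    "review": ["file"],            # read-only analysis / code review
    "sysadmin": ["terminal"],      # system administration
}


def build_brief(kind: str, goal: str, context: str) -> dict:
    """Return keyword arguments for a delegate_task call (illustrative only)."""
    if kind not in TOOLSETS_BY_KIND:
        raise ValueError(f"unknown task kind: {kind!r}")
    if not context.strip():
        # The child starts with a fresh conversation, so an empty context
        # leaves it with nothing but the goal.
        raise ValueError("context must not be empty")
    return {
        "goal": goal,
        "context": context,
        "toolsets": list(TOOLSETS_BY_KIND[kind]),  # copy, so callers can't mutate the table
    }


brief = build_brief(
    "code",
    goal="Fix the TypeError in api/handlers.py",
    context="parse_body() returns None when Content-Type is missing.",
)
print(brief["toolsets"])
```

The resulting dictionary has the same shape as the keyword arguments in the single-task example, so it could be passed along as `delegate_task(**brief)`.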
Local-model considerations
Delegation multiplies inference work: three parallel children means three concurrent generations. On a single local GPU, running them in parallel is slower, not faster, because all three contend for the same hardware. Two patterns help:
- Route children to a smaller/cheaper model so they don't fight the main model for resources:

  ```yaml
  # ~/.hermes/config.yaml
  delegation:
    model: qwen3.5-coder
    base_url: http://localhost:11434/v1
    api_key: local-key
    max_concurrent_children: 2
  ```

- Or keep delegation sequential by lowering max_concurrent_children to 1 when you are GPU-bound.
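If you would rather enforce the cap on the calling side, one option is to split a long task list into batches no larger than the concurrency limit and issue one `delegate_task` call per batch. This is a minimal sketch: `chunk_tasks` is a hypothetical helper, and whether Hermes queues oversized batches on its own is not covered here.

```python
from typing import Iterator


def chunk_tasks(tasks: list[dict], max_concurrent: int) -> Iterator[list[dict]]:
    """Yield consecutive batches of at most max_concurrent tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    for start in range(0, len(tasks), max_concurrent):
        yield tasks[start:start + max_concurrent]


tasks = [
    {"goal": "Research topic A", "toolsets": ["web"]},
    {"goal": "Research topic B", "toolsets": ["web"]},
    {"goal": "Fix the build", "toolsets": ["terminal", "file"]},
]

# With max_concurrent=1 every batch holds a single task, i.e. sequential delegation.
batches = list(chunk_tasks(tasks, max_concurrent=2))
for batch in batches:
    # Each batch would be handed to delegate_task(tasks=batch) in turn.
    print([task["goal"] for task in batch])
```

Because `delegate_task` blocks until its children finish, issuing the batches one after another keeps at most `max_concurrent` generations in flight at a time.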
delegate_task is synchronous, not durable
WARNING
delegate_task runs inside the parent's current turn and blocks until children finish. If the parent is interrupted (you send a new message, /stop), all children are cancelled and their work is discarded. Children do not keep running after the turn ends.
For durable, long-running work that must survive interrupts, use a cron job (cronjob create) or a backgrounded terminal command (terminal(background=True, notify_on_complete=True)) instead.
Delegation vs execute_code
| Factor | delegate_task | execute_code |
|---|---|---|
| Reasoning | Full LLM loop | Just Python execution |
| Best for | Tasks needing judgment | Mechanical multi-step pipelines |
| Token cost | Higher | Lower (only stdout returns) |
Use delegate_task when the subtask needs reasoning; use execute_code for scripted data processing.
Nested orchestration (advanced)
By default, delegation is flat: a parent spawns children that cannot delegate further. For multi-stage workflows, a child can be spawned with role="orchestrator" and delegation.max_spawn_depth raised above 1. Be careful: with a depth of 3 and 3 children per agent, the deepest level alone can reach 3 × 3 × 3 = 27 concurrent agents, which is far too much for a local box.
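The growth is geometric, which a few lines of arithmetic make concrete. This sketch assumes that every agent spawns the full number of children and that depth counts levels below the parent. Both are assumptions for illustration, not a statement of how Hermes schedules agents, and `agents_per_level` is a hypothetical helper.

```python
def agents_per_level(children_per_agent: int, depth: int) -> list[int]:
    """Number of subagents at each level below the parent, level 1 first."""
    return [children_per_agent ** level for level in range(1, depth + 1)]


levels = agents_per_level(children_per_agent=3, depth=3)
print(levels)       # [3, 9, 27]
print(sum(levels))  # 39
```

Under these assumptions the deepest level holds 27 agents, and the whole tree holds 39 once the intermediate orchestrators are counted.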
TIP
You don't usually invoke delegation yourself; the agent decides when a task benefits from it. See the official Delegation Patterns guide for hands-on examples.