Appearance
Limitations & troubleshooting
Honest limits of local Hermes, plus fixes for the problems I hit during setup.
Known limitations
Local model quality
27B open-weight models (like Qwen 3.5) are capable but not frontier-class. They miss nuance, hallucinate facts, and struggle with complex multi-file refactors. You stay the reviewer.
Speed
2-5 tok/s on CPU, 15-40 tok/s on Apple Silicon. Multi-step agent tasks take minutes, not seconds.
Context ceiling
Even at 64k, long agent sessions with many tool calls fill context. Start fresh sessions for new topics.
Tool calling required
Models without tool support turn Hermes into a chatbot. Always verify with a file creation test.
Troubleshooting guide
Hermes only chats, doesn't act
Cause: Model lacks tool calling. Fix: Switch to qwen3.5:27b, qwen3.5:9b, or another tool-capable model.
Agent forgets mid-task or loops
Cause: Ollama context too small (default 2048). Fix: Set OLLAMA_CONTEXT_LENGTH=65536 and context_length: 65536 in Hermes config.
"Connection refused" to Ollama
Cause: Ollama not running. Fix: ollama serve in a terminal, or start Ollama.app on macOS.
Timeouts on local model
Cause: Model too slow, usually on a very large context. Fix: Hermes already relaxes timeouts for local endpoints. If it still trips, set export HERMES_STREAM_READ_TIMEOUT=1800 (or a per-provider request_timeout_seconds in config.yaml).
Context length exceeded mid-session
Cause: Long agent session filled the window. Fix: Run /compress to summarize history in place, or /new to start fresh. Check /usage to see how close you are.
Build error: theme/index not found
Cause: Empty .vitepress/theme/ directory without index.ts. Fix: Add theme entry file or remove empty theme folder.
Web tool failures offline
Cause: Web toolsets enabled without internet. Fix: Disable in config (see Toolsets).
Telegram bot not responding
Cause: Gateway not running, or laptop asleep. Fix: hermes gateway start, prevent sleep, or move to always-on server.
When to escalate
| Situation | Action |
|---|---|
| Routine file/docs work | Local model is fine |
| Complex reasoning | Cloud fallback or larger local model |
| Production code you ship | Human review required regardless |
| Sensitive data | Local only, no fallback |
Official resources
That completes the handbook. Start with Complete offline setup if you haven't already, then pick a use case and run it.