Limitations & troubleshooting

Honest limits of local Hermes, plus fixes for the problems I hit during setup.

Known limitations

Local model quality

27B open-weight models (like Qwen 3.5) are capable but not frontier-class. They miss nuance, hallucinate facts, and struggle with complex multi-file refactors. You stay the reviewer.

Speed

2-5 tok/s on CPU, 15-40 tok/s on Apple Silicon. Multi-step agent tasks take minutes, not seconds.
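To make "minutes, not seconds" concrete, here is a rough back-of-the-envelope sketch. The task size (8 agent steps of about 300 generated tokens each) is a hypothetical example, not a measurement, and the estimate ignores prompt processing and tool execution time, so real runs will be slower.

```python
def generation_minutes(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Rough wall-clock minutes spent generating output tokens only."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return steps * tokens_per_step / tokens_per_second / 60


# Hypothetical task: 8 agent steps, about 300 generated tokens per step.
# The rates are the low and high ends of the ranges quoted above.
rates = [
    ("CPU, low end", 2),
    ("CPU, high end", 5),
    ("Apple Silicon, low end", 15),
    ("Apple Silicon, high end", 40),
]
for label, rate in rates:
    print(f"{label}: {generation_minutes(8, 300, rate):.1f} min")
```

Under these assumptions the same task ranges from about 1 minute at 40 tok/s to about 20 minutes at 2 tok/s.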

Context ceiling

Even at 64k, long agent sessions with many tool calls fill context. Start fresh sessions for new topics.

Tool calling required

Models without tool support turn Hermes into a chatbot. Always verify with a file creation test: ask the agent to create a file, then confirm the file actually exists on disk.
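One way to check the result of a file creation test is sketched below. It assumes you first asked Hermes to create a file named hermes_tool_test.txt containing the word "ok"; the file name, the expected text, and the helper name are all hypothetical choices for illustration.

```python
from pathlib import Path


def tool_call_worked(path: str, expected_text: str = "ok") -> bool:
    """Return True if the agent really wrote the file, not just claimed to."""
    target = Path(path)
    return target.is_file() and expected_text in target.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Run this from the directory where the agent was asked to create the file.
    print("tool calling works:", tool_call_worked("hermes_tool_test.txt"))
```

If this prints False after the model said it created the file, the model is describing actions rather than performing them.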

Troubleshooting guide

Hermes only chats, doesn't act

Cause: Model lacks tool calling. Fix: Switch to qwen3.5:27b, qwen3.5:9b, or another tool-capable model.

Agent forgets mid-task or loops

Cause: Ollama context too small (default 2048). Fix: Set OLLAMA_CONTEXT_LENGTH=65536 for the Ollama server and context_length: 65536 in the Hermes config. Both values need to change, otherwise the smaller one wins.
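Because the fix touches two places, a mismatch is easy to miss. The sketch below shows one possible sanity check. The helper name is hypothetical, and the Hermes value is passed in by hand because where context_length sits in the Hermes config is not assumed here. Note that OLLAMA_CONTEXT_LENGTH must be set in the environment of the Ollama server process, which may differ from the shell that runs this script.

```python
import os
from typing import List, Mapping, Optional

TARGET = 65536  # the value recommended in the fix above


def context_problems(
    env: Mapping[str, str],
    hermes_context_length: Optional[int],
    target: int = TARGET,
) -> List[str]:
    """List every place where the context length is missing or too small."""
    problems = []
    raw = env.get("OLLAMA_CONTEXT_LENGTH")
    if raw is None:
        problems.append("OLLAMA_CONTEXT_LENGTH is not set")
    elif not raw.isdigit() or int(raw) < target:
        problems.append(f"OLLAMA_CONTEXT_LENGTH={raw} is below {target}")
    if hermes_context_length is None:
        problems.append("context_length is missing from the Hermes config")
    elif hermes_context_length < target:
        problems.append(f"Hermes context_length={hermes_context_length} is below {target}")
    return problems


if __name__ == "__main__":
    # Replace 65536 with the value you actually see in your Hermes config.
    for line in context_problems(os.environ, 65536) or ["context settings look consistent"]:
        print(line)
```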

"Connection refused" to Ollama

Cause: Ollama is not running. Fix: Run ollama serve in a terminal, or start Ollama.app on macOS.
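To confirm whether anything is listening before digging further, a minimal sketch like the following can help. It assumes Ollama is on its default address, 127.0.0.1 port 11434; adjust both if you changed OLLAMA_HOST. The function name is hypothetical.

```python
import socket


def ollama_reachable(host: str = "127.0.0.1", port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Ollama port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_reachable())
```

This only proves that something accepts connections on that port. It does not prove that a model is loaded or that the process is Ollama.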

Timeouts on local model

Cause: Model too slow, usually because of a very large context. Fix: Hermes already relaxes timeouts for local endpoints. If it still times out, run export HERMES_STREAM_READ_TIMEOUT=1800 (or set a per-provider request_timeout_seconds in config.yaml).

Context length exceeded mid-session

Cause: Long agent session filled the window. Fix: Run /compress to summarize history in place, or /new to start fresh. Check /usage to see how close you are.

Build error: theme/index not found

Cause: The .vitepress/theme/ directory exists but contains no index.ts. Fix: Add a theme entry file or remove the empty theme folder.
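A quick way to spot this condition before running the build is sketched below. It assumes the directory you pass in is the one that contains .vitepress, and that index.ts or index.js counts as a valid theme entry. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Entry file names this check accepts (index.js is an assumption, see above).
ENTRY_NAMES = ("index.ts", "index.js")


def theme_problem(docs_root: str) -> Optional[str]:
    """Return a message if a theme folder exists without an entry file, else None."""
    theme_dir = Path(docs_root) / ".vitepress" / "theme"
    if not theme_dir.is_dir():
        return None  # no custom theme folder, so the default theme is used
    if any((theme_dir / name).is_file() for name in ENTRY_NAMES):
        return None  # entry file present
    return f"{theme_dir} has no entry file: add one or delete the folder"


if __name__ == "__main__":
    print(theme_problem(".") or "theme folder looks fine")
```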

Web tool failures offline

Cause: Web toolsets enabled without an internet connection. Fix: Disable them in config (see Toolsets).

Telegram bot not responding

Cause: Gateway not running, or laptop asleep. Fix: Run hermes gateway start, prevent the laptop from sleeping, or move the gateway to an always-on server.

When to escalate

Situation	Action
Routine file/docs work	Local model is fine
Complex reasoning	Cloud fallback or larger local model
Production code you ship	Human review required regardless
Sensitive data	Local only, no fallback

Official resources

That completes the handbook. Start with Complete offline setup if you haven't already, then pick a use case and run it.
