Appearance
Providers & cloud fallback
My default is fully local (Ollama + Qwen 3.5). But Hermes is provider-agnostic, and there is one local-friendly pattern worth knowing: keep local as the primary model and add a cloud provider only as a fallback for when the local box is busy, down, or out of its depth.
How Hermes picks a model
Hermes resolves a provider and model from config.yaml plus credentials in .env. For a local setup that means a custom OpenAI-compatible endpoint:
yaml
# ~/.hermes/config.yaml
model:
provider: custom
default: qwen3.5-64k
base_url: http://localhost:11434/v1
context_length: 65536You can switch models mid-session with /model, and route side tasks (vision, compression, titles) to different models under auxiliary.*.
Three layers of resilience
Hermes has three independent layers that keep a session alive when a provider has trouble:
- Credential pools: rotate across multiple API keys for the same provider (tried first).
- Primary model fallback: switch to a different provider:model when the main one fails.
- Auxiliary task fallback: side tasks resolve their own provider independently.
For a local-first setup, layer 2 is the interesting one.
Cloud fallback for a local primary
When the local model errors (server overloaded, Ollama not running, repeated malformed responses), Hermes can fail over to a backup, mid-session, without losing your conversation.
Configure interactively:
bash
hermes fallbackOr edit config.yaml directly with a top-level fallback_providers list:
yaml
model:
provider: custom
default: qwen3.5-64k
base_url: http://localhost:11434/v1
fallback_providers:
- provider: openrouter
model: anthropic/claude-sonnet-4Each entry needs both provider and model. Fallback triggers on rate limits (429), server errors (500/502/503), auth failures (401/403, immediately), 404, or repeated invalid responses.
Per-turn, not per-session
Fallback is turn-scoped. Each new message starts on the primary again, so a transient local hiccup does not permanently push you onto the cloud model. Within one turn, fallback activates at most once.
The reverse pattern: local as the fallback
You can also keep a cloud model primary and fail over to local when you hit a rate limit or want to stop spending:
yaml
fallback_providers:
- provider: custom
model: qwen3.5-64k
base_url: http://localhost:11434/v1
key_env: LOCAL_API_KEY # any non-empty value works for OllamaWhere fallback works
Fallback is honored in CLI sessions, the messaging gateway, subagent delegation (children inherit the chain), and cron jobs.
Keep it offline-pure if you want to
There are no environment variables for the primary fallback chain, it is configured only in config.yaml or via hermes fallback. If your goal is a strictly air-gapped agent, simply leave fallback_providers empty and Hermes never reaches for the cloud.
Other providers
If you do want cloud access, Hermes speaks to Nous Portal, OpenRouter, OpenAI, Anthropic, Google, xAI, DeepSeek, Bedrock, Azure Foundry, LM Studio, and many more, plus any OpenAI-compatible endpoint via provider: custom. The full list and credential requirements live in the official Providers and Fallback Providers docs.