Providers & cloud fallback

My default is fully local (Ollama + Qwen 3.5). But Hermes is provider-agnostic, and there is one local-friendly pattern worth knowing: keep local as the primary model and add a cloud provider only as a fallback for when the local box is busy, down, or out of its depth.

How Hermes picks a model

Hermes resolves a provider and model from config.yaml plus credentials in .env. For a local setup that means a custom OpenAI-compatible endpoint:

yaml

# ~/.hermes/config.yaml
model:
  provider: custom
  default: qwen3.5-64k
  base_url: http://localhost:11434/v1
  context_length: 65536

You can switch models mid-session with /model, and route side tasks (vision, compression, titles) to different models under auxiliary.*.

Three layers of resilience

Hermes has three independent layers that keep a session alive when a provider has trouble:

1. Credential pools: rotate across multiple API keys for the same provider (tried first).
2. Primary model fallback: switch to a different provider:model when the main one fails.
3. Auxiliary task fallback: side tasks resolve their own provider independently.

For a local-first setup, layer 2 is the interesting one.

Cloud fallback for a local primary

When the local model errors (server overloaded, Ollama not running, repeated malformed responses), Hermes can fail over to a backup, mid-session, without losing your conversation.

Configure interactively:

bash

hermes fallback

Or edit config.yaml directly with a top-level fallback_providers list:

yaml

model:
  provider: custom
  default: qwen3.5-64k
  base_url: http://localhost:11434/v1

fallback_providers:
  - provider: openrouter
    model: anthropic/claude-sonnet-4

Each entry needs both provider and model. Fallback triggers on rate limits (429), server errors (500/502/503), auth failures (401/403, which fail over immediately), not-found errors (404), or repeated invalid responses.
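To make the trigger list concrete, here is a minimal Python sketch of how such a check could be written. It is an illustration, not Hermes source code: the function name `should_fall_back`, the `invalid_response_count` parameter, and the threshold of 3 are all assumptions, since the text above only says "repeated".

```python
# Illustrative only: NOT Hermes source code.
# Status codes come from the list above; everything else is hypothetical.
FALLBACK_STATUS_CODES = {401, 403, 404, 429, 500, 502, 503}

# Assumed value: the docs say "repeated" without giving a number.
INVALID_RESPONSE_THRESHOLD = 3


def should_fall_back(status_code=None, invalid_response_count=0):
    """Return True if a failure of this kind would trigger fallback."""
    if status_code in FALLBACK_STATUS_CODES:
        return True
    # No matching HTTP status: fall back only after repeated bad responses.
    return invalid_response_count >= INVALID_RESPONSE_THRESHOLD
```

Under these assumptions a 429 from an overloaded local server qualifies, while an ordinary 200 or a single malformed reply does not.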

Per-turn, not per-session

Fallback is turn-scoped. Each new message starts on the primary again, so a transient local hiccup does not permanently push you onto the cloud model. Within one turn, fallback activates at most once.
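The turn-scoped behaviour can be modelled in a few lines of Python. This is a toy sketch, not how Hermes is implemented: `Session`, `ProviderError`, and the two stand-in provider functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Stand-in for any failure that triggers fallback."""


@dataclass
class Session:
    primary: Callable[[list], str]
    fallback: Callable[[list], str]
    history: List[Tuple[str, str]] = field(default_factory=list)

    def send(self, message: str) -> Tuple[str, str]:
        """Run one turn: always try the primary first, fall back at most once."""
        self.history.append(("user", message))
        try:
            used, reply = "primary", self.primary(self.history)
        except ProviderError:
            # A second failure here propagates: no further fallback this turn.
            used, reply = "fallback", self.fallback(self.history)
        self.history.append((used, reply))
        return used, reply


def make_flaky_primary(fail_on_calls):
    """Build a fake local model that fails on the given call numbers."""
    calls = {"n": 0}

    def primary(history):
        calls["n"] += 1
        if calls["n"] in fail_on_calls:
            raise ProviderError("local server overloaded")
        return "local reply"

    return primary


def cloud(history):
    return "cloud reply"


session = Session(primary=make_flaky_primary({1}), fallback=cloud)
first = session.send("hello")          # primary fails, fallback answers
second = session.send("still there?")  # new turn starts on the primary again
```

In this model the first turn is answered by the fallback and the second returns to the primary, with the conversation history shared across both.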

The reverse pattern: local as the fallback

You can also keep a cloud model primary and fail over to local when you hit a rate limit or want to stop spending:

yaml

fallback_providers:
  - provider: custom
    model: qwen3.5-64k
    base_url: http://localhost:11434/v1
    key_env: LOCAL_API_KEY   # any non-empty value works for Ollama

Where fallback works

Fallback is honored in CLI sessions, the messaging gateway, subagent delegation (children inherit the chain), and cron jobs.

Keep it offline-pure if you want to

There are no environment variables for the primary fallback chain; it is configured only in config.yaml or via hermes fallback. If your goal is a strictly air-gapped agent, simply leave fallback_providers empty and Hermes never reaches for the cloud.
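If you want to sanity-check a config before trusting it offline, a small script along these lines could help. It is a sketch under assumptions: `is_offline_pure` is a hypothetical helper, it expects the config already parsed into a dict, and it inspects only the `model.base_url` and `fallback_providers` keys shown on this page, not auxiliary.* or anything else.

```python
from urllib.parse import urlparse

# Hostnames treated as "this machine" for the purpose of this check.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_offline_pure(config: dict) -> bool:
    """True if there is no fallback chain and the primary points at loopback."""
    if config.get("fallback_providers"):
        return False
    base_url = config.get("model", {}).get("base_url", "")
    return urlparse(base_url).hostname in LOOPBACK_HOSTS


# Example: the local-only config from this page, as a parsed dict.
local_only = {
    "model": {
        "provider": "custom",
        "default": "qwen3.5-64k",
        "base_url": "http://localhost:11434/v1",
    },
    "fallback_providers": [],
}
print(is_offline_pure(local_only))
```

A passing check only says these two settings are local; it does not prove the agent makes no network calls.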

Other providers

If you do want cloud access, Hermes speaks to Nous Portal, OpenRouter, OpenAI, Anthropic, Google, xAI, DeepSeek, Bedrock, Azure Foundry, LM Studio, and many more, plus any OpenAI-compatible endpoint via provider: custom. The full list and credential requirements live in the official Providers and Fallback Providers docs.
