Appearance
Web, vision & images
Beyond text, Hermes can look at images, browse and extract web pages, and generate images. Most of these reach out to the internet or a paid API, so this page flags clearly which parts stay local and which do not, that distinction matters for an offline-first setup.
Vision (image understanding)
This is the most offline-friendly multimodal feature, because it depends on your model, not a cloud service.
- Paste a screenshot into the CLI (
/paste, or Ctrl/Cmd+V where supported) and ask about it. Images are saved to~/.hermes/images/. - If your local model is vision-capable (Qwen-VL, MiMo-VL, and similar served through Ollama), the image is sent as real pixels, fully local.
- If your model is text-only, Hermes routes the image through the
vision_analyzeauxiliary tool: an auxiliary vision model describes it and injects the text. To keep this offline, pointauxiliary.visionat a local vision model.
yaml
# ~/.hermes/config.yaml
auxiliary:
vision:
provider: custom
model: qwen2.5-vl
base_url: http://localhost:11434/v1Clipboard paste over SSH
Image paste reads the clipboard of the machine running Hermes. Over SSH that is the remote host, not your laptop, so paste won't see your local clipboard. Workarounds: upload the file and reference it by path, use an image URL, or send it via a messaging platform.
Web search & extraction
web_search and web_extract let the agent look things up and read pages. These require internet access and a search/extract backend (Nous Portal Tool Gateway, or provider keys), so they are not part of a strictly air-gapped setup.
If you want a purely offline agent, disable the web toolset entirely:
yaml
# ~/.hermes/config.yaml
agent:
disabled_toolsets:
- web # no web_search / web_extract
- browser # no browser automation
- image_gen # no image generationThe agent then relies only on local files, terminal, memory, and skills, which is the right posture when no data should ever leave the machine.
Browser automation
The browser toolset can navigate sites, fill forms, and screenshot pages. Backends include cloud providers (Browserbase, Browser Use, Firecrawl) and local options (Camofox, or attaching to your own Chrome/Chromium via CDP). Even with a local browser backend, the sites it visits are on the internet, so treat browsing as an online capability.
A useful detail: when a cloud browser provider is configured, Hermes auto-spawns a local Chromium sidecar for localhost/LAN URLs, so it can screenshot your own dev server at http://localhost:3000 without sending that private URL to the cloud.
Image generation
image_generate creates images from text prompts via FAL.ai (eleven models, configurable in hermes tools). This is a paid, cloud capability, it needs a FAL_KEY or a Nous Portal subscription, and there is no local image-generation backend in the default setup. If you don't use it, leave image_gen in disabled_toolsets as shown above.
SSRF protection (relevant for local services)
All URL-capable tools validate URLs before fetching, and by default block private networks, loopback, link-local, and cloud-metadata addresses. If you legitimately want the agent to reach a LAN-only service (a local wiki, an internal Ollama endpoint), opt in:
yaml
# ~/.hermes/config.yaml
security:
allow_private_urls: true # default: falseOnly enable this on machines where you trust the prompts, it is a deliberate trust boundary. See the security model for the full picture.