Skip to content

Web, vision & images

Beyond text, Hermes can look at images, browse and extract web pages, and generate images. Most of these reach out to the internet or a paid API, so this page flags clearly which parts stay local and which do not, that distinction matters for an offline-first setup.

Vision (image understanding)

This is the most offline-friendly multimodal feature, because it depends on your model, not a cloud service.

  • Paste a screenshot into the CLI (/paste, or Ctrl/Cmd+V where supported) and ask about it. Images are saved to ~/.hermes/images/.
  • If your local model is vision-capable (Qwen-VL, MiMo-VL, and similar served through Ollama), the image is sent as real pixels, fully local.
  • If your model is text-only, Hermes routes the image through the vision_analyze auxiliary tool: an auxiliary vision model describes it and injects the text. To keep this offline, point auxiliary.vision at a local vision model.
yaml
# ~/.hermes/config.yaml
auxiliary:
  vision:
    provider: custom
    model: qwen2.5-vl
    base_url: http://localhost:11434/v1

Clipboard paste over SSH

Image paste reads the clipboard of the machine running Hermes. Over SSH that is the remote host, not your laptop, so paste won't see your local clipboard. Workarounds: upload the file and reference it by path, use an image URL, or send it via a messaging platform.

Web search & extraction

web_search and web_extract let the agent look things up and read pages. These require internet access and a search/extract backend (Nous Portal Tool Gateway, or provider keys), so they are not part of a strictly air-gapped setup.

If you want a purely offline agent, disable the web toolset entirely:

yaml
# ~/.hermes/config.yaml
agent:
  disabled_toolsets:
    - web        # no web_search / web_extract
    - browser    # no browser automation
    - image_gen  # no image generation

The agent then relies only on local files, terminal, memory, and skills, which is the right posture when no data should ever leave the machine.

Browser automation

The browser toolset can navigate sites, fill forms, and screenshot pages. Backends include cloud providers (Browserbase, Browser Use, Firecrawl) and local options (Camofox, or attaching to your own Chrome/Chromium via CDP). Even with a local browser backend, the sites it visits are on the internet, so treat browsing as an online capability.

A useful detail: when a cloud browser provider is configured, Hermes auto-spawns a local Chromium sidecar for localhost/LAN URLs, so it can screenshot your own dev server at http://localhost:3000 without sending that private URL to the cloud.

Image generation

image_generate creates images from text prompts via FAL.ai (eleven models, configurable in hermes tools). This is a paid, cloud capability, it needs a FAL_KEY or a Nous Portal subscription, and there is no local image-generation backend in the default setup. If you don't use it, leave image_gen in disabled_toolsets as shown above.

SSRF protection (relevant for local services)

All URL-capable tools validate URLs before fetching, and by default block private networks, loopback, link-local, and cloud-metadata addresses. If you legitimately want the agent to reach a LAN-only service (a local wiki, an internal Ollama endpoint), opt in:

yaml
# ~/.hermes/config.yaml
security:
  allow_private_urls: true   # default: false

Only enable this on machines where you trust the prompts, it is a deliberate trust boundary. See the security model for the full picture.

Personal learning notes on Hermes Agent. Not affiliated with Nous Research. Verify against official docs.