Skip to content

Choosing a model

Not every open-weight model works well as a Hermes backend. Here is how I pick with Qwen 3.5 as my default family.

Non-negotiable: tool calling

Hermes is an agent. It needs a model that supports function/tool calling via the OpenAI-compatible API. Without it, Hermes can only chat; it cannot edit files, run commands, or delegate.

Qwen 3.5 ships with native tool calling and is officially supported with Hermes via Ollama.

Other models that work for agentic use:

  • Qwen 3.5 (0.8B–122B, multimodal, 256K context): my recommended default
  • Qwen 2.5 / Qwen 2.5 Coder (7B–32B): lighter fallback if 3.5 is too heavy

Test tool calling after setup:

text
Create a file /tmp/tool-test.txt with the word "success".
Then read it back.

If Hermes only talks about creating the file but never does, the model likely lacks tool support.

Qwen 3.5 variants (Ollama)

From the official Ollama library:

TagSizeRAM neededBest for
qwen3.5:9b9B (default)~8 GBFast, lighter laptops
qwen3.5:27b27B~17 GBMy daily driver
qwen3.5:35b35B MoE~24 GBBest quality, more RAM
qwen3.5:4b4B~4 GBQuick tasks only
bash
ollama pull qwen3.5:27b     # recommended for agentic work
ollama pull qwen3.5:9b      # lighter option
ollama pull qwen3.5:35b     # if you have 32 GB+ RAM

Quick launch with Hermes (official Ollama integration):

bash
ollama launch hermes --model qwen3.5:27b

Size vs. hardware

Your RAM/VRAMRecommended modelNotes
8–16 GBqwen3.5:9bGood starting point
24 GBqwen3.5:27bMy sweet spot on laptop
32 GB+qwen3.5:35bBest agentic quality
64 GB+qwen3.5:122bNear-frontier (very heavy)

If Qwen 3.5 is too slow or heavy, fall back to qwen3.5:4b or qwen2.5:7b for quick tasks.

Coding vs. general

Use caseModel bias
File editing, shell, dev tasksqwen3.5:27b
Writing, research, general assistantqwen3.5:27b or qwen3.5:9b
Mixed daily driverqwen3.5-64k (what I use)

How to switch models

bash
hermes model
# or edit config.yaml and restart

Hot-swap inside a session:

text
/model qwen3.5:27b

My decision process

  1. Start with qwen3.5:27b and a 64k Modelfile variant.
  2. Run the file assistant use case as a benchmark.
  3. If too slow, use qwen3.5:9b for routine tasks and keep 27b for hard ones.
  4. If quality is lacking, try qwen3.5:35b or add a cloud fallback.

Next: Context length & performance.

Personal learning notes on Hermes Agent. Not affiliated with Nous Research. Verify against official docs.