Appearance
Choosing a model
Not every open-weight model works well as a Hermes backend. Here is how I pick with Qwen 3.5 as my default family.
Non-negotiable: tool calling
Hermes is an agent. It needs a model that supports function/tool calling via the OpenAI-compatible API. Without it, Hermes can only chat; it cannot edit files, run commands, or delegate.
Qwen 3.5 ships with native tool calling and is officially supported with Hermes via Ollama.
Other models that work for agentic use:
- Qwen 3.5 (0.8B–122B, multimodal, 256K context): my recommended default
- Qwen 2.5 / Qwen 2.5 Coder (7B–32B): lighter fallback if 3.5 is too heavy
Test tool calling after setup:
text
Create a file /tmp/tool-test.txt with the word "success".
Then read it back.If Hermes only talks about creating the file but never does, the model likely lacks tool support.
Qwen 3.5 variants (Ollama)
From the official Ollama library:
| Tag | Size | RAM needed | Best for |
|---|---|---|---|
qwen3.5:9b | 9B (default) | ~8 GB | Fast, lighter laptops |
qwen3.5:27b | 27B | ~17 GB | My daily driver |
qwen3.5:35b | 35B MoE | ~24 GB | Best quality, more RAM |
qwen3.5:4b | 4B | ~4 GB | Quick tasks only |
bash
ollama pull qwen3.5:27b # recommended for agentic work
ollama pull qwen3.5:9b # lighter option
ollama pull qwen3.5:35b # if you have 32 GB+ RAMQuick launch with Hermes (official Ollama integration):
bash
ollama launch hermes --model qwen3.5:27bSize vs. hardware
| Your RAM/VRAM | Recommended model | Notes |
|---|---|---|
| 8–16 GB | qwen3.5:9b | Good starting point |
| 24 GB | qwen3.5:27b | My sweet spot on laptop |
| 32 GB+ | qwen3.5:35b | Best agentic quality |
| 64 GB+ | qwen3.5:122b | Near-frontier (very heavy) |
If Qwen 3.5 is too slow or heavy, fall back to qwen3.5:4b or qwen2.5:7b for quick tasks.
Coding vs. general
| Use case | Model bias |
|---|---|
| File editing, shell, dev tasks | qwen3.5:27b |
| Writing, research, general assistant | qwen3.5:27b or qwen3.5:9b |
| Mixed daily driver | qwen3.5-64k (what I use) |
How to switch models
bash
hermes model
# or edit config.yaml and restartHot-swap inside a session:
text
/model qwen3.5:27bMy decision process
- Start with
qwen3.5:27band a 64k Modelfile variant. - Run the file assistant use case as a benchmark.
- If too slow, use
qwen3.5:9bfor routine tasks and keep 27b for hard ones. - If quality is lacking, try
qwen3.5:35bor add a cloud fallback.
Next: Context length & performance.