Choosing a model

Not every open-weight model works well as a Hermes backend. Here is how I pick with Qwen 3.5 as my default family.

Non-negotiable: tool calling

Hermes is an agent. It needs a model that supports function/tool calling via the OpenAI-compatible API. Without it, Hermes can only chat; it cannot edit files, run commands, or delegate.

Qwen 3.5 ships with native tool calling and is officially supported with Hermes via Ollama.

Other models that work for agentic use:

Qwen 3.5 (0.8B–122B, multimodal, 256K context): my recommended default
Qwen 2.5 / Qwen 2.5 Coder (7B–32B): lighter fallback if 3.5 is too heavy

Test tool calling after setup:

text

Create a file /tmp/tool-test.txt with the word "success".
Then read it back.

If Hermes only talks about creating the file but never does, the model likely lacks tool support.

Qwen 3.5 variants (Ollama)

From the official Ollama library:

Tag	Size	RAM needed	Best for
`qwen3.5:9b`	9B (default)	~8 GB	Fast, lighter laptops
`qwen3.5:27b`	27B	~17 GB	My daily driver
`qwen3.5:35b`	35B MoE	~24 GB	Best quality, more RAM
`qwen3.5:4b`	4B	~4 GB	Quick tasks only

bash

ollama pull qwen3.5:27b     # recommended for agentic work
ollama pull qwen3.5:9b      # lighter option
ollama pull qwen3.5:35b     # if you have 32 GB+ RAM

Quick launch with Hermes (official Ollama integration):

bash

ollama launch hermes --model qwen3.5:27b

Size vs. hardware

Your RAM/VRAM	Recommended model	Notes
8–16 GB	qwen3.5:9b	Good starting point
24 GB	qwen3.5:27b	My sweet spot on laptop
32 GB+	qwen3.5:35b	Best agentic quality
64 GB+	qwen3.5:122b	Near-frontier (very heavy)

If Qwen 3.5 is too slow or heavy, fall back to qwen3.5:4b or qwen2.5:7b for quick tasks.

Coding vs. general

Use case	Model bias
File editing, shell, dev tasks	`qwen3.5:27b`
Writing, research, general assistant	`qwen3.5:27b` or `qwen3.5:9b`
Mixed daily driver	`qwen3.5-64k` (what I use)

How to switch models

bash

hermes model
# or edit config.yaml and restart

Hot-swap inside a session:

text

/model qwen3.5:27b

My decision process

Start with qwen3.5:27b and a 64k Modelfile variant.
Run the file assistant use case as a benchmark.
If too slow, use qwen3.5:9b for routine tasks and keep 27b for hard ones.
If quality is lacking, try qwen3.5:35b or add a cloud fallback.

Next: Context length & performance.

Choosing a model ​

Non-negotiable: tool calling ​

Qwen 3.5 variants (Ollama) ​

Size vs. hardware ​

Coding vs. general ​

How to switch models ​

My decision process ​

Choosing a model

Non-negotiable: tool calling

Qwen 3.5 variants (Ollama)

Size vs. hardware

Coding vs. general

How to switch models

My decision process