API server & Python library
Two ways to use Hermes as a backend rather than a chat partner: expose it as an OpenAI-compatible HTTP API (so any chat frontend can talk to it), or import it directly into your own Python code. Both run locally and pair naturally with a local model.
API server (OpenAI-compatible)
The API server turns Hermes into a drop-in OpenAI endpoint. Any frontend that speaks the OpenAI format (Open WebUI, LobeChat, LibreChat, NextChat, and many more) can use your local, tool-equipped agent as its backend.
Enable it
bash
# ~/.hermes/.env
API_SERVER_ENABLED=true
API_SERVER_KEY=change-me-local-dev
bash
hermes gateway
# → [API Server] API server listening on http://127.0.0.1:8642
Use it
bash
curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'
It serves the full toolset (terminal, files, web, memory, skills), so a frontend like Open WebUI gets a real agent, not just a chat model. Streaming responses include tool-progress events so the UI can show what the agent is doing.
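The same request can be made from Python with only the standard library. The sketch below assumes the server is running with the key from the example above and that the reply follows the usual OpenAI chat-completion shape (choices[0].message.content). The helper names build_chat_request, extract_reply, and chat are illustrative and not part of Hermes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8642/v1"  # default bind shown above
API_KEY = "change-me-local-dev"        # must match API_SERVER_KEY


def build_chat_request(prompt, model="hermes-agent"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(body):
    """Pull the assistant text out of an OpenAI-style completion body (assumed shape)."""
    return body["choices"][0]["message"]["content"]


def chat(prompt, timeout=300):
    """Send one stateless chat turn; requires a running `hermes gateway`."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

Calling chat("Hello!") against a running gateway should return the agent's reply as a string. The generous timeout is a choice made here because tool-using turns can take a while.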
Endpoints worth knowing
- POST /v1/chat/completions: standard OpenAI chat, stateless.
- POST /v1/responses: OpenAI Responses API with server-side conversation state via previous_response_id or a named conversation.
- GET /v1/models: advertises the agent (model name defaults to the profile name).
- POST /v1/runs + /events + /stop: a runs API for dashboards that want to subscribe to progress and cancel mid-run.
- GET /health: health check.
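With the stateful endpoint, a follow-up turn points at the previous response instead of resending the history. Here is a minimal sketch of the two request bodies, assuming Hermes accepts the same input and previous_response_id fields as OpenAI's Responses API and returns an id with each response. build_responses_payload is a hypothetical helper and the id value is made up.

```python
def build_responses_payload(text, previous_response_id=None, model="hermes-agent"):
    """Return a JSON-serializable body for POST /v1/responses (assumed schema)."""
    payload = {"model": model, "input": text}
    if previous_response_id is not None:
        # Ask the server to continue from its stored conversation state.
        payload["previous_response_id"] = previous_response_id
    return payload


first = build_responses_payload("Find the deployment runbook in my notes")

# Suppose the server answered the first call with {"id": "resp_123", ...}.
follow_up = build_responses_payload(
    "Summarize it in three bullets",
    previous_response_id="resp_123",
)
```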
Security
WARNING
The API server exposes the full toolset, including terminal commands. API_SERVER_KEY is required for every deployment, even the default 127.0.0.1 bind. The server does not enable browser CORS by default; if a browser must call it directly, set API_SERVER_CORS_ORIGINS to an explicit allowlist.
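Because the key is all that stands between a caller and terminal access, it is worth generating one rather than inventing it. A small sketch using Python's secrets module follows. The 32-byte length is a choice made here, not a Hermes requirement, and new_api_key is an illustrative name.

```python
import secrets


def new_api_key(nbytes=32):
    """Return a URL-safe random token suitable for use as API_SERVER_KEY."""
    return secrets.token_urlsafe(nbytes)


key = new_api_key()
print(f"API_SERVER_KEY={key}")  # paste this line into ~/.hermes/.env
```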
By default the server binds to localhost only. Keep it that way unless you have a deliberate reason to expose it, and put a per-user model behind a profile if you want isolation:
bash
# multi-user via profiles, each on its own port
cat >> ~/.hermes/profiles/alice/.env <<EOF
API_SERVER_ENABLED=true
API_SERVER_PORT=8643
API_SERVER_KEY=alice-secret
EOF
hermes -p alice gateway
Python library
You can import AIAgent and embed Hermes in scripts, web apps, or CI pipelines, with no CLI required.
python
from run_agent import AIAgent

agent = AIAgent(
    model="qwen3.5-64k",
    base_url="http://localhost:11434/v1",
    quiet_mode=True,
)
response = agent.chat("What is the capital of France?")
print(response)
WARNING
Always set quiet_mode=True when embedding; otherwise the agent prints CLI spinners and progress into your output.
Full conversation control
python
# previous_result is the dict returned by an earlier run_conversation call
result = agent.run_conversation(
    user_message="Search my notes for the deployment runbook",
    conversation_history=previous_result["messages"],  # multi-turn
)
print(result["final_response"])
Useful constructor parameters
| Parameter | Purpose |
|---|---|
| model, base_url, api_key | Point at your local endpoint |
| quiet_mode=True | Suppress CLI output (use when embedding) |
| enabled_toolsets / disabled_toolsets | Lock the agent down (e.g. enabled_toolsets=["web"]) |
| skip_context_files=True | Don't load AGENTS.md from cwd |
| skip_memory=True | Stateless, no memory read/write (good for API endpoints) |
| max_iterations | Cap tool-calling loops (default 90; lower for simple Q&A) |
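Putting run_conversation together with these parameters, a multi-turn exchange can be driven by threading each result's messages into the next call. The sketch below assumes, as in the snippet above, that run_conversation returns a dict with "messages" and "final_response". It only passes conversation_history once there is history to pass, so it makes no assumption about what the first call accepts. run_turns and EchoAgent are hypothetical; the stub exists only so the loop can be tried without a model.

```python
def run_turns(agent, prompts):
    """Feed prompts to one agent in order, carrying history between turns."""
    history = None
    replies = []
    for prompt in prompts:
        kwargs = {"user_message": prompt}
        if history is not None:
            kwargs["conversation_history"] = history
        result = agent.run_conversation(**kwargs)
        history = result["messages"]
        replies.append(result["final_response"])
    return replies


class EchoAgent:
    """Stand-in with the assumed run_conversation contract, for dry runs only."""

    def run_conversation(self, user_message, conversation_history=None):
        messages = list(conversation_history or [])
        messages.append({"role": "user", "content": user_message})
        reply = f"turn {len(messages) // 2 + 1}: {user_message}"
        messages.append({"role": "assistant", "content": reply})
        return {"messages": messages, "final_response": reply}


print(run_turns(EchoAgent(), ["first question", "follow-up"]))
# → ['turn 1: first question', 'turn 2: follow-up']
```

With a real agent, the call would look like run_turns(AIAgent(..., quiet_mode=True), prompts), using one agent for the whole conversation and one conversation at a time.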
Thread safety
Create a new AIAgent per thread or task; never share one instance across concurrent calls. For parallel batch work, spin up a fresh agent inside each worker:
python
import concurrent.futures

from run_agent import AIAgent

# Example prompts; replace with your own batch.
prompts = ["What is the capital of France?", "What is the capital of Italy?"]

def process(prompt):
    # Fresh agent per task: instances must not be shared across threads.
    agent = AIAgent(model="qwen3.5-64k", base_url="http://localhost:11434/v1",
                    quiet_mode=True, skip_memory=True)
    return agent.chat(prompt)

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as ex:
    results = list(ex.map(process, prompts))
(On a single local GPU, keep the worker count low; parallel agents contend for the same hardware.)
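If one failing prompt should not sink the whole batch, the same pattern can be written with submit and as_completed. In this sketch, make_agent is a hypothetical zero-argument factory that returns a fresh agent (for example, a lambda wrapping the AIAgent constructor shown above), so every task still gets its own instance.

```python
import concurrent.futures


def run_batch(make_agent, prompts, max_workers=2):
    """Run prompts in parallel; return {prompt: reply or Exception}.

    Assumes prompts are unique, since they are used as dict keys.
    """
    def process(prompt):
        agent = make_agent()  # fresh agent per task, never shared
        return agent.chat(prompt)

    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = {ex.submit(process, p): p for p in prompts}
        for fut in concurrent.futures.as_completed(futures):
            prompt = futures[fut]
            try:
                results[prompt] = fut.result()
            except Exception as exc:  # keep the failure next to its prompt
                results[prompt] = exc
    return results
```

Callers can then separate successes from failures with isinstance(value, Exception) and retry only the prompts that failed.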
Editors via ACP
Hermes also speaks ACP, so you can use it inside ACP-compatible editors (VS Code, Zed, JetBrains) by running hermes acp. See the official ACP docs.