API server & Python library

Two ways to use Hermes as a backend rather than a chat partner: expose it as an OpenAI-compatible HTTP API (so any chat frontend can talk to it), or import it directly into your own Python code. Both run locally and pair naturally with a local model.

API server (OpenAI-compatible)

The API server turns Hermes into a drop-in OpenAI endpoint. Any frontend that speaks the OpenAI format (Open WebUI, LobeChat, LibreChat, NextChat, and many more) can use your local, tool-equipped agent as its backend.

Enable it

bash

# ~/.hermes/.env
API_SERVER_ENABLED=true
API_SERVER_KEY=change-me-local-dev

bash

hermes gateway
# → [API Server] API server listening on http://127.0.0.1:8642

Use it

bash

curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'
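The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_chat_request` and `send` are illustrative, not part of Hermes, and the response is assumed to follow the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8642/v1"   # default bind shown in the gateway log
API_KEY = "change-me-local-dev"         # must match API_SERVER_KEY


def build_chat_request(prompt, model="hermes-agent"):
    """Build the POST request that the curl example sends."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and return the assistant text.

    Assumes an OpenAI-style response body; adjust if your server differs.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage (needs a running gateway):
# print(send(build_chat_request("Hello!")))
```

The timeout is deliberately generous because an agent that runs tools can take much longer to answer than a plain chat model.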

It serves the full toolset (terminal, files, web, memory, skills), so a frontend like Open WebUI gets a real agent, not just a chat model. Streaming responses include tool-progress events so the UI can show what the agent is doing.
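To consume a streamed reply, a client has to split the event stream into payloads. The sketch below assumes the OpenAI streaming convention (lines of the form `data: {json}`, terminated by `data: [DONE]`). The wire format of the tool-progress events is not described here, so this example only extracts text deltas and silently skips any other payload; the function names are illustrative.

```python
import json


def iter_sse_payloads(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines.

    Assumes each event is a single "data: {json}" line and that the
    stream ends with "data: [DONE]".
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                      # skip blank separators and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


def collect_text(payloads):
    """Join the content deltas from chat-completion chunks."""
    parts = []
    for chunk in payloads:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)
```

A UI that wants to display agent activity would inspect the payloads that `collect_text` ignores, once their shape is known from the server's output.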

Endpoints worth knowing

POST /v1/chat/completions: standard OpenAI chat, stateless.
POST /v1/responses: OpenAI Responses API with server-side conversation state via previous_response_id or a named conversation.
GET /v1/models: advertises the agent (model name defaults to the profile name).
POST /v1/runs + /events + /stop: a runs API for dashboards that want to subscribe to progress and cancel mid-run.
GET /health: health check.
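The difference between the stateless and stateful endpoints shows up in the request body. As a rough sketch, assuming `/v1/responses` accepts the same `input` and `previous_response_id` fields as the OpenAI Responses API, a follow-up turn only needs the id of the earlier response rather than the whole message history. The helper name and the `resp_123` id are placeholders.

```python
def build_response_payload(user_input, previous_response_id=None,
                           model="hermes-agent"):
    """Build a JSON body for POST /v1/responses."""
    payload = {"model": model, "input": user_input}
    if previous_response_id is not None:
        # Ask the server to continue from an earlier response
        payload["previous_response_id"] = previous_response_id
    return payload


first = build_response_payload("Find the deployment runbook")
# Suppose the server replied with {"id": "resp_123", ...} (placeholder id)
follow_up = build_response_payload("Summarize it",
                                   previous_response_id="resp_123")
```

With `/v1/chat/completions`, by contrast, the client would resend every earlier message in `messages` on each turn.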

Security

WARNING

The API server exposes the full toolset, including terminal commands. API_SERVER_KEY is required for every deployment, even the default 127.0.0.1 bind. The server does not enable browser CORS by default; if a browser must call it directly, set API_SERVER_CORS_ORIGINS to an explicit allowlist.

By default the server binds to localhost only. Keep it that way unless you have a deliberate reason to expose it. If you need per-user isolation, give each user a separate profile with its own port and key:

bash

# multi-user via profiles, each on its own port
cat >> ~/.hermes/profiles/alice/.env <<EOF
API_SERVER_ENABLED=true
API_SERVER_PORT=8643
API_SERVER_KEY=alice-secret
EOF
hermes -p alice gateway
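The security rules above can be turned into a small pre-flight check on a profile's .env file. This is a standalone sketch, not a Hermes feature: the function names are hypothetical, and it assumes that API_SERVER_CORS_ORIGINS is a comma-separated list and that `true` is the value that enables the server.

```python
PLACEHOLDER_KEYS = {"", "change-me-local-dev"}


def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_api_server_env(settings):
    """Return a list of human-readable problems (empty means OK)."""
    problems = []
    if settings.get("API_SERVER_ENABLED", "").lower() != "true":
        return problems                   # server is off; nothing to check
    if settings.get("API_SERVER_KEY", "") in PLACEHOLDER_KEYS:
        problems.append("API_SERVER_KEY is missing or still the example value")
    # Assumption: origins are comma-separated
    origins = settings.get("API_SERVER_CORS_ORIGINS", "")
    if "*" in [o.strip() for o in origins.split(",")]:
        problems.append("API_SERVER_CORS_ORIGINS should be an explicit allowlist")
    return problems
```

Running it against each profile's .env before starting the gateway catches the two easiest mistakes: a leftover example key and a wildcard CORS origin.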

Python library

You can import AIAgent and embed Hermes in scripts, web apps, or CI pipelines, with no CLI required.

python

from run_agent import AIAgent

agent = AIAgent(
    model="qwen3.5-64k",
    base_url="http://localhost:11434/v1",
    quiet_mode=True,
)
response = agent.chat("What is the capital of France?")
print(response)

WARNING

Always set quiet_mode=True when embedding; otherwise the agent prints CLI spinners and progress into your output.

Full conversation control

python

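# `agent` is the AIAgent created above; `previous_result` is the dict
# returned by an earlier run_conversation() call.
result = agent.run_conversation(
    user_message="Search my notes for the deployment runbook",
    conversation_history=previous_result["messages"],   # multi-turn
)
print(result["final_response"])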
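For more than two turns, the history hand-off can be wrapped in a loop. The helper below is a sketch: `run_turns` is a hypothetical name, and it assumes that `result["messages"]` holds the full conversation so far, which is what passing it back as `conversation_history` suggests.

```python
def run_turns(agent, user_messages):
    """Send messages in order, carrying the history forward each turn.

    `agent` is any object with a run_conversation() method shaped like
    the call above. Returns the list of final responses.
    """
    history = None
    replies = []
    for message in user_messages:
        kwargs = {"user_message": message}
        if history is not None:
            kwargs["conversation_history"] = history
        result = agent.run_conversation(**kwargs)
        history = result["messages"]          # feed into the next turn
        replies.append(result["final_response"])
    return replies
```

Because the helper only relies on the method name and the two result keys, it can be exercised in tests with a stub object instead of a real model.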

Useful constructor parameters

Parameter	Purpose
`model`, `base_url`, `api_key`	Point at your local endpoint
`quiet_mode=True`	Suppress CLI output (use when embedding)
`enabled_toolsets` / `disabled_toolsets`	Lock the agent down (e.g. `enabled_toolsets=["web"]`)
`skip_context_files=True`	Don't load AGENTS.md from cwd
`skip_memory=True`	Stateless, no memory read/write (good for API endpoints)
`max_iterations`	Cap tool-calling loops (default 90; lower for simple Q&A)
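Several of these parameters are typically combined. As an illustration, the hypothetical helper below collects the keyword arguments for a stateless, web-only agent of the kind you might put behind an API endpoint; the `max_iterations` value of 10 is an arbitrary example, not a recommended setting.

```python
def locked_down_kwargs(model, base_url, api_key=None, max_iterations=10):
    """Constructor kwargs for a stateless, web-only agent."""
    kwargs = {
        "model": model,
        "base_url": base_url,
        "quiet_mode": True,              # no CLI spinners in embedded output
        "enabled_toolsets": ["web"],     # only the web toolset is available
        "skip_context_files": True,      # ignore AGENTS.md in the cwd
        "skip_memory": True,             # no memory reads or writes
        "max_iterations": max_iterations,
    }
    if api_key is not None:
        kwargs["api_key"] = api_key
    return kwargs


# Usage (requires Hermes to be importable):
# from run_agent import AIAgent
# agent = AIAgent(**locked_down_kwargs("qwen3.5-64k",
#                                      "http://localhost:11434/v1"))
```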

Thread safety

Create a new AIAgent per thread or task; never share one instance across concurrent calls. For parallel batch work, spin up a fresh agent inside each worker:

python

import concurrent.futures
from run_agent import AIAgent

def process(prompt):
    agent = AIAgent(model="qwen3.5-64k", base_url="http://localhost:11434/v1",
                    quiet_mode=True, skip_memory=True)
    return agent.chat(prompt)

prompts = ["Summarize README.md", "List open TODOs"]  # example inputs

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as ex:
    results = list(ex.map(process, prompts))

(On a single local GPU, keep the worker count low; parallel agents contend for the same hardware.)
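In a longer batch, one failing prompt should not discard the others. The variant below is a sketch of the same one-agent-per-task pattern with per-prompt error capture. `run_batch` and `make_agent` are hypothetical names; `make_agent` stands for any zero-argument factory that returns an object with a `chat()` method.

```python
import concurrent.futures


def run_batch(make_agent, prompts, max_workers=2):
    """Run prompts in parallel, one fresh agent per task.

    `make_agent` is a zero-argument factory, e.g.
    lambda: AIAgent(model=..., base_url=..., quiet_mode=True, skip_memory=True).
    Returns {prompt: reply or Exception}; duplicate prompts share one key.
    """
    def worker(prompt):
        agent = make_agent()              # never shared between tasks
        return agent.chat(prompt)

    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = {ex.submit(worker, p): p for p in prompts}
        for future in concurrent.futures.as_completed(futures):
            prompt = futures[future]
            try:
                results[prompt] = future.result()
            except Exception as exc:      # record the error instead of raising
                results[prompt] = exc
    return results
```

Passing a factory rather than an agent makes it hard to share an instance by accident, and it lets the same function run against a stub in tests.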

Editors via ACP

Hermes also speaks ACP, so you can use it inside ACP-compatible editors (VS Code, Zed, JetBrains) by running hermes acp. See the official ACP docs.
