API Reference

OMM exposes three compatible API protocols on a single server. Drop in any existing tool or SDK that uses Ollama, OpenAI, or Anthropic APIs without code changes.

Starting the Server

Terminal
bash
# Start on default port 11434
omm serve
# Custom host and port
omm serve --host 0.0.0.0 --port 8080
# With authentication
omm serve --api-key sk-your-secret-key

Ollama-Compatible API

Full compatibility with the Ollama REST API. Use any Ollama client library or tool with OMM as the backend.

Generate

POST/api/generate
Request
json
{
"model": "llama3.2",
"prompt": "Explain quantum computing in one paragraph",
"stream": true,
"options": {
"temperature": 0.7,
"top_p": 0.9,
"num_predict": 256
}
}

Chat

POST/api/chat
Request
json
{
"model": "llama3.2",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"stream": true
}

List Models

GET/api/tags
Response
json
{
"models": [
{
"name": "llama3.2:latest",
"size": 2019393189,
"modified_at": "2026-04-28T10:00:00Z"
}
]
}

Pull Model

POST/api/pull
Request
json
{
"name": "qwen2.5:7b",
"stream": true
}

OpenAI-Compatible API

Full compatibility with the OpenAI Chat Completions API. Use the OpenAI Python or Node SDK with a base URL change.

Chat Completions

POST/v1/chat/completions
Python SDK
python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="omm" # any non-empty string works without auth
)
response = client.chat.completions.create(
model="llama3.2",
messages=[
{"role": "user", "content": "Write a haiku about coding"}
],
stream=True
)
for chunk in response:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")

Models Endpoint

GET/v1/models
Response
json
{
"data": [
{
"id": "llama3.2",
"object": "model",
"owned_by": "local"
}
]
}

Anthropic-Compatible API

Compatibility with the Anthropic Messages API. Use the Anthropic Python or Node SDK with a base URL change.

Messages

POST/anthropic/v1/messages
Python SDK
python
import anthropic
client = anthropic.Anthropic(
base_url="http://localhost:11434/anthropic",
api_key="omm"
)
message = client.messages.create(
model="llama3.2",
max_tokens=1024,
messages=[
{"role": "user", "content": "Explain recursion in simple terms"}
]
)
print(message.content[0].text)

Authentication

When started with --api-key, OMM requires an Authorization header on all requests:

Terminal
bash
# Ollama API
curl -H "Authorization: Bearer sk-your-secret-key" \
http://localhost:11434/api/tags
# OpenAI API
curl -H "Authorization: Bearer sk-your-secret-key" \
http://localhost:11434/v1/models
# Anthropic API
curl -H "x-api-key: sk-your-secret-key" \
http://localhost:11434/anthropic/v1/messages

Streaming

All three APIs support Server-Sent Events (SSE) streaming when"stream": true is set in the request. Each chunk contains a partial response. The stream ends with a[DONE] message (OpenAI format) or a final JSON object with "done": true(Ollama format).

Model Parameters

Pass generation parameters per-request via the options or parameters object:

Temperaturetemperaturetemperature0.7Top Ptop_ptop_p0.9Top Ktop_k—40Max tokensnum_predictmax_tokens4096Repeat penaltyrepeat_penaltyfrequency_penalty1.1Stop sequencesstopstop[]Seedseedseed(random)
PreviousGPU SetupNextConfiguration