Quick Start

Get a model running in under 60 seconds. This guide covers the fastest path from installation to your first chat.

Prerequisites

OMM installed (see Installation)
At least 4 GB of free RAM (8+ GB recommended)
Optional: CUDA-capable GPU for acceleration

1. Pull Your First Model

$ omm pull llama3.2
pulling manifest...
pulling 1da732c8... ▓▓▓▓▓▓▓▓▓▓ 100%
pulling a53cc7e0... ▓▓▓▓▓▓▓▓▓▓ 100%
verifying sha256 digest...
writing manifest...
success

This downloads the Llama 3.2 3B model (Q4_K_M quantization, ~2 GB). For a more capable model, run omm pull llama3.3:70b instead (requires ~40 GB of RAM).
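The two size figures above can be sanity-checked with simple arithmetic: weight size is roughly parameter count times bits per weight, divided by 8. The sketch below is only an estimate. It assumes an average of about 4.85 bits per weight for Q4_K_M and round parameter counts of 3.2 billion and 70 billion; none of these numbers come from the OMM documentation, and actual RAM use also depends on context length and runtime overhead.

```python
# Rough size estimate for a quantized model: parameters x bits per weight / 8.
# ASSUMPTIONS (not taken from the OMM docs): ~4.85 bits per weight as an
# average for Q4_K_M, and approximate parameter counts for both models.

Q4_K_M_BITS_PER_WEIGHT = 4.85  # assumed average; varies by architecture


def estimate_size_gb(n_params: float,
                     bits_per_weight: float = Q4_K_M_BITS_PER_WEIGHT) -> float:
    """Return the approximate size of the weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Parameter counts are illustrative round numbers.
    for name, n_params in [("llama3.2 (3B)", 3.2e9), ("llama3.3:70b", 70e9)]:
        print(f"{name}: ~{estimate_size_gb(n_params):.1f} GB")
```

Under these assumptions the estimates come out near 1.9 GB and 42.4 GB, in the same range as the ~2 GB and ~40 GB figures quoted above.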

2. Start Chatting

Desktop App

Launch OMM and select the model from the sidebar. The chat interface opens immediately. Type your message and press Enter.

CLI

$ omm run llama3.2
>>> Hello! Can you help me write a Python function?
Of course! What should the function do?

>>>

3. Start the API Server

To use OMM as a drop-in replacement for other AI APIs:

$ omm serve
OMM server listening on http://127.0.0.1:11434
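Before pointing other tools at the server, it can help to confirm that something is actually accepting connections on that address. The following is a minimal sketch using only the Python standard library. The helper name is_listening is hypothetical, and the check only tests TCP reachability on the host and port printed above; it does not verify that the process answering is OMM.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 11434,
                 timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_listening():
        print("server reachable on 127.0.0.1:11434")
    else:
        print("nothing listening on 127.0.0.1:11434")
```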

The server exposes three compatible API protocols, so it works right away with any tool built on the Ollama, OpenAI, or Anthropic SDKs:

# Ollama-compatible
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, world!"
}'

# OpenAI-compatible
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
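The OpenAI-compatible request above can also be sent from a script. Here is a minimal sketch in Python using only the standard library. The function names are hypothetical, and the response parsing assumes OMM mirrors OpenAI's choices[0].message.content layout, which the text above implies but does not spell out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # address printed by `omm serve`


def build_chat_request(model: str, user_message: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion body."""
    # Assumption: OMM returns the same `choices[0].message.content`
    # structure as the OpenAI API.
    return response_body["choices"][0]["message"]["content"]


def chat(model: str, user_message: str) -> str:
    """Send one message and return the reply (needs `omm serve` running)."""
    request = build_chat_request(model, user_message)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return extract_reply(json.load(resp))


if __name__ == "__main__":
    try:
        print(chat("llama3.2", "Hello!"))
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"Could not reach the server: {exc}")
```

Keeping request construction and response parsing in separate functions makes both easy to check without a running server.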