Quick Start

Get a model running in under 60 seconds. This guide covers the fastest path from installation to your first chat.

Prerequisites

  • OMM installed (see Installation)
  • At least 4 GB of free RAM (8+ GB recommended)
  • Optional: CUDA-capable GPU for acceleration

1. Pull Your First Model

Terminal
bash
$ omm pull llama3.2
pulling manifest...
pulling 1da732c8... ▓▓▓▓▓▓▓▓▓▓ 100%
pulling a53cc7e0... ▓▓▓▓▓▓▓▓▓▓ 100%
verifying sha256 digest...
writing manifest...
success

This downloads the Llama 3.2 3B model (Q4_K_M quantization, ~2 GB). For a more capable model, use omm pull llama3.3:70b (requires ~40 GB RAM).
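
The sizes quoted above follow from simple arithmetic: parameter count times bits per weight, divided by eight. The sketch below is a rough estimate, not OMM's own calculation. It assumes that Q4_K_M averages about 4.85 bits per weight (an approximate figure for that quantization scheme) and that Llama 3.2 3B has roughly 3.2 billion parameters; the function name is hypothetical.

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: float = 4.85) -> float:
    """Approximate size of quantized weights in gigabytes (10^9 bytes).

    bits_per_weight=4.85 is an assumed average for Q4_K_M, not an exact value.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Parameter counts are approximate and used only for illustration.
    for name, params in [("llama3.2 (3B)", 3.2), ("llama3.3:70b", 70.0)]:
        print(f"{name}: ~{estimate_weights_gb(params):.1f} GB of weights")
```

Under these assumptions the estimate is just under 2 GB for the 3B model and a little over 40 GB for the 70B model, in line with the figures above. Actual memory use will be somewhat higher, since the context cache and runtime overhead come on top of the weights.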

2. Start Chatting

Desktop App

Launch OMM and select the model from the sidebar. The chat interface opens immediately. Type your message and press Enter.

CLI

Terminal
bash
$ omm run llama3.2
>>> Hello! Can you help me write a Python function?
Of course! What should the function do?
>>>

3. Start the API Server

To use OMM as a drop-in replacement for other AI APIs, start the server:

Terminal
bash
$ omm serve
OMM server listening on http://127.0.0.1:11434
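
Before pointing a client at the server, it can help to confirm that something is listening on the port. The following is a minimal sketch using only the Python standard library. It assumes the default address 127.0.0.1:11434 shown in the output above, and the helper name is hypothetical. A successful TCP connection only shows that the port is open, not that the API itself is healthy.

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    status = "reachable" if server_is_listening() else "not reachable"
    print(f"OMM server at 127.0.0.1:11434 is {status}")
```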

The server exposes three compatible API protocols: Ollama, OpenAI, and Anthropic. Any tool built on one of those SDKs can use it immediately. The examples below cover the Ollama and OpenAI endpoints:

Terminal
bash
# Ollama-compatible
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, world!"
}'

# OpenAI-compatible
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
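
The OpenAI-compatible request above can also be sent from a script. The following is a minimal sketch using only the Python standard library. It assumes the server is running at http://localhost:11434 and that responses follow the OpenAI chat completions shape (choices[0].message.content), which this guide implies but does not show. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # default address printed by `omm serve`


def build_chat_request(model: str, content: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the OpenAI-compatible curl example."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, content: str, base_url: str = BASE_URL) -> str:
    """Send one user message and return the reply text.

    Assumes an OpenAI-style response body; adjust the lookup if OMM differs.
    """
    request = build_chat_request(model, content, base_url)
    with urllib.request.urlopen(request, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, chat("llama3.2", "Hello!") should return the model's reply as a string. Keeping request construction separate from sending makes the payload easy to inspect without a live server.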

4. Next Steps

Continue with the next guide, First Request.