Quick Start
Get a model running in under 60 seconds. This guide covers the fastest path from installation to your first chat.
Prerequisites
- OMM installed (see Installation)
- At least 4 GB of free RAM (8+ GB recommended)
- Optional: CUDA-capable GPU for acceleration
1. Pull Your First Model
This downloads the Llama 3.2 3B model (Q4_K_M quantization, ~2 GB). For a more capable model, use omm pull llama3.3:70b (requires ~40 GB RAM).
2. Start Chatting
Desktop App
Launch OMM and select the model from the sidebar. The chat interface opens immediately. Type your message and press Enter.
CLI
3. Start the API Server
To use OMM as a drop-in replacement for other AI APIs:
The server exposes three compatible API protocols. You can immediately use it with any tool that supports Ollama, OpenAI, or Anthropic SDKs: