Models
OMM (Open Model Manager) is a desktop application for running large language models locally. It downloads models from HuggingFace and the Ollama registry, runs them through a custom inference engine with GPU acceleration, and exposes them via Ollama, OpenAI, and Anthropic compatible APIs.
Everything runs on your machine. No data leaves your device.
Key Features
- One-command install — Pull any GGUF or SafeTensors model from HuggingFace
- Dual inference engine — llama.cpp for broad hardware support, rvllm for multi-GPU and batched inference
- 3 API protocols — Compatible with Ollama, OpenAI, and Anthropic clients
- GPU acceleration — CUDA, Metal, Vulkan, and ROCm backends with auto-detection
- 50,000+ models — Browse and install from HuggingFace and Ollama registries
- Desktop app — Chat interface, model library, system monitor, floating window mode
Quick Start
API Compatibility
Once running, OMM accepts requests on three API protocols:
| Protocol | Endpoint | Use With |
|---|---|---|
| Ollama | /api/* | Ollama CLI, Open WebUI |
| OpenAI | /v1/chat/completions | Code CLI, Cursor, Continue.dev |
| Anthropic | /v1/messages | Claude Desktop, direct SDK |
Tool Integrations
Configure your favorite tools to use OMM as a backend: