Models

OMM (Open Model Manager) is a desktop application for running large language models locally. It downloads models from HuggingFace and the Ollama registry, runs them through a custom inference engine with GPU acceleration, and exposes them via Ollama, OpenAI, and Anthropic compatible APIs.

Everything runs on your machine. No data leaves your device.

Key Features

  • One-command install — Pull any GGUF or SafeTensors model from HuggingFace
  • Dual inference engine — llama.cpp for broad hardware support, rvllm for multi-GPU and batched inference
  • 3 API protocols — Compatible with Ollama, OpenAI, and Anthropic clients
  • GPU acceleration — CUDA, Metal, Vulkan, and ROCm backends with auto-detection
  • 50,000+ models — Browse and install from HuggingFace and Ollama registries
  • Desktop app — Chat interface, model library, system monitor, floating window mode

Quick Start

Terminal
bash
# Install and run a model
omm pull llama3.2
omm run llama3.2
# Pull from HuggingFace
omm pull --hf bartowski/Qwen2.5-7B-GGUF -n qwen7b
# Start the API server
omm serve

API Compatibility

Once running, OMM accepts requests on three API protocols:

ProtocolEndpointUse With
Ollama/api/*Ollama CLI, Open WebUI
OpenAI/v1/chat/completionsCode CLI, Cursor, Continue.dev
Anthropic/v1/messagesClaude Desktop, direct SDK

Tool Integrations

Configure your favorite tools to use OMM as a backend:

Terminal
bash
# Configure Claude Code
omm launch claude --model llama3.2
# Configure Cursor
omm launch cursor --model qwen3.5
# Configure Continue.dev
omm launch continue --model gemma3
PreviousVoice InputNextInstallation