Models

OMM (Open Model Manager) is a desktop application for running large language models locally. It downloads models from HuggingFace and the Ollama registry, runs them through a custom inference engine with GPU acceleration, and exposes them via Ollama, OpenAI, and Anthropic compatible APIs.

Everything runs on your machine. No data leaves your device.

Key Features

One-command install — Pull any GGUF or SafeTensors model from HuggingFace
Dual inference engine — llama.cpp for broad hardware support, rvllm for multi-GPU and batched inference
3 API protocols — Compatible with Ollama, OpenAI, and Anthropic clients
GPU acceleration — CUDA, Metal, Vulkan, and ROCm backends with auto-detection
50,000+ models — Browse and install from HuggingFace and Ollama registries
Desktop app — Chat interface, model library, system monitor, floating window mode

Quick Start

Terminal

bash

# Install and run a model
omm pull llama3.2
omm run llama3.2

# Pull from HuggingFace
omm pull --hf bartowski/Qwen2.5-7B-GGUF -n qwen7b

# Start the API server
omm serve

API Compatibility

Once running, OMM accepts requests on three API protocols:

Protocol	Endpoint	Use With
Ollama	/api/*	Ollama CLI, Open WebUI
OpenAI	/v1/chat/completions	Code CLI, Cursor, Continue.dev
Anthropic	/v1/messages	Claude Desktop, direct SDK

Tool Integrations

Configure your favorite tools to use OMM as a backend:

Terminal

bash

# Configure Claude Code
omm launch claude --model llama3.2

# Configure Cursor
omm launch cursor --model qwen3.5

# Configure Continue.dev
omm launch continue --model gemma3

Installation→

Install OMM on macOS, Linux, and Windows.

Model Catalog→

Browse and discover models to run locally.

Inference Engines→

llama.cpp vs rvllm and when to use each.

Configuration→

Full configuration reference for config.toml.

#Models

#Key Features

#Quick Start

#API Compatibility

#Tool Integrations

Models

Key Features

Quick Start

API Compatibility

Tool Integrations